
Software Engineer, Collective Communication at OpenAI - San Francisco, California

OpenAI

Full-time | Posted: Feb 10, 2026

Job Description

Software Engineer, Collective Communication at OpenAI - San Francisco, CA

Join OpenAI's Workload Networking team as a Software Engineer, Collective Communication, and become a key architect behind the world's most advanced AI training infrastructure. Based in the heart of San Francisco, California, you will work on the custom supercomputers that power OpenAI's flagship AI models. If you have expertise in C++, CUDA, RDMA, and high-performance computing, this is your opportunity to shape the future of artificial intelligence.

Role Overview

The Workload Networking team at OpenAI drives innovation in the collective communication stacks essential to our largest training jobs. Using advanced C++ and CUDA programming, we develop novel techniques that enable efficient training of groundbreaking AI models on custom-built supercomputers. These models represent core advances in AI research, and our training platform incorporates insights from across OpenAI's research organization.

As a Software Engineer specializing in Collective Communication, you'll design and implement custom networking collectives deeply integrated into our training stack. This hybrid role in San Francisco requires 3 days in-office weekly, with comprehensive relocation assistance provided. We're seeking engineers with proven experience in low-level, performance-critical software, particularly those familiar with distributed algorithms and RDMA.

This position sits at the intersection of systems programming, high-performance computing, and machine learning infrastructure. Your work will directly impact OpenAI's ability to train ever-larger, more capable AI systems that benefit humanity.
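
To make the domain concrete: the core primitive in this work is a collective such as all-reduce, which sums a buffer across every GPU and leaves the identical result on each device. Below is a minimal sketch using NCCL's public single-process API; it is illustrative only (the device count and buffer size are arbitrary) and is not OpenAI's internal collective stack.

    // Illustrative only: a single-process all-reduce across N GPUs with NCCL's
    // public API. OpenAI's internal collectives are custom; this just shows the
    // shape of the primitive (sum a buffer across devices, result on every GPU).
    // Device count and buffer size below are arbitrary, not from the posting.
    #include <cuda_runtime.h>
    #include <nccl.h>
    #include <vector>

    int main() {
      const int nDev = 2;                  // assumes at least 2 visible GPUs
      const size_t count = 1 << 20;        // 1M floats per device

      std::vector<int> devs = {0, 1};
      std::vector<ncclComm_t> comms(nDev);
      ncclCommInitAll(comms.data(), nDev, devs.data());

      std::vector<float*> sendbuf(nDev), recvbuf(nDev);
      std::vector<cudaStream_t> streams(nDev);
      for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(devs[i]);
        cudaMalloc(&sendbuf[i], count * sizeof(float));
        cudaMalloc(&recvbuf[i], count * sizeof(float));
        cudaMemset(sendbuf[i], 0, count * sizeof(float));  // stand-in payload
        cudaStreamCreate(&streams[i]);
      }

      // One all-reduce (elementwise sum) spanning every device, issued inside a
      // group so NCCL can launch the collective across GPUs without deadlocking.
      ncclGroupStart();
      for (int i = 0; i < nDev; ++i)
        ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
      ncclGroupEnd();

      for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(devs[i]);
        cudaStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
      }
      return 0;
    }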

Key Responsibilities

  1. Collaborate closely with ML researchers to architect and implement highly efficient collective operations using C++ and CUDA.
  2. Optimize OpenAI's largest training jobs to maximize throughput across diverse network transports in our supercomputers.
  3. Develop sophisticated network simulations that inform strategic supercomputer design decisions.
  4. Design custom networking collectives optimized for OpenAI's unique training workloads.
  5. Profile and tune collective communication performance across thousands of GPUs.
  6. Integrate research breakthroughs into production training infrastructure.
  7. Debug complex distributed system issues in multi-node training environments.
  8. Contribute to the evolution of OpenAI's custom supercomputer architecture.
  9. Mentor junior engineers on high-performance networking best practices.
  10. Participate in cross-functional teams spanning research, infrastructure, and deployment.
  11. Maintain comprehensive documentation for collective communication primitives.
  12. Stay ahead of industry trends in HPC networking and AI training hardware.
  13. Conduct performance analysis and benchmarking of networking stacks.

These responsibilities demand deep technical expertise and the ability to thrive in a fast-paced, research-driven environment where your code directly powers frontier AI capabilities.
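
As a flavor of the performance reasoning behind responsibilities 2, 3, and 5 above, a first-order "latency plus bandwidth" cost model is commonly used to estimate how long a ring all-reduce should take before profiling or simulating anything. The sketch below is the standard textbook approximation, not OpenAI's simulator, and the GPU count, message size, and link speed are hypothetical.

    // Back-of-the-envelope latency/bandwidth ("alpha-beta") model for a ring
    // all-reduce: a reduce-scatter followed by an all-gather, each taking
    // (p - 1) steps that move n/p bytes per step. Ignores reduction compute.
    // A standard textbook approximation, not OpenAI's internal simulator.
    #include <cstdio>

    double ring_allreduce_seconds(double p,        // number of ranks (GPUs)
                                  double n_bytes,  // message size per rank
                                  double alpha,    // per-step latency (s)
                                  double beta) {   // seconds per byte on a link
      const double steps = 2.0 * (p - 1.0);        // reduce-scatter + all-gather
      const double bytes_per_step = n_bytes / p;   // each chunk is n/p bytes
      return steps * (alpha + bytes_per_step * beta);
    }

    int main() {
      // Hypothetical numbers: 1024 GPUs, 1 GiB of gradients, 5 us step latency,
      // 400 Gb/s links. Plug in measured values to compare against profiles.
      const double p = 1024.0, n = 1.0 * (1 << 30);
      const double alpha = 5e-6, beta = 1.0 / (400e9 / 8.0);
      std::printf("estimated all-reduce time: %.3f ms\n",
                  ring_allreduce_seconds(p, n, alpha, beta) * 1e3);
      return 0;
    }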

Qualifications

To excel in this role, you should demonstrate:

  • Extensive experience writing distributed algorithms leveraging RDMA technologies
  • Proven ability to develop low-level, performance-sensitive CPU and GPU code
  • Familiarity with network simulation methodologies for large-scale systems
  • Strong background in collective communication primitives (NCCL, MPI, RCCL, etc.)
  • Deep knowledge of GPU programming with CUDA and modern GPU architectures
  • Experience optimizing ML training workloads across multi-node clusters
  • Understanding of supercomputer network fabrics (InfiniBand, RoCE, Slingshot)
  • Excellent C++ skills with focus on performance and memory efficiency
  • Collaborative mindset with experience working alongside research scientists
  • Bachelor's, Master's, or PhD in Computer Science, Electrical Engineering, or equivalent
  • 3+ years professional experience in systems-level programming for HPC

Bonus points for contributions to open-source HPC projects, publications in systems conferences, or experience with custom AI training hardware.
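
For readers newer to the primitives named above: NCCL is the GPU-side face of collective communication, while MPI is the classic host-side one. The snippet below shows the same all-reduce expressed through MPI's C API; it is purely illustrative and not a requirement of the role.

    // The same all-reduce idea expressed with MPI's C API: one process per rank,
    // each contributes a local buffer, and every rank receives the elementwise
    // sum. Illustrative only; build with an MPI compiler wrapper such as mpicxx.
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);

      int rank = 0, world = 1;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &world);

      // Each rank contributes a vector filled with its rank id; after the
      // all-reduce every element equals 0 + 1 + ... + (world - 1) on all ranks.
      std::vector<float> send(1 << 10, static_cast<float>(rank));
      std::vector<float> recv(send.size(), 0.0f);

      MPI_Allreduce(send.data(), recv.data(), static_cast<int>(send.size()),
                    MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

      if (rank == 0)
        std::printf("recv[0] = %.1f (expected %.1f)\n",
                    recv[0], world * (world - 1) / 2.0f);

      MPI_Finalize();
      return 0;
    }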

Salary & Benefits

OpenAI offers competitive compensation for Software Engineers in Collective Communication, typically ranging from $250,000 to $450,000 annually (base salary + equity + bonuses), depending on experience and qualifications. This San Francisco-based role includes:

  • Comprehensive medical, dental, and vision insurance
  • 401(k) with generous matching contributions
  • Unlimited PTO and flexible vacation policy
  • Parental leave and family planning benefits
  • Relocation assistance package
  • Hybrid work model (3 days/week in SF office)
  • Stock options in OpenAI
  • Professional development budget
  • On-site fitness center and wellness programs
  • Catered meals and snacks daily
  • Mental health support services
  • Visa sponsorship available

This comprehensive package reflects OpenAI's commitment to attracting top talent to solve humanity's most important challenges.

Why Join OpenAI?

OpenAI is at the forefront of artificial intelligence research and deployment, dedicated to ensuring that artificial general intelligence benefits all of humanity. Our mission-driven culture attracts the world's best talent to push AI capabilities forward while prioritizing safety and human values.

Joining the Workload Networking team means working on infrastructure that powers GPT models and other frontier systems. Your contributions will have immediate, tangible impact on AI research progress. OpenAI fosters an inclusive environment valuing diverse perspectives, with equal opportunity employment practices and fair chance hiring policies.

Based in San Francisco's vibrant tech ecosystem, you'll collaborate with brilliant researchers and engineers while enjoying OpenAI's exceptional benefits and culture focused on long-term impact over short-term metrics.

How to Apply

Ready to build the networking foundation for tomorrow's AI? Submit your application including:

  • Resume/CV highlighting relevant HPC and systems experience
  • GitHub/portfolio with performance-critical code samples
  • Optional: Links to publications, open-source contributions, or RDMA projects

OpenAI's hiring process includes technical interviews focused on systems design, coding challenges in C++/CUDA, and discussions about distributed systems optimization. We conduct background checks consistent with applicable laws, including the San Francisco Fair Chance Ordinance.

Apply now to join OpenAI's mission to ensure AGI benefits humanity. This role represents a rare opportunity to work at the intersection of HPC, networking, and frontier AI research.


Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

262,500 - 495,000 USD / year

Source: AI estimate. This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • C++ Programming (intermediate)
  • CUDA Development (intermediate)
  • RDMA Networking (intermediate)
  • Distributed Systems (intermediate)
  • High-Performance Computing (intermediate)
  • GPU Programming (intermediate)
  • Network Simulation (intermediate)
  • Collective Communication (intermediate)
  • Machine Learning Optimization (intermediate)
  • Supercomputer Architecture (intermediate)
  • Performance Profiling (intermediate)
  • Parallel Computing (intermediate)
  • Low-Level Systems Programming (intermediate)
  • AI Training Infrastructure (intermediate)
  • Network Transports (intermediate)
  • Multi-Node Training (intermediate)
  • InfiniBand Programming (intermediate)
  • MPI Implementation (intermediate)
  • NCCL Optimization (intermediate)
  • Custom Networking Protocols (intermediate)

Required Qualifications

  • Strong background in low-level, performance-critical software development using C++ and CUDA
  • Experience writing distributed algorithms utilizing RDMA technologies
  • Comfortable developing performance-sensitive CPU and GPU code
  • Familiarity with network simulation techniques for supercomputer design
  • Proven track record in collective communication primitives (NCCL, MPI, or similar)
  • Deep understanding of high-performance computing environments and supercomputers
  • Experience optimizing large-scale ML training jobs across multiple nodes
  • Knowledge of different network transports (InfiniBand, Ethernet, RoCE)
  • Ability to collaborate with ML researchers on infrastructure needs
  • Bachelor's or higher degree in Computer Science, Electrical Engineering, or related field
  • 3+ years of relevant experience in systems programming for distributed systems
  • Strong problem-solving skills in performance optimization and debugging

Responsibilities

  • Design and implement custom networking collectives tightly integrated with OpenAI's training stack
  • Collaborate closely with ML researchers to develop efficient collective operations in C++ and CUDA
  • Optimize largest training jobs to fully utilize diverse network transports in supercomputers
  • Develop and maintain simulations to guide future supercomputer network architecture designs
  • Profile and tune performance of collective communication algorithms across GPU clusters
  • Implement novel collective communication techniques for flagship AI model training
  • Integrate learnings from OpenAI research organization into training platform improvements
  • Debug and resolve performance bottlenecks in multi-node distributed training environments
  • Work with Workload Networking team to enhance collective communication stack scalability
  • Contribute to custom-built supercomputer infrastructure optimizations
  • Document and share collective communication best practices across engineering teams
  • Participate in code reviews and technical discussions for networking primitives
  • Stay updated on latest advancements in HPC networking and AI training hardware

Benefits

  • Competitive salary with performance-based bonuses
  • Comprehensive health, dental, and vision insurance coverage
  • 401(k) retirement plan with generous company matching
  • Relocation assistance package for new employees
  • Hybrid work model with 3 days in office per week
  • Unlimited PTO and flexible vacation policy
  • Generous parental leave and family planning benefits
  • Professional development stipend for conferences and courses
  • Stock options and equity participation in OpenAI
  • On-site fitness facilities and wellness programs
  • Catered meals and fully stocked kitchens daily
  • Mental health support and employee assistance programs
  • Visa sponsorship for international talent
  • Cutting-edge hardware access for personal projects


