
Software Engineer, Collective Communication at OpenAI - San Francisco, California

OpenAI

Full-time | Posted: Feb 10, 2026

Job Description

Software Engineer, Collective Communication at OpenAI - San Francisco, CA

Join OpenAI's Workload Networking team as a Software Engineer, Collective Communication, and become a key architect behind the world's most advanced AI training infrastructure. Based in the heart of San Francisco, California, you will work on the custom supercomputers that power OpenAI's flagship AI models. If you have expertise in C++, CUDA, RDMA, and high-performance computing, this is your opportunity to shape the future of artificial intelligence.

Role Overview

The Workload Networking team at OpenAI drives innovation in the collective communication stacks essential to our largest training jobs. Using advanced C++ and CUDA programming, we develop novel techniques that enable efficient training of groundbreaking AI models on custom-built supercomputers. These models represent core advances in AI research, and our training platform incorporates insights from across OpenAI's research organization.

As a Software Engineer specializing in Collective Communication, you'll design and implement custom networking collectives deeply integrated into our training stack. This hybrid role in San Francisco requires 3 days in-office weekly, with comprehensive relocation assistance provided. We're seeking engineers with proven experience in low-level, performance-critical software, particularly those familiar with distributed algorithms and RDMA.

This position sits at the intersection of systems programming, high-performance computing, and machine learning infrastructure. Your work will directly impact OpenAI's ability to train ever-larger, more capable AI systems that benefit humanity.
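
To make the domain concrete: the core primitive in this work is a collective such as all-reduce, which sums a buffer across every GPU and leaves the identical result on each device. Below is a minimal sketch using NCCL's public single-process API; it is illustrative only (the device count and buffer size are arbitrary) and is not OpenAI's internal collective stack.

    // Illustrative only: a single-process all-reduce across N GPUs with NCCL's
    // public API. OpenAI's internal collectives are custom; this just shows the
    // shape of the primitive (sum a buffer across devices, result on every GPU).
    // Device count and buffer size below are arbitrary, not from the posting.
    #include <cuda_runtime.h>
    #include <nccl.h>
    #include <vector>

    int main() {
      const int nDev = 2;                  // assumes at least 2 visible GPUs
      const size_t count = 1 << 20;        // 1M floats per device

      std::vector<int> devs = {0, 1};
      std::vector<ncclComm_t> comms(nDev);
      ncclCommInitAll(comms.data(), nDev, devs.data());

      std::vector<float*> sendbuf(nDev), recvbuf(nDev);
      std::vector<cudaStream_t> streams(nDev);
      for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(devs[i]);
        cudaMalloc(&sendbuf[i], count * sizeof(float));
        cudaMalloc(&recvbuf[i], count * sizeof(float));
        cudaMemset(sendbuf[i], 0, count * sizeof(float));  // stand-in payload
        cudaStreamCreate(&streams[i]);
      }

      // One all-reduce (elementwise sum) spanning every device, issued inside a
      // group so NCCL can launch the collective across GPUs without deadlocking.
      ncclGroupStart();
      for (int i = 0; i < nDev; ++i)
        ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
      ncclGroupEnd();

      for (int i = 0; i < nDev; ++i) {
        cudaSetDevice(devs[i]);
        cudaStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
      }
      return 0;
    }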

Key Responsibilities

  1. Collaborate closely with ML researchers to architect and implement highly efficient collective operations using C++ and CUDA.
  2. Optimize OpenAI's largest training jobs to maximize throughput across diverse network transports in our supercomputers.
  3. Develop sophisticated network simulations that inform strategic supercomputer design decisions.
  4. Design custom networking collectives optimized for OpenAI's unique training workloads.
  5. Profile and tune collective communication performance across thousands of GPUs.
  6. Integrate research breakthroughs into production training infrastructure.
  7. Debug complex distributed system issues in multi-node training environments.
  8. Contribute to the evolution of OpenAI's custom supercomputer architecture.
  9. Mentor junior engineers on high-performance networking best practices.
  10. Participate in cross-functional teams spanning research, infrastructure, and deployment.
  11. Maintain comprehensive documentation for collective communication primitives.
  12. Stay ahead of industry trends in HPC networking and AI training hardware.
  13. Conduct performance analysis and benchmarking of networking stacks.

These responsibilities demand deep technical expertise and the ability to thrive in a fast-paced, research-driven environment where your code directly powers frontier AI capabilities.
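
As a flavor of the performance reasoning behind responsibilities 2, 3, and 5 above, a first-order "latency plus bandwidth" cost model is commonly used to estimate how long a ring all-reduce should take before profiling or simulating anything. The sketch below is the standard textbook approximation, not OpenAI's simulator, and the GPU count, message size, and link speed are hypothetical.

    // Back-of-the-envelope latency/bandwidth ("alpha-beta") model for a ring
    // all-reduce: a reduce-scatter followed by an all-gather, each taking
    // (p - 1) steps that move n/p bytes per step. Ignores reduction compute.
    // A standard textbook approximation, not OpenAI's internal simulator.
    #include <cstdio>

    double ring_allreduce_seconds(double p,        // number of ranks (GPUs)
                                  double n_bytes,  // message size per rank
                                  double alpha,    // per-step latency (s)
                                  double beta) {   // seconds per byte on a link
      const double steps = 2.0 * (p - 1.0);        // reduce-scatter + all-gather
      const double bytes_per_step = n_bytes / p;   // each chunk is n/p bytes
      return steps * (alpha + bytes_per_step * beta);
    }

    int main() {
      // Hypothetical numbers: 1024 GPUs, 1 GiB of gradients, 5 us step latency,
      // 400 Gb/s links. Plug in measured values to compare against profiles.
      const double p = 1024.0, n = 1.0 * (1 << 30);
      const double alpha = 5e-6, beta = 1.0 / (400e9 / 8.0);
      std::printf("estimated all-reduce time: %.3f ms\n",
                  ring_allreduce_seconds(p, n, alpha, beta) * 1e3);
      return 0;
    }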

Qualifications

To excel in this role, you should demonstrate:

  • Extensive experience writing distributed algorithms leveraging RDMA technologies
  • Proven ability to develop low-level, performance-sensitive CPU and GPU code
  • Familiarity with network simulation methodologies for large-scale systems
  • Strong background in collective communication primitives (NCCL, MPI, RCCL, etc.)
  • Deep knowledge of GPU programming with CUDA and modern GPU architectures
  • Experience optimizing ML training workloads across multi-node clusters
  • Understanding of supercomputer network fabrics (InfiniBand, RoCE, Slingshot)
  • Excellent C++ skills with focus on performance and memory efficiency
  • Collaborative mindset with experience working alongside research scientists
  • Bachelor's, Master's, or PhD in Computer Science, Electrical Engineering, or equivalent
  • 3+ years professional experience in systems-level programming for HPC

Bonus points for contributions to open-source HPC projects, publications in systems conferences, or experience with custom AI training hardware.
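
For readers newer to the primitives named above: NCCL is the GPU-side face of collective communication, while MPI is the classic host-side one. The snippet below shows the same all-reduce expressed through MPI's C API; it is purely illustrative and not a requirement of the role.

    // The same all-reduce idea expressed with MPI's C API: one process per rank,
    // each contributes a local buffer, and every rank receives the elementwise
    // sum. Illustrative only; build with an MPI compiler wrapper such as mpicxx.
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);

      int rank = 0, world = 1;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &world);

      // Each rank contributes a vector filled with its rank id; after the
      // all-reduce every element equals 0 + 1 + ... + (world - 1) on all ranks.
      std::vector<float> send(1 << 10, static_cast<float>(rank));
      std::vector<float> recv(send.size(), 0.0f);

      MPI_Allreduce(send.data(), recv.data(), static_cast<int>(send.size()),
                    MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

      if (rank == 0)
        std::printf("recv[0] = %.1f (expected %.1f)\n",
                    recv[0], world * (world - 1) / 2.0f);

      MPI_Finalize();
      return 0;
    }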

Salary & Benefits

OpenAI offers competitive compensation for Software Engineers in Collective Communication, typically ranging from $250,000 to $450,000 annually (base salary + equity + bonuses), depending on experience and qualifications. This San Francisco-based role includes:

  • Comprehensive medical, dental, and vision insurance
  • 401(k) with generous matching contributions
  • Unlimited PTO and flexible vacation policy
  • Parental leave and family planning benefits
  • Relocation assistance package
  • Hybrid work model (3 days/week in SF office)
  • Stock options in OpenAI
  • Professional development budget
  • On-site fitness center and wellness programs
  • Catered meals and snacks daily
  • Mental health support services
  • Visa sponsorship available

This comprehensive package reflects OpenAI's commitment to attracting top talent to solve humanity's most important challenges.

Why Join OpenAI?

OpenAI is at the forefront of artificial intelligence research and deployment, dedicated to ensuring that artificial general intelligence benefits all of humanity. Our mission-driven culture attracts the world's best talent to push AI capabilities forward while prioritizing safety and human values.

Joining the Workload Networking team means working on infrastructure that powers GPT models and other frontier systems. Your contributions will have immediate, tangible impact on AI research progress. OpenAI fosters an inclusive environment valuing diverse perspectives, with equal opportunity employment practices and fair chance hiring policies.

Based in San Francisco's vibrant tech ecosystem, you'll collaborate with brilliant researchers and engineers while enjoying OpenAI's exceptional benefits and culture focused on long-term impact over short-term metrics.

How to Apply

Ready to build the networking foundation for tomorrow's AI? Submit your application including:

  • Resume/CV highlighting relevant HPC and systems experience
  • GitHub/portfolio with performance-critical code samples
  • Optional: Links to publications, open-source contributions, or RDMA projects

OpenAI's hiring process includes technical interviews focused on systems design, coding challenges in C++/CUDA, and discussions about distributed systems optimization. We conduct background checks consistent with applicable laws, including the San Francisco Fair Chance Ordinance.

Apply now to join OpenAI's mission to ensure AGI benefits humanity. This role represents a rare opportunity to work at the intersection of HPC, networking, and frontier AI research.


Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

262,500 - 495,000 USD / year

Source: AI estimate. This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • C++ Programming (intermediate)
  • CUDA Development (intermediate)
  • RDMA Networking (intermediate)
  • Distributed Systems (intermediate)
  • High-Performance Computing (intermediate)
  • GPU Programming (intermediate)
  • Network Simulation (intermediate)
  • Collective Communication (intermediate)
  • Machine Learning Optimization (intermediate)
  • Supercomputer Architecture (intermediate)
  • Performance Profiling (intermediate)
  • Parallel Computing (intermediate)
  • Low-Level Systems Programming (intermediate)
  • AI Training Infrastructure (intermediate)
  • Network Transports (intermediate)
  • Multi-Node Training (intermediate)
  • InfiniBand Programming (intermediate)
  • MPI Implementation (intermediate)
  • NCCL Optimization (intermediate)
  • Custom Networking Protocols (intermediate)

Required Qualifications

  • Strong background in low-level, performance-critical software development using C++ and CUDA
  • Experience writing distributed algorithms utilizing RDMA technologies
  • Comfortable developing performance-sensitive CPU and GPU code
  • Familiarity with network simulation techniques for supercomputer design
  • Proven track record in collective communication primitives (NCCL, MPI, or similar)
  • Deep understanding of high-performance computing environments and supercomputers
  • Experience optimizing large-scale ML training jobs across multiple nodes
  • Knowledge of different network transports (InfiniBand, Ethernet, RoCE)
  • Ability to collaborate with ML researchers on infrastructure needs
  • Bachelor's or higher degree in Computer Science, Electrical Engineering, or related field
  • 3+ years of relevant experience in systems programming for distributed systems
  • Strong problem-solving skills in performance optimization and debugging

Responsibilities

  • Design and implement custom networking collectives tightly integrated with OpenAI's training stack
  • Collaborate closely with ML researchers to develop efficient collective operations in C++ and CUDA
  • Optimize largest training jobs to fully utilize diverse network transports in supercomputers
  • Develop and maintain simulations to guide future supercomputer network architecture designs
  • Profile and tune performance of collective communication algorithms across GPU clusters
  • Implement novel collective communication techniques for flagship AI model training
  • Integrate learnings from OpenAI research organization into training platform improvements
  • Debug and resolve performance bottlenecks in multi-node distributed training environments
  • Work with Workload Networking team to enhance collective communication stack scalability
  • Contribute to custom-built supercomputer infrastructure optimizations
  • Document and share collective communication best practices across engineering teams
  • Participate in code reviews and technical discussions for networking primitives
  • Stay updated on latest advancements in HPC networking and AI training hardware

Benefits

  • Competitive salary with performance-based bonuses
  • Comprehensive health, dental, and vision insurance coverage
  • 401(k) retirement plan with generous company matching
  • Relocation assistance package for new employees
  • Hybrid work model with 3 days in office per week
  • Unlimited PTO and flexible vacation policy
  • Generous parental leave and family planning benefits
  • Professional development stipend for conferences and courses
  • Stock options and equity participation in OpenAI
  • On-site fitness facilities and wellness programs
  • Catered meals and fully stocked kitchens daily
  • Mental health support and employee assistance programs
  • Visa sponsorship for international talent
  • Cutting-edge hardware access for personal projects


