
Software Engineer, Distributed Systems - OpenAI, San Francisco, California

OpenAI

Full-time | Posted: Feb 10, 2026

Job Description

Software Engineer, Distributed Systems at OpenAI - San Francisco, CA

Join OpenAI's Compute Runtime team building the foundational infrastructure powering the world's most advanced ML training systems. As a Software Engineer specializing in Distributed Systems, you'll architect the low-level frameworks that orchestrate thousands of computers moving petabytes of training data across our cutting-edge supercomputers.

This hybrid role in San Francisco offers relocation assistance and combines the excitement of AGI research with hands-on systems engineering at exascale. If you thrive on optimizing end-to-end performance while maintaining developer-friendly APIs, this is your opportunity to accelerate humanity's progress toward artificial general intelligence.

Key Responsibilities

  • Architect powerful, introspectable APIs that orchestrate thousands of GPUs and CPUs for seamless data movement and persistence across distributed clusters
  • Profile and optimize high-performance I/O pipelines to maximize local storage throughput and network fabric utilization
  • Design scalable compute runtime components that support dynamic ML training workloads, from research prototypes to production supercomputers
  • Rapidly deploy training frameworks to bleeding-edge hardware architectures in response to evolving model requirements
  • Build robust Python and Rust components that remain stable when scaling from hundreds to tens of thousands of nodes
  • Collaborate with ML researchers to understand data movement patterns and optimize for researcher productivity
  • Implement observability and monitoring systems that provide real-time insight into cluster-wide performance bottlenecks
  • Create fault-tolerant designs that automatically recover from hardware failures across massive distributed systems (see the illustrative sketch after this list)
  • Optimize the end-to-end training stack to minimize job completion times while maintaining system reliability
  • Develop debugging tools that enable rapid iteration cycles even at supercomputer scale
  • Contribute to open-source components that advance the state of distributed ML infrastructure
  • Mentor junior engineers while leading critical infrastructure initiatives
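
The responsibilities above are intentionally high level. Purely as an illustration of the kind of orchestration and fault-tolerance work they describe (this is not OpenAI's runtime or API; copy_shard, the failure rate, and the retry policy below are hypothetical stand-ins), a minimal Python sketch might fan shard transfers out across a worker pool, retry transient failures, and expose a small introspectable stats object:

```python
import concurrent.futures
import random
import time
from dataclasses import dataclass


@dataclass
class TransferStats:
    """Counters exposed for quick introspection of a transfer batch."""
    completed: int = 0
    retried: int = 0
    failed: int = 0


def copy_shard(shard_id: int) -> int:
    """Hypothetical stand-in for moving one data shard between nodes."""
    if random.random() < 0.2:  # simulate a transient storage/network failure
        raise IOError(f"transient failure on shard {shard_id}")
    time.sleep(0.01)  # simulate I/O latency
    return shard_id


def run_transfers(shard_ids, max_workers: int = 8, max_retries: int = 3) -> TransferStats:
    """Fan shard copies out across a thread pool, retrying transient failures."""
    stats = TransferStats()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(copy_shard, s): (s, 1) for s in shard_ids}
        while pending:
            done, _ = concurrent.futures.wait(
                pending, return_when=concurrent.futures.FIRST_COMPLETED
            )
            for fut in done:
                shard, attempt = pending.pop(fut)
                if fut.exception() is None:
                    stats.completed += 1
                elif attempt < max_retries:
                    stats.retried += 1
                    pending[pool.submit(copy_shard, shard)] = (shard, attempt + 1)
                else:
                    stats.failed += 1
    return stats


if __name__ == "__main__":
    print(run_transfers(range(100)))
```

In a real system the worker pool would span machines and the stats would feed cluster-wide monitoring rather than a print statement, but the shape of the problem (orchestration, retries, introspection) is the same.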

Qualifications & Requirements

Technical Expertise:

  • 5+ years of experience building production distributed systems at massive scale
  • Deep proficiency in Python and Rust (or C++/Go) for systems programming
  • Proven track record of profiling and optimizing I/O-bound and compute-bound workloads (a toy profiling illustration follows this list)
  • Hands-on experience deploying to GPU clusters or supercomputing environments
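
As a purely hypothetical illustration of the profiling requirement above (nothing here comes from the posting; both pipeline stages are toy stand-ins), a first-pass cProfile session like the one below shows whether a pipeline spends its time waiting on I/O or doing CPU work; real training pipelines would be analyzed with cluster- and GPU-level tooling on top of this kind of baseline:

```python
import cProfile
import io
import pstats
import time


def io_bound_step(n_reads: int) -> int:
    """Hypothetical stand-in for an I/O-heavy stage (e.g. reading shards from storage)."""
    total = 0
    for _ in range(n_reads):
        time.sleep(0.001)  # each "read" just waits, so time accrues in sleep
        total += 1
    return total


def compute_bound_step(n: int) -> int:
    """Hypothetical stand-in for a CPU-heavy stage (e.g. decoding or checksumming)."""
    return sum(i * i for i in range(n))


def pipeline() -> None:
    io_bound_step(200)
    compute_bound_step(2_000_000)


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    pipeline()
    profiler.disable()
    out = io.StringIO()
    # Sorting by cumulative time shows which stage dominates end-to-end runtime.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    print(out.getvalue())
```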

Mindset & Approach:

  • Obsessive focus on end-to-end system performance and developer experience
  • Comfort with ambiguity, rapidly changing requirements, and evolving hardware stacks
  • Natural instinct for simplifying complex systems without sacrificing capability
  • Strong ownership mentality for delivering reliable systems at exascale

Bonus Points:

  • Experience with ML training infrastructure (PyTorch, JAX, etc.)
  • Supercomputing center experience (LLNL, NERSC, etc.)
  • Contributions to distributed systems frameworks (Ray, Dask, etc.)

Salary & Benefits

Compensation: $250,000 - $450,000 base + equity + benefits (San Francisco market rate)

Comprehensive Benefits Package:

  • Top-tier medical, dental, vision coverage with low employee premiums
  • 401(k) with 4%+ employer match
  • Unlimited PTO with wellness days
  • 16 weeks paid parental leave (primary/secondary caregivers)
  • $2,000+ annual learning stipend
  • Fitness reimbursement up to $100/month
  • Daily catered meals in SF office
  • Relocation package including housing and moving support
  • Commuter benefits and subsidized housing options

Hybrid schedule: 3 days/week in our state-of-the-art San Francisco headquarters.

Why Join OpenAI?

OpenAI isn't just building AI products—we're creating the infrastructure that makes AGI possible. Our Compute Runtime team sits at the foundation, enabling researchers to push boundaries without infrastructure bottlenecks.

Impact at Scale: Your optimizations directly accelerate humanity's path to AGI

Cutting-Edge Challenges: Work with hardware and scale most engineers never see

Research Synergy: Partner directly with world-class ML researchers

Team Culture: Collaborative, mission-driven engineers who move fast and ship reliable systems

How to Apply

Submit your resume and a brief note explaining why you're excited about distributed systems at OpenAI scale. Tell us about your favorite optimization you've implemented and the impact it had.

Timeline: Rolling applications with interviews typically within 1-2 weeks

Process: Technical screen → Systems coding → Distributed systems deep dive → Team match

Apply Now

Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

262,500 - 495,000 USD / year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Distributed Systems (intermediate)
  • Python Programming (intermediate)
  • Rust Programming (intermediate)
  • High-Performance Computing (intermediate)
  • System Optimization (intermediate)
  • Profiling Tools (intermediate)
  • Scalable APIs (intermediate)
  • Data Persistence (intermediate)
  • I/O Optimization (intermediate)
  • ML Training Frameworks (intermediate)
  • Supercomputer Deployment (intermediate)
  • Software Engineering (intermediate)
  • Performance Debugging (intermediate)
  • Large-Scale Systems (intermediate)
  • Hybrid Cloud Infrastructure (intermediate)
  • Real-Time Monitoring (intermediate)
  • Fault-Tolerant Design (intermediate)
  • Concurrency Programming (intermediate)

Required Qualifications

  • Experience building large-scale distributed systems handling thousands of nodes
  • Proficiency in Python and Rust, or equivalent high-performance languages
  • Strong background in profiling and optimizing compute and data pipelines for scale
  • Hands-on experience with high-performance I/O systems and network optimization
  • Demonstrated ability to design introspectable APIs for fast debugging and development
  • Track record of deploying software to supercomputers or massive GPU clusters
  • Passion for end-to-end system optimization that minimizes complexity and maintenance
  • Experience working in fast-paced environments with rapidly evolving requirements
  • Deep understanding of ML training system architectures and data movement patterns
  • Strong problem-solving skills for maintaining stability at exascale
  • Familiarity with hybrid work models and ability to collaborate in the office 3 days/week
  • Bachelor's or higher degree in Computer Science, Engineering, or a related field preferred

Responsibilities

  • Develop powerful APIs orchestrating thousands of computers for data movement and persistence
  • Design easy-to-use, introspectable systems promoting fast debugging and development cycles
  • Profile and optimize compute and data capabilities across local and distributed environments
  • Deploy training frameworks to the latest supercomputers in response to evolving ML needs
  • Work across the Python and Rust stack to build robust, scalable framework components
  • Maximize researcher productivity through high-performance system components
  • Optimize end-to-end systems, from high-performance I/O to supercomputer-scale distribution
  • Maintain system stability and performance when scaling to the newest hardware architectures
  • Collaborate with ML researchers to understand and address dynamic training requirements
  • Implement fault-tolerant designs ensuring reliability across massive distributed clusters
  • Create monitoring and observability tools for real-time system performance insights
  • Contribute to low-level framework components powering OpenAI's AGI training infrastructure
  • Rapidly iterate on systems based on hardware advancements and workload changes

Benefits

  • Competitive salary with equity package at a leading AI company
  • Comprehensive health, dental, and vision insurance coverage
  • 401(k) retirement plan with generous company matching
  • Relocation assistance for new employees moving to San Francisco
  • Hybrid work model: 3 days in office, 2 days of remote flexibility
  • Unlimited PTO with encouragement to disconnect and recharge
  • Generous parental leave policies for primary and secondary caregivers
  • Fitness reimbursement and wellness program benefits
  • Catered lunches and dinners daily in the San Francisco office
  • Learning stipend for conferences, courses, and professional development
  • Mental health support through a comprehensive EAP program
  • Commuter benefits and subsidized public transportation passes
  • Team offsites and social events throughout the year
  • Cutting-edge hardware access, including the latest GPU supercomputers
