ML Framework Engineer Careers at OpenAI - San Francisco, CA | Apply Now!

OpenAI

Full-time | Posted: Feb 10, 2026

Job Description

ML Framework Engineer at OpenAI - San Francisco, CA

Join OpenAI's Training Runtime team as an ML Framework Engineer and become a key architect of the world's most advanced distributed machine learning training systems. Based in San Francisco, this role offers the rare opportunity to directly impact frontier-scale AI model training while enabling researchers to push the boundaries of artificial intelligence.

The Training Runtime team designs the core infrastructure powering everything from early research experiments to massive, multi-billion-parameter model training runs. With a dual mandate to accelerate both researcher productivity and training throughput, we're building a unified, modular runtime that scales from laptop experiments to supercomputer deployments.

As an ML Framework Engineer, you'll work on three critical pillars: high-performance data movement (asynchronous, zero-copy tensor transfers), fault-tolerant training frameworks (resilient checkpointing, state management), and distributed process orchestration. Your optimizations will directly translate to faster model training and more rapid AI breakthroughs.

Key Responsibilities

Performance Optimization: Profile and optimize our internal training framework to achieve state-of-the-art hardware efficiency across massive GPU clusters
Research Enablement: Partner directly with OpenAI researchers to implement cutting-edge training techniques and enable next-generation model development
Distributed Systems: Design and implement high-performance, asynchronous data movement systems with zero-copy tensor and optimizer-state transfers
Fault Tolerance: Build resilient training loops, state management systems, and checkpointing mechanisms that maintain uptime during long-running training jobs
Orchestration: Develop deterministic process orchestration for distributed training jobs spanning thousands of GPUs
Observability: Implement comprehensive monitoring and observability for training runs at any scale
Integration: Create composable interfaces that integrate proven large-scale capabilities with researcher-friendly APIs
Debugging: Write production-quality, bug-free machine learning code that powers mission-critical training infrastructure
Supercomputing: Deeply understand and optimize for supercomputer architectures and network topologies
Collaboration: Work cross-functionally with model-stack, research, and platform engineering teams
Innovation: Continuously identify and implement performance improvements while minimizing system complexity
Scale: Ensure training framework reliability from single-GPU experiments to frontier-scale deployments
Impact: Deliver measurable improvements in both training throughput (TFLOPS utilization) and researcher throughput (experiment velocity)

Qualifications & Requirements

This role demands exceptional engineering talent with deep systems knowledge and ML expertise. You might thrive if you:

Have hands-on experience running ML experiments, even at small scale
Obsess over performance optimization and system efficiency
Deeply understand distributed systems and their failure modes
Write clean, bug-free Python code under pressure
Love reverse-engineering complex systems to make them faster
Have worked with GPU-accelerated training frameworks (PyTorch/TensorFlow/JAX)
Understand supercomputer networking and interconnect topologies
Excel at profiling tools (NVIDIA Nsight, PyTorch Profiler, etc.)
Can balance performance gains with maintainable, simple designs
Collaborate effectively with researchers who move fast
Thrive in a hybrid work environment (3 days/week in SF office)

Bonus: Experience with fault-tolerant distributed systems, RDMA networking, or large-scale ML training infrastructure.

Salary & Benefits

Competitive Compensation: Total compensation for ML Framework Engineers at OpenAI typically ranges from $250,000 to $450,000+ annually, including base salary, equity, and performance bonuses. Exact compensation depends on experience and location.

Comprehensive Benefits Package:

Hybrid work model: 3 days/week in San Francisco office
Full relocation assistance for new hires
Premium medical, dental, vision coverage
401(k) with generous company match
Unlimited PTO with encouragement to disconnect
Parental leave and family planning benefits
Mental health support and wellness stipend
Professional development budget
Daily catered meals and fully stocked kitchens
Gym membership reimbursement
Latest hardware and GPU cluster access
Equity in OpenAI - share in our mission success

Why Join OpenAI?

OpenAI is at the forefront of artificial intelligence research and deployment. Our mission is to ensure AGI benefits all of humanity. By joining our Training Runtime team, you'll:

Work on Frontier Problems: Optimize training systems that power GPT models and beyond
Maximum Impact: Your code will run on thousands of GPUs training humanity's most important AI systems
Research Partnership: Collaborate directly with world-class AI researchers
Cutting-Edge Tech: Access to the latest supercomputing hardware and ML frameworks
Mission-Driven Culture: Work with talented teammates united by our mission
San Francisco Hub: Join our collaborative headquarters with top AI talent

We're building safe AGI systems with human values at their core. Your work will shape the future of intelligence.

How to Apply

Ready to accelerate humanity's AI future? Submit your application including:

Resume/CV highlighting ML and systems experience
GitHub/portfolio with relevant projects
Brief note on your favorite performance optimization you've implemented

Application Process:

Online application review (1-2 weeks)
Technical phone screen
Systems/ML coding assessment
Team interviews with Training Runtime engineers
Research collaboration exercise
Final interviews with leadership

OpenAI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Apply Now - ML Framework Engineer

Locations

San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

262,500 - 495,000 USD per year

Source: AI estimate

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

Machine Learning Frameworks (intermediate)
Distributed Systems (intermediate)
Python Programming (intermediate)
Performance Optimization (intermediate)
GPU Programming (intermediate)
TensorFlow (intermediate)
PyTorch (intermediate)
Deep Learning (intermediate)
Profiling Tools (intermediate)
Supercomputer Architecture (intermediate)
Fault-Tolerant Systems (intermediate)
Checkpointing (intermediate)
Data Movement Optimization (intermediate)
AI Model Training (intermediate)
Software Engineering (intermediate)
Debugging ML Code (intermediate)
Asynchronous Programming (intermediate)
Zero-Copy Data Transfer (intermediate)
Deterministic Orchestration (intermediate)
Observability Systems (intermediate)

Required Qualifications

Experience running small-scale ML experiments
Strong passion for performance optimization
Deep understanding of distributed systems
Proficiency in Python programming
Excellent software engineering skills
Ability to write bug-free machine learning code
Knowledge of supercomputer performance characteristics
Experience with GPU-accelerated training
Familiarity with high-performance computing
Strong problem-solving skills for system bottlenecks
Experience profiling and optimizing ML frameworks
Comfortable working with large-scale distributed training
Collaborative mindset for working with researchers

Responsibilities

Apply latest ML training techniques to internal framework
Profile and optimize training framework performance
Collaborate with researchers on next-generation models
Design high-performance data movement systems
Implement fault-tolerant training loops
Develop resilient checkpointing mechanisms
Optimize tensor and optimizer-state data transfers
Build deterministic orchestration systems
Enhance observability for distributed training jobs
Manage distributed processes for long-running jobs
Integrate large-scale capabilities into developer runtime
Achieve impressive hardware efficiency in training runs
Debug and maintain bug-free ML training code
Partner with model-stack and platform teams

Benefits

Competitive salary with equity package
Hybrid work model (3 days in office)
Comprehensive relocation assistance
Medical, dental, and vision insurance
401(k) matching program
Unlimited PTO policy
Mental health and wellness benefits
Professional development stipend
Parental leave benefits
Gym membership reimbursement
Catered meals and snacks daily
Cutting-edge hardware access
Work with world-class researchers
Impact frontier AI development
Collaborative, innovative culture


Tags & Categories

ML Framework Engineer OpenAI, Machine Learning Engineer San Francisco, Distributed Training Engineer, AI Training Framework Jobs, GPU Optimization Engineer, OpenAI Careers San Francisco, Performance Engineer Machine Learning, Supercomputer ML Training, Fault Tolerant Training Systems, Python ML Engineer OpenAI, Distributed Systems AI Jobs, Training Runtime Engineer, AI Research Engineering Jobs, Frontier Model Training Careers, High Performance Computing ML, OpenAI ML Infrastructure Jobs, Tensor Data Movement Optimization, Resilient Checkpointing Engineer, Deterministic Training Orchestration, Researcher Enablement Engineer, Hybrid ML Engineer San Francisco, OpenAI Relocation Jobs, Scaling

