ML Framework Engineer Careers at OpenAI - San Francisco, CA | Apply Now!

OpenAI

Full-time | Posted: Feb 10, 2026

Job Description

ML Framework Engineer at OpenAI - San Francisco, CA

Join OpenAI's Training Runtime team as an ML Framework Engineer and become a key architect of the world's most advanced distributed machine learning training systems. Based in San Francisco, this role offers the rare opportunity to directly impact frontier-scale AI model training while enabling researchers to push the boundaries of artificial intelligence.

The Training Runtime team designs the core infrastructure powering everything from early research experiments to massive, multi-billion-parameter model training runs. With a dual mandate to accelerate both researcher productivity and training throughput, we're building a unified, modular runtime that scales seamlessly from laptop experiments to supercomputer deployments.

As an ML Framework Engineer, you'll work on three critical pillars: high-performance data movement (asynchronous, zero-copy tensor transfers), fault-tolerant training frameworks (resilient checkpointing, state management), and distributed process orchestration. Your optimizations will directly translate to faster model training and more rapid AI breakthroughs.

Key Responsibilities

  • Performance Optimization: Profile and optimize our internal training framework to achieve state-of-the-art hardware efficiency across massive GPU clusters
  • Research Enablement: Partner directly with OpenAI researchers to implement cutting-edge training techniques and enable next-generation model development
  • Distributed Systems: Design and implement high-performance, asynchronous data movement systems with zero-copy tensor and optimizer-state transfers
  • Fault Tolerance: Build resilient training loops, state management systems, and checkpointing mechanisms that maintain uptime during long-running training jobs
  • Orchestration: Develop deterministic process orchestration for distributed training jobs spanning thousands of GPUs
  • Observability: Implement comprehensive monitoring and observability for training runs at any scale
  • Integration: Create composable interfaces that integrate proven large-scale capabilities with researcher-friendly APIs
  • Debugging: Write production-quality, bug-free machine learning code that powers mission-critical training infrastructure
  • Supercomputing: Deeply understand and optimize for supercomputer architectures and network topologies
  • Collaboration: Work cross-functionally with model-stack, research, and platform engineering teams
  • Innovation: Continuously identify and implement performance improvements while minimizing system complexity
  • Scale: Ensure training framework reliability from single-GPU experiments to frontier-scale deployments
  • Impact: Deliver measurable improvements in both training throughput (TFLOPS utilization) and researcher throughput (experiment velocity)

Qualifications & Requirements

This role demands exceptional engineering talent with deep systems knowledge and ML expertise. You might thrive if you:

  • Have hands-on experience running ML experiments, even at small scale
  • Obsess over performance optimization and system efficiency
  • Deeply understand distributed systems and their failure modes
  • Write clean, bug-free Python code under pressure
  • Love reverse-engineering complex systems to make them faster
  • Have worked with GPU-accelerated training frameworks (PyTorch/TensorFlow/JAX)
  • Understand supercomputer networking and interconnect topologies
  • Excel at using profiling tools (NVIDIA Nsight, PyTorch Profiler, etc.)
  • Can balance performance gains with maintainable, simple designs
  • Collaborate effectively with researchers who move fast
  • Thrive in a hybrid work environment (3 days/week in SF office)

Bonus: Experience with fault-tolerant distributed systems, RDMA networking, or large-scale ML training infrastructure.

Salary & Benefits

Competitive Compensation: Total compensation for ML Framework Engineers at OpenAI typically ranges from $250,000 - $450,000+ annually, including base salary, equity, and performance bonuses. Exact compensation depends on experience and location.

Comprehensive Benefits Package:

  • Hybrid work model: 3 days/week in San Francisco office
  • Full relocation assistance for new hires
  • Premium medical, dental, vision coverage
  • 401(k) with generous company match
  • Unlimited PTO with encouragement to disconnect
  • Parental leave and family planning benefits
  • Mental health support and wellness stipend
  • Professional development budget
  • Daily catered meals and fully stocked kitchens
  • Gym membership reimbursement
  • Latest hardware and GPU cluster access
  • Equity in OpenAI - share in our mission's success

Why Join OpenAI?

OpenAI is at the forefront of artificial intelligence research and deployment. Our mission is to ensure AGI benefits all of humanity. By joining our Training Runtime team, you'll:

  • Work on Frontier Problems: Optimize training systems that power GPT models and beyond
  • Maximum Impact: Your code will run on thousands of GPUs training humanity's most important AI systems
  • Research Partnership: Collaborate directly with world-class AI researchers
  • Cutting-Edge Tech: Access the latest supercomputing hardware and ML frameworks
  • Mission-Driven Culture: Work with talented teammates united by our mission
  • San Francisco Hub: Join our collaborative headquarters with top AI talent

We're building safe AGI systems with human values at their core. Your work will shape the future of intelligence.

How to Apply

Ready to accelerate humanity's AI future? Submit your application including:

  • Resume/CV highlighting ML and systems experience
  • GitHub/portfolio with relevant projects
  • Brief note on your favorite performance optimization you've implemented

Application Process:

  1. Online application review (1-2 weeks)
  2. Technical phone screen
  3. Systems/ML coding assessment
  4. Team interviews with Training Runtime engineers
  5. Research collaboration exercise
  6. Final interviews with leadership

OpenAI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

262,500 - 495,000 USD per year

Source: AI estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Machine Learning Frameworks (intermediate)
  • Distributed Systems (intermediate)
  • Python Programming (intermediate)
  • Performance Optimization (intermediate)
  • GPU Programming (intermediate)
  • TensorFlow (intermediate)
  • PyTorch (intermediate)
  • Deep Learning (intermediate)
  • Profiling Tools (intermediate)
  • Supercomputer Architecture (intermediate)
  • Fault-Tolerant Systems (intermediate)
  • Checkpointing (intermediate)
  • Data Movement Optimization (intermediate)
  • AI Model Training (intermediate)
  • Software Engineering (intermediate)
  • Debugging ML Code (intermediate)
  • Asynchronous Programming (intermediate)
  • Zero-Copy Data Transfer (intermediate)
  • Deterministic Orchestration (intermediate)
  • Observability Systems (intermediate)

Required Qualifications

  • Experience running small-scale ML experiments
  • Strong passion for performance optimization
  • Deep understanding of distributed systems
  • Proficiency in Python programming
  • Excellent software engineering skills
  • Ability to write bug-free machine learning code
  • Knowledge of supercomputer performance characteristics
  • Experience with GPU-accelerated training
  • Familiarity with high-performance computing
  • Strong problem-solving skills for system bottlenecks
  • Experience profiling and optimizing ML frameworks
  • Comfortable working with large-scale distributed training
  • Collaborative mindset for working with researchers

Responsibilities

  • Apply latest ML training techniques to internal framework
  • Profile and optimize training framework performance
  • Collaborate with researchers on next-generation models
  • Design high-performance data movement systems
  • Implement fault-tolerant training loops
  • Develop resilient checkpointing mechanisms
  • Optimize tensor and optimizer-state data transfers
  • Build deterministic orchestration systems
  • Enhance observability for distributed training jobs
  • Manage distributed processes for long-running jobs
  • Integrate large-scale capabilities into developer runtime
  • Achieve impressive hardware efficiency in training runs
  • Debug and maintain bug-free ML training code
  • Partner with model-stack and platform teams

Benefits

  • Competitive salary with equity package
  • Hybrid work model (3 days in office)
  • Comprehensive relocation assistance
  • Medical, dental, and vision insurance
  • 401(k) matching program
  • Unlimited PTO policy
  • Mental health and wellness benefits
  • Professional development stipend
  • Parental leave benefits
  • Gym membership reimbursement
  • Catered meals and snacks daily
  • Cutting-edge hardware access
  • Work with world-class researchers
  • Impact frontier AI development
  • Collaborative, innovative culture

Tags & Categories

  • ML Framework Engineer OpenAI
  • Machine Learning Engineer San Francisco
  • Distributed Training Engineer
  • AI Training Framework Jobs
  • GPU Optimization Engineer
  • OpenAI Careers San Francisco
  • Performance Engineer Machine Learning
  • Supercomputer ML Training
  • Fault Tolerant Training Systems
  • Python ML Engineer OpenAI
  • Distributed Systems AI Jobs
  • Training Runtime Engineer
  • AI Research Engineering Jobs
  • Frontier Model Training Careers
  • High Performance Computing ML
  • OpenAI ML Infrastructure Jobs
  • Tensor Data Movement Optimization
  • Resilient Checkpointing Engineer
  • Deterministic Training Orchestration
  • Researcher Enablement Engineer
  • Hybrid ML Engineer San Francisco
  • OpenAI Relocation Jobs
  • Scaling
