
Distributed Training Engineer, Sora at OpenAI - San Francisco, California

OpenAI

Full-time · Posted: Feb 10, 2026

Job Description

Distributed Training Engineer, Sora at OpenAI - San Francisco, CA

Role Overview

Join OpenAI's groundbreaking Sora team as a Distributed Training Engineer in San Francisco, California. The Sora team is at the forefront of revolutionizing video generation through advanced foundation models. As a hybrid research and product team, we push the boundaries of AI capabilities while prioritizing reliability, safety, and real-world deployment. In this critical role, you'll optimize training throughput for our internal frameworks, enabling researchers to experiment boldly with video models that could transform entertainment, education, and industries beyond.

This position demands a rare blend of systems engineering prowess, ML expertise, and an unrelenting passion for performance optimization. You'll dive deep into supercomputer architectures, craft bug-free ML code, and collaborate directly with world-class researchers. Based in San Francisco with a hybrid model (3 days in-office weekly), OpenAI offers relocation support to bring top talent to our innovative hub. If you thrive on eliminating inefficiencies and scaling AI to unprecedented levels, this is your opportunity to shape the future of generative video AI.

OpenAI's mission to ensure AGI benefits humanity starts with teams like Sora, where engineering excellence meets cutting-edge research. Contribute to models that generate coherent, high-fidelity videos from text prompts, powering applications from creative tools to scientific simulations.

Key Responsibilities

  • Partner with researchers to architect systems-efficient video models, focusing on distributed training scalability.
  • Implement state-of-the-art techniques in OpenAI's internal training framework to maximize hardware utilization.
  • Conduct in-depth profiling of training pipelines, identifying and resolving throughput bottlenecks (an illustrative profiling sketch follows this list).
  • Optimize training kernels for GPU/TPU clusters, achieving impressive efficiency gains on supercomputers.
  • Develop robust Python codebases that enable seamless experimentation with novel video architectures.
  • Analyze and stabilize training dynamics to ensure reliable convergence in multi-modal models.
  • Debug complex distributed systems issues in large-scale ML runs, maintaining zero-bug standards.
  • Collaborate on deploying video models safely into production environments for widespread impact.
  • Monitor supercomputer performance metrics and propose hardware-software co-optimizations.
  • Document optimizations and mentor junior engineers on best practices in ML systems.
  • Stay abreast of advancements in parallel computing, JAX/PyTorch ecosystems, and AI hardware.
  • Contribute to safety features that mitigate risks in generative video training pipelines.
  • Participate in cross-team initiatives to integrate Sora capabilities into OpenAI's broader product suite.
  • Drive continuous improvements in training framework maintainability and researcher productivity.
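
To give a concrete flavor of the profiling work called out above, the snippet below is a minimal sketch of how a single training step can be profiled with PyTorch's built-in torch.profiler. The model, optimizer, data, and schedule here are placeholder assumptions for illustration only; they are not OpenAI's internal framework.

    # Illustrative sketch only: placeholder model and data, not OpenAI's internal framework.
    import torch
    import torch.nn as nn
    from torch.profiler import profile, schedule, ProfilerActivity

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    activities = [ProfilerActivity.CPU]
    if device == "cuda":
        activities.append(ProfilerActivity.CUDA)

    # Skip one step, warm up for one, then record three steps.
    with profile(
        activities=activities,
        schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
        record_shapes=True,
    ) as prof:
        for _ in range(5):
            x = torch.randn(64, 1024, device=device)   # stand-in batch
            loss = model(x).pow(2).mean()               # dummy loss for illustration
            loss.backward()
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)
            prof.step()                                 # advance the profiler schedule

    # Show which ops dominate step time; bottleneck hunting typically starts from a table like this.
    print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))

Sorting the resulting table by self time is a common first pass at spotting which ops dominate a step; deeper GPU-level analysis would move to tools such as NVIDIA Nsight, mentioned in the qualifications below.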

Qualifications

To excel as a Distributed Training Engineer on the Sora team, you'll need:

  • Extensive experience with multi-modal ML pipelines, particularly video and text-to-video systems.
  • Deep systems knowledge for optimizing distributed training at supercomputer scale.
  • Expert-level Python proficiency, with a portfolio of production ML code.
  • Proven success in kernel optimization and performance tuning for AI workloads.
  • A passion for dissecting training dynamics and ensuring numerical stability.
  • Strong collaboration skills, with a track record of working with research teams to translate ideas into efficient implementations.
  • Familiarity with frameworks like PyTorch Distributed, JAX, or custom training stacks (a minimal distributed-training sketch follows this list).
  • Experience with GPU cluster management and profiling tools (e.g., NVIDIA Nsight, PyTorch Profiler).
  • Bachelor's/Master's/PhD in CS, EE, or equivalent, plus 3+ years in ML systems engineering.
  • Relentless focus on code quality, performance, and scalability.
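
As a point of reference for the PyTorch Distributed familiarity mentioned above, the sketch below shows a bare-bones data-parallel training loop using torch.distributed and DistributedDataParallel, launched with torchrun. The model, data, hyperparameters, and filename are placeholder assumptions for illustration; the actual Sora training stack is internal and not shown here.

    # Illustrative sketch only: a bare-bones data-parallel loop, not the internal Sora stack.
    # Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py  (hypothetical filename)
    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        use_cuda = torch.cuda.is_available()
        dist.init_process_group(backend="nccl" if use_cuda else "gloo")
        local_rank = int(os.environ.get("LOCAL_RANK", 0))
        device = torch.device(f"cuda:{local_rank}" if use_cuda else "cpu")
        if use_cuda:
            torch.cuda.set_device(device)

        # Placeholder model; DDP averages gradients across processes during backward.
        model = DDP(nn.Linear(1024, 1024).to(device),
                    device_ids=[local_rank] if use_cuda else None)
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

        for step in range(10):
            x = torch.randn(32, 1024, device=device)    # each rank draws its own stand-in batch
            loss = model(x).pow(2).mean()                # dummy loss for illustration
            loss.backward()
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)
            if dist.get_rank() == 0 and step % 5 == 0:
                print(f"step {step}: loss {loss.item():.4f}")

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Under data parallelism, each process runs the same loop on its own shard of data while gradients are synchronized via all-reduce during backward; the throughput work in this role goes well beyond this baseline, for example overlapping communication with compute and tuning kernels.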

Salary & Benefits

Compensation at OpenAI is competitive and includes base salary, equity, and comprehensive benefits. Estimated total compensation for this senior role ranges from $250,000 to $450,000 USD annually, depending on experience. Benefits include health coverage, 401(k) matching, unlimited PTO, parental leave, relocation assistance, and access to state-of-the-art compute resources.

  • Full medical, dental, vision insurance
  • Hybrid SF work model with catered meals
  • Professional development stipend
  • Generous equity package in OpenAI
  • Wellness and mental health support
  • Commuter and fitness reimbursements

Why Join OpenAI?

OpenAI is dedicated to safe AGI development for humanity's benefit. The Sora team exemplifies this by expanding video AI capabilities responsibly. Work with pioneers in generative models, access unparalleled compute, and impact billions. Our San Francisco office fosters collaboration in a vibrant AI ecosystem. We value diverse perspectives and offer equal opportunity employment.

How to Apply

Ready to optimize the future of video AI? Submit your resume, GitHub/portfolio, and a note on your favorite optimization project. We're excited to review applications from passionate engineers.

Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

262,500 - 495,000 USD / year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Distributed Systems Engineering (intermediate)
  • Machine Learning Optimization (intermediate)
  • Python Programming (intermediate)
  • Multi-Modal ML Pipelines (intermediate)
  • Training Kernel Optimization (intermediate)
  • Supercomputer Performance Tuning (intermediate)
  • AI Model Profiling (intermediate)
  • Video Generation Models (intermediate)
  • Hardware Efficiency Scaling (intermediate)
  • Stable Training Dynamics (intermediate)
  • PyTorch Proficiency (intermediate)
  • JAX Frameworks (intermediate)
  • GPU Cluster Management (intermediate)
  • Parallel Computing (intermediate)
  • Systems Performance Analysis (intermediate)
  • Bug-Free ML Code (intermediate)
  • Researcher Collaboration (intermediate)
  • Large-Scale Training Frameworks (intermediate)
  • Model Architecture Design (intermediate)
  • Performance Debugging (intermediate)

Required Qualifications

  • Proven experience working with multi-modal machine learning pipelines for video and AI models
  • Deep expertise in distributed systems implementations and performance optimization fundamentals
  • Strong software engineering skills with proficiency in Python and ML frameworks like PyTorch or JAX
  • Hands-on experience profiling and optimizing training kernels on GPU/TPU clusters
  • Passion for understanding and stabilizing training dynamics in large-scale AI models
  • Track record of designing and implementing state-of-the-art AI training frameworks
  • Ability to write production-quality, bug-free machine learning code under tight deadlines
  • Familiarity with supercomputer architectures and hardware efficiency scaling techniques
  • Experience collaborating with AI researchers to translate ideas into efficient systems
  • Knowledge of latest techniques in distributed training throughput improvement
  • Bachelor's or higher degree in Computer Science, Electrical Engineering, or related field
  • 3+ years of experience in ML systems engineering at scale

Responsibilities

  • Collaborate closely with Sora researchers to develop systems-efficient video models and novel architectures
  • Apply cutting-edge distributed training techniques to enhance hardware efficiency in internal frameworks
  • Profile and optimize training pipelines to achieve maximum throughput on supercomputer clusters
  • Design and implement improvements to OpenAI's internal training framework for video generation
  • Debug and resolve performance bottlenecks in large-scale ML training runs
  • Experiment with new distributed systems strategies to enable researcher innovation
  • Analyze supercomputer performance metrics and recommend hardware utilization optimizations
  • Write high-quality, maintainable Python code for ML systems with zero tolerance for bugs
  • Enable reliable deployment of video models into real-world applications
  • Monitor and stabilize training dynamics for multi-modal foundation models
  • Contribute to safety and reliability features in video generation training pipelines
  • Document optimizations and share knowledge across the Sora engineering team
  • Stay updated on latest advancements in AI training hardware and software stacks
  • Participate in code reviews to maintain high engineering standards

Benefits

  • Competitive salary with equity in a high-growth AI company
  • Comprehensive health, dental, and vision insurance coverage
  • Hybrid work model: 3 days in-office per week in San Francisco
  • Full relocation assistance for new employees moving to San Francisco
  • Generous paid time off and flexible vacation policy
  • 401(k) retirement plan with company matching
  • Parental leave and family planning benefits
  • Fitness reimbursement and wellness programs
  • Catered meals and fully stocked kitchens in office
  • Learning and development stipend for conferences and courses
  • Mental health support through professional counseling
  • Commuter benefits and public transit subsidies
  • Volunteer time off and charitable donation matching
  • Cutting-edge hardware access including latest GPUs and supercomputers

Tags & Categories

Distributed Training Engineer OpenAI, Sora team jobs San Francisco, ML systems engineer video AI, distributed systems AI careers, Python ML optimization jobs, supercomputer training engineer, video generation model engineer, OpenAI engineering jobs SF, AI training framework developer, kernel optimization ML roles, multi-modal ML pipeline jobs, PyTorch distributed training, JAX video model engineer, San Francisco AI jobs hybrid, OpenAI Sora careers apply, performance tuning AI supercomputers, stable training dynamics expert, research engineering OpenAI, GPU cluster optimization jobs, bug-free ML code engineer, foundation model training roles, Research
