Software Engineer, Model Inference at OpenAI - San Francisco, California

OpenAI

Full-time | Posted: Feb 10, 2026

Job Description

Software Engineer, Model Inference at OpenAI - San Francisco, CA

Join OpenAI's elite Inference team and become a key architect behind the world's most powerful AI models. We're seeking exceptional Software Engineers to optimize massive-scale model inference systems that power ChatGPT, DALL-E, and groundbreaking research. This senior-level role offers unparalleled impact on AI deployment at global scale.

Role Overview

OpenAI's Inference team is at the forefront of AI deployment, bridging cutting-edge research with production reality. You'll work with the largest language models on Earth, optimizing them for high-volume, low-latency, high-availability production environments. Every optimization you create directly impacts millions of users and accelerates humanity's AI progress.

Based in San Francisco, this role demands deep expertise in ML systems, GPU optimization, and distributed computing. You'll collaborate with top ML researchers, systems engineers, and product managers to push the boundaries of what's possible with production AI inference.

Key challenges include maximizing every FLOP of GPU compute, minimizing inference latency to sub-second levels, and scaling systems to handle billions of tokens daily while maintaining 99.99% uptime.

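For a rough sense of what "maximizing every FLOP" means in practice, model FLOPs utilization (MFU) compares achieved decode throughput against a GPU's peak compute. A minimal back-of-envelope sketch in Python follows; every number in it is an illustrative assumption, not an OpenAI figure:

    # Back-of-envelope model FLOPs utilization (MFU) for decoding.
    # Every number here is an illustrative assumption.
    params = 70e9          # model parameters (hypothetical)
    tokens_per_sec = 2500  # aggregate decode throughput (hypothetical)
    peak_flops = 989e12    # H100 SXM peak dense BF16 FLOP/s

    # Decoding one token costs roughly 2 * params FLOPs (forward pass only).
    achieved = 2 * params * tokens_per_sec
    mfu = achieved / peak_flops
    print(f"MFU = {mfu:.1%}")  # MFU = 35.4%

At this hypothetical operating point roughly two-thirds of peak compute sits idle, which is exactly the gap this kind of role exists to close.
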
Key Responsibilities

  1. Partner directly with OpenAI researchers to productionize frontier AI models within weeks of research breakthroughs
  2. Design novel inference optimization techniques that reduce latency by 50%+ and increase throughput 10x
  3. Build comprehensive observability platforms revealing every bottleneck in our trillion-parameter model stacks
  4. Architect distributed inference systems spanning thousands of NVIDIA H100 GPUs across Azure data centers
  5. Deep-dive debug production incidents affecting millions of users, implementing fixes under extreme time pressure
  6. Optimize PyTorch models using advanced quantization, pruning, and kernel fusion techniques (see the quantization sketch after this list)
  7. Tune NCCL, NVLink, and InfiniBand communication for maximum distributed training/inference bandwidth
  8. Develop auto-scaling infrastructure that dynamically provisions GPU resources based on global demand
  9. Create research tooling that accelerates model development cycles by 3x through better inference profiling
  10. Maintain mission-critical 24/7 inference services powering OpenAI's consumer and enterprise products
  11. Lead cross-team efforts to refactor aging systems for 100x scale growth
  12. Mentor junior engineers while shipping production code weekly
  13. Contribute to open-source inference tooling benefiting the global ML community

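To make item 6 concrete, here is a minimal sketch of post-training dynamic quantization in PyTorch. The feed-forward block is a toy stand-in, not OpenAI's stack; production serving would more likely use weight-only int8 or fp8 schemes with calibrated kernels:

    # Minimal sketch: post-training dynamic quantization in PyTorch.
    # The toy feed-forward block is illustrative, not a real model.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(4096, 16384),
        nn.GELU(),
        nn.Linear(16384, 4096),
    ).eval()

    # Convert Linear weights to int8; activations are quantized on the fly.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    with torch.inference_mode():
        y = quantized(torch.randn(1, 4096))
    print(y.shape)  # torch.Size([1, 4096])
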
Qualifications

Must-Have Technical Expertise:

  • 5+ years production software engineering with 2+ years ML systems experience
  • Deep PyTorch expertise including custom C++/CUDA kernels
  • Hands-on NVIDIA GPU optimization (A100/H100) with TensorRT, Triton
  • Distributed systems experience at massive scale (10k+ GPU clusters)
  • HPC networking mastery (InfiniBand 400Gbps+, NVLink 900GB/s); see the NCCL sketch at the end of this section

Proven Track Record:

  • Rebuilt production systems multiple times due to 100x+ scale growth
  • Performance engineering delivering 5x+ speedups in production
  • Owned complete ML inference stacks from model optimization to serving

Success Attributes:

  • Thrives in extreme ambiguity with rapidly evolving requirements
  • End-to-end ownership mentality - follows problems from root cause to monitoring
  • Humble team player who elevates colleagues while shipping aggressively

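The NCCL sketch referenced in the networking bullet above: an all-reduce over torch.distributed's NCCL backend, the collective primitive that distributed inference and training traffic is built on. The script name and GPU count are assumptions for illustration:

    # Minimal sketch: NCCL-backed all-reduce via torch.distributed.
    # Launch with: torchrun --nproc_per_node=2 allreduce_demo.py
    # (script name and GPU count are illustrative)
    import os
    import torch
    import torch.distributed as dist

    def main():
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ.get("LOCAL_RANK", "0"))  # set by torchrun
        torch.cuda.set_device(local_rank)
        rank = dist.get_rank()
        # Each rank contributes its rank id; every rank receives the sum.
        t = torch.full((4,), float(rank), device="cuda")
        dist.all_reduce(t, op=dist.ReduceOp.SUM)
        print(f"rank {rank}: {t.tolist()}")
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Tuning at this layer mostly means inspecting topology and settings (for example via the NCCL_DEBUG environment variable) rather than changing the call itself.
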
Salary & Benefits

Competitive Compensation: Total compensation $280K-$420K+ (base, equity, bonus). Exact level based on experience.

Exceptional Benefits Package:

  • Top-tier medical/dental/vision with $0 premiums
  • Unlimited PTO + 18 weeks parental leave
  • Generous 401k match + immediate vesting
  • Significant equity in the leading AI company
  • Weekly catered meals + fully stocked kitchens
  • Learning stipend + conference budget
  • Onsite gym + mental health support
  • Relocation package for Bay Area move

Why Join OpenAI?

OpenAI isn't just another tech company - we're building artificial general intelligence to benefit all humanity. Your optimizations will power breakthroughs that reshape every industry.

Unmatched Impact: Every line of code affects hundreds of millions of users weekly. No other engineering role offers this scale of real-world impact.

Technical Excellence: Work with the absolute latest - H100 GPU clusters, trillion-parameter models, and bleeding-edge research papers implemented on day zero.

Career Acceleration: Learn from the world's best ML researchers and systems engineers. Your work becomes industry standard.

Culture of Ownership: Radical ownership with minimal bureaucracy. You own mission-critical systems end-to-end.

How to Apply

Ready to optimize the world's most important AI models? Submit your resume and a brief note explaining your most impressive inference optimization achievement. We're looking for engineers who've shipped systems at our scale.

Application Process:

  1. 30-minute recruiter screen
  2. Technical deep-dive with engineering manager
  3. Live coding + systems design
  4. Researcher pairing session
  5. Team interviews + leadership chat

OpenAI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

294,000 - 462,000 USD per year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • PyTorch (intermediate)
  • CUDA (intermediate)
  • NCCL (intermediate)
  • NVLink (intermediate)
  • InfiniBand (intermediate)
  • MPI (intermediate)
  • GPU Optimization (intermediate)
  • Model Inference (intermediate)
  • Distributed Systems (intermediate)
  • High-Performance Computing (HPC) (intermediate)
  • Azure VMs (intermediate)
  • Machine Learning Engineering (intermediate)
  • Low-Latency Systems (intermediate)
  • Performance Profiling (intermediate)
  • Production Debugging (intermediate)
  • AI Model Deployment (intermediate)
  • TensorRT (intermediate)
  • Deep Learning Optimization (intermediate)
  • Scalable Inference (intermediate)
  • FLOP Optimization (intermediate)

Required Qualifications

  • 5+ years of professional software engineering experience
  • Deep understanding of modern ML architectures and inference optimization
  • Familiarity with PyTorch, NVIDIA GPUs, CUDA, NCCL
  • Experience with HPC technologies including InfiniBand, MPI, NVLink
  • Proven track record architecting production distributed systems
  • Hands-on experience with performance-critical distributed systems
  • History of rebuilding/refactoring systems at massive scale
  • Ability to own problems end-to-end and rapidly learn new technologies
  • Strong debugging skills for production environments
  • Self-directed with excellent problem prioritization
  • Collaborative mindset with humble, team-oriented attitude
  • Experience optimizing GPU utilization (FLOP and memory efficiency)

Responsibilities

  • Collaborate with ML researchers to productionize cutting-edge AI models
  • Partner with researchers to enable advanced AI research through engineering
  • Develop new techniques to improve model inference performance and latency
  • Build observability tools to identify inference bottlenecks and instability (see the profiling sketch after this list)
  • Design and implement solutions for highest-priority performance issues
  • Optimize PyTorch code and Azure VM fleets for maximum GPU utilization
  • Architect scalable inference systems handling high-volume traffic
  • Implement low-latency inference pipelines for real-time applications
  • Maintain high-availability production environments for AI services
  • Profile and tune NCCL, CUDA, and other GPU communication stacks
  • Scale inference infrastructure to support rapidly growing model demands
  • Debug complex distributed system failures under production load
  • Contribute to research acceleration through optimized inference tooling
  • Monitor and improve throughput, efficiency, and cost of inference operations

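As referenced in the observability bullet above, here is a minimal sketch of bottleneck hunting with torch.profiler; the single linear layer is a toy stand-in for a real serving graph:

    # Minimal sketch: profiling one inference step with torch.profiler.
    # The model and input are toy stand-ins, not a production graph.
    import torch
    from torch.profiler import profile, ProfilerActivity

    model = torch.nn.Linear(4096, 4096).cuda().eval()
    x = torch.randn(8, 4096, device="cuda")

    with torch.inference_mode():
        with profile(
            activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
            record_shapes=True,
        ) as prof:
            model(x)
            torch.cuda.synchronize()

    # Top ops by GPU time; fusion and quantization targets tend to
    # show up at the head of this table.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=5))
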
Benefits

  • Comprehensive medical, dental, and vision insurance
  • 401(k) plan with generous company matching
  • Unlimited PTO with encouragement to recharge
  • Mental health support through dedicated programs
  • Fertility assistance and family planning benefits
  • Paid parental leave for primary and secondary caregivers
  • Stock options in a rapidly growing AI leader
  • Annual learning stipend for professional development
  • Weekly team lunches and catered meals
  • Onsite gym membership and wellness programs
  • Relocation assistance for SF Bay Area move
  • Cutting-edge hardware access (latest NVIDIA GPUs)
  • Flexible work hours with focus on results
  • Direct impact on world-changing AI products

Tags & Categories

OpenAI software engineer jobs, model inference engineer OpenAI, AI inference optimization careers, PyTorch GPU optimization jobs, distributed ML systems engineer, NVIDIA CUDA engineer San Francisco, HPC InfiniBand jobs OpenAI, production AI model deployment, low latency inference engineer, trillion parameter model optimization, OpenAI inference team careers, senior ML systems engineer, Azure GPU fleet optimization, NVLink NVSwitch engineer jobs, performance critical distributed systems, San Francisco AI engineering jobs, OpenAI PyTorch optimization roles, production ML inference architect, H100 GPU optimization careers, scalable AI inference systems, OpenAI research engineering jobs, machine learning production engineer, Scaling
