Software Engineer, Model Inference at OpenAI - San Francisco, California

OpenAI

Full-time | Posted: Feb 10, 2026

Job Description

Software Engineer, Model Inference at OpenAI - San Francisco, CA

Join OpenAI's elite Inference team and become a key architect behind the world's most powerful AI models. We're seeking exceptional Software Engineers to optimize massive-scale model inference systems that power ChatGPT, DALL-E, and groundbreaking research. This senior-level role offers unparalleled impact on AI deployment at global scale.

Role Overview

OpenAI's Inference team is at the forefront of AI deployment, bridging cutting-edge research with production reality. You'll work with the largest language models on Earth, optimizing them for high-volume, low-latency, high-availability production environments. Every optimization you create directly impacts millions of users and accelerates humanity's AI progress.

Based in San Francisco, this role demands deep expertise in ML systems, GPU optimization, and distributed computing. You'll collaborate with top ML researchers, systems engineers, and product managers to push the boundaries of what's possible with production AI inference.

Key challenges include maximizing every FLOP of GPU compute, minimizing inference latency to sub-second levels, and scaling systems to handle billions of tokens daily while maintaining 99.99% uptime.

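For a rough sense of what "maximizing every FLOP" means in practice, model FLOPs utilization (MFU) compares achieved decode throughput against a GPU's peak compute. A minimal back-of-envelope sketch in Python follows; every number in it is an illustrative assumption, not an OpenAI figure:

    # Back-of-envelope model FLOPs utilization (MFU) for decoding.
    # Every number here is an illustrative assumption.
    params = 70e9          # model parameters (hypothetical)
    tokens_per_sec = 2500  # aggregate decode throughput (hypothetical)
    peak_flops = 989e12    # H100 SXM peak dense BF16 FLOP/s

    # Decoding one token costs roughly 2 * params FLOPs (forward pass only).
    achieved = 2 * params * tokens_per_sec
    mfu = achieved / peak_flops
    print(f"MFU = {mfu:.1%}")  # MFU = 35.4%

At this hypothetical operating point roughly two-thirds of peak compute sits idle, which is exactly the gap this kind of role exists to close.
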
Key Responsibilities

  1. Partner directly with OpenAI researchers to productionize frontier AI models within weeks of research breakthroughs
  2. Design novel inference optimization techniques that reduce latency by 50%+ and increase throughput 10x
  3. Build comprehensive observability platforms revealing every bottleneck in our trillion-parameter model stacks
  4. Architect distributed inference systems spanning thousands of NVIDIA H100 GPUs across Azure data centers
  5. Deep-dive debug production incidents affecting millions of users, implementing fixes under extreme time pressure
  6. Optimize PyTorch models using advanced quantization, pruning, and kernel fusion techniques (see the quantization sketch after this list)
  7. Tune NCCL, NVLink, and InfiniBand communication for maximum distributed training/inference bandwidth
  8. Develop auto-scaling infrastructure that dynamically provisions GPU resources based on global demand
  9. Create research tooling that accelerates model development cycles by 3x through better inference profiling
  10. Maintain mission-critical 24/7 inference services powering OpenAI's consumer and enterprise products
  11. Lead cross-team efforts to refactor aging systems for 100x scale growth
  12. Mentor junior engineers while shipping production code weekly
  13. Contribute to open-source inference tooling benefiting the global ML community

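To make item 6 concrete, here is a minimal sketch of post-training dynamic quantization in PyTorch. The feed-forward block is a toy stand-in, not OpenAI's stack; production serving would more likely use weight-only int8 or fp8 schemes with calibrated kernels:

    # Minimal sketch: post-training dynamic quantization in PyTorch.
    # The toy feed-forward block is illustrative, not a real model.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(4096, 16384),
        nn.GELU(),
        nn.Linear(16384, 4096),
    ).eval()

    # Convert Linear weights to int8; activations are quantized on the fly.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    with torch.inference_mode():
        y = quantized(torch.randn(1, 4096))
    print(y.shape)  # torch.Size([1, 4096])
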
Qualifications

Must-Have Technical Expertise:

  • 5+ years production software engineering with 2+ years ML systems experience
  • Deep PyTorch expertise including custom C++/CUDA kernels
  • Hands-on NVIDIA GPU optimization (A100/H100) with TensorRT, Triton
  • Distributed systems experience at massive scale (10k+ GPU clusters)
  • HPC networking mastery (InfiniBand 400Gbps+, NVLink 900GB/s); see the NCCL sketch at the end of this section

Proven Track Record:

  • Rebuilt production systems multiple times due to 100x+ scale growth
  • Performance engineering delivering 5x+ speedups in production
  • Owned complete ML inference stacks from model optimization to serving

Success Attributes:

  • Thrives in extreme ambiguity with rapidly evolving requirements
  • End-to-end ownership mentality - follows problems from root cause to monitoring
  • Humble team player who elevates colleagues while shipping aggressively

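The NCCL sketch referenced in the networking bullet above: an all-reduce over torch.distributed's NCCL backend, the collective primitive that distributed inference and training traffic is built on. The script name and GPU count are assumptions for illustration:

    # Minimal sketch: NCCL-backed all-reduce via torch.distributed.
    # Launch with: torchrun --nproc_per_node=2 allreduce_demo.py
    # (script name and GPU count are illustrative)
    import os
    import torch
    import torch.distributed as dist

    def main():
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ.get("LOCAL_RANK", "0"))  # set by torchrun
        torch.cuda.set_device(local_rank)
        rank = dist.get_rank()
        # Each rank contributes its rank id; every rank receives the sum.
        t = torch.full((4,), float(rank), device="cuda")
        dist.all_reduce(t, op=dist.ReduceOp.SUM)
        print(f"rank {rank}: {t.tolist()}")
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Tuning at this layer mostly means inspecting topology and settings (for example via the NCCL_DEBUG environment variable) rather than changing the call itself.
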
Salary & Benefits

Competitive Compensation: Total compensation $280K-$420K+ (base, equity, bonus). Exact level based on experience.

Exceptional Benefits Package:

  • Top-tier medical/dental/vision with $0 premiums
  • Unlimited PTO + 18 weeks parental leave
  • Generous 401k match + immediate vesting
  • Significant equity in the leading AI company
  • Weekly catered meals + fully stocked kitchens
  • Learning stipend + conference budget
  • Onsite gym + mental health support
  • Relocation package for Bay Area move

Why Join OpenAI?

OpenAI isn't just another tech company - we're building artificial general intelligence to benefit all humanity. Your optimizations will power breakthroughs that reshape every industry.

Unmatched Impact: Every line of code affects hundreds of millions of users weekly. No other engineering role offers this scale of real-world impact.

Technical Excellence: Work with the absolute latest - H100 GPU clusters, trillion-parameter models, and bleeding-edge research papers implemented on day zero.

Career Acceleration: Learn from the world's best ML researchers and systems engineers. Your work becomes industry standard.

Culture of Ownership: Radical ownership with minimal bureaucracy. You own mission-critical systems end-to-end.

How to Apply

Ready to optimize the world's most important AI models? Submit your resume and a brief note explaining your most impressive inference optimization achievement. We're looking for engineers who've shipped systems at our scale.

Application Process:

  1. 30-minute recruiter screen
  2. Technical deep-dive with engineering manager
  3. Live coding + systems design
  4. Researcher pairing session
  5. Team interviews + leadership chat

OpenAI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

294,000 - 462,000 USD per year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • PyTorch (intermediate)
  • CUDA (intermediate)
  • NCCL (intermediate)
  • NVLink (intermediate)
  • InfiniBand (intermediate)
  • MPI (intermediate)
  • GPU Optimization (intermediate)
  • Model Inference (intermediate)
  • Distributed Systems (intermediate)
  • High-Performance Computing (HPC) (intermediate)
  • Azure VMs (intermediate)
  • Machine Learning Engineering (intermediate)
  • Low-Latency Systems (intermediate)
  • Performance Profiling (intermediate)
  • Production Debugging (intermediate)
  • AI Model Deployment (intermediate)
  • TensorRT (intermediate)
  • Deep Learning Optimization (intermediate)
  • Scalable Inference (intermediate)
  • FLOP Optimization (intermediate)

Required Qualifications

  • 5+ years of professional software engineering experience
  • Deep understanding of modern ML architectures and inference optimization
  • Familiarity with PyTorch, NVIDIA GPUs, CUDA, NCCL
  • Experience with HPC technologies including InfiniBand, MPI, NVLink
  • Proven track record architecting production distributed systems
  • Hands-on experience with performance-critical distributed systems
  • History of rebuilding/refactoring systems at massive scale
  • Ability to own problems end-to-end and rapidly learn new technologies
  • Strong debugging skills for production environments
  • Self-directed with excellent problem prioritization
  • Collaborative mindset with humble, team-oriented attitude
  • Experience optimizing GPU utilization (FLOP and memory efficiency)

Responsibilities

  • Collaborate with ML researchers to productionize cutting-edge AI models
  • Partner with researchers to enable advanced AI research through engineering
  • Develop new techniques to improve model inference performance and latency
  • Build observability tools to identify inference bottlenecks and instability (see the profiling sketch after this list)
  • Design and implement solutions for highest-priority performance issues
  • Optimize PyTorch code and Azure VM fleets for maximum GPU utilization
  • Architect scalable inference systems handling high-volume traffic
  • Implement low-latency inference pipelines for real-time applications
  • Maintain high-availability production environments for AI services
  • Profile and tune NCCL, CUDA, and other GPU communication stacks
  • Scale inference infrastructure to support rapidly growing model demands
  • Debug complex distributed system failures under production load
  • Contribute to research acceleration through optimized inference tooling
  • Monitor and improve throughput, efficiency, and cost of inference operations

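As referenced in the observability bullet above, here is a minimal sketch of bottleneck hunting with torch.profiler; the single linear layer is a toy stand-in for a real serving graph:

    # Minimal sketch: profiling one inference step with torch.profiler.
    # The model and input are toy stand-ins, not a production graph.
    import torch
    from torch.profiler import profile, ProfilerActivity

    model = torch.nn.Linear(4096, 4096).cuda().eval()
    x = torch.randn(8, 4096, device="cuda")

    with torch.inference_mode():
        with profile(
            activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
            record_shapes=True,
        ) as prof:
            model(x)
            torch.cuda.synchronize()

    # Top ops by GPU time; fusion and quantization targets tend to
    # show up at the head of this table.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=5))
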
Benefits

  • Comprehensive medical, dental, and vision insurance
  • 401(k) plan with generous company matching
  • Unlimited PTO with encouragement to recharge
  • Mental health support through dedicated programs
  • Fertility assistance and family planning benefits
  • Paid parental leave for primary and secondary caregivers
  • Stock options in a rapidly growing AI leader
  • Annual learning stipend for professional development
  • Weekly team lunches and catered meals
  • Onsite gym membership and wellness programs
  • Relocation assistance for SF Bay Area move
  • Cutting-edge hardware access (latest NVIDIA GPUs)
  • Flexible work hours with focus on results
  • Direct impact on world-changing AI products

Tags & Categories

OpenAI software engineer jobs, model inference engineer OpenAI, AI inference optimization careers, PyTorch GPU optimization jobs, distributed ML systems engineer, NVIDIA CUDA engineer San Francisco, HPC InfiniBand jobs OpenAI, production AI model deployment, low latency inference engineer, trillion parameter model optimization, OpenAI inference team careers, senior ML systems engineer, Azure GPU fleet optimization, NVLink NVSwitch engineer jobs, performance critical distributed systems, San Francisco AI engineering jobs, OpenAI PyTorch optimization roles, production ML inference architect, H100 GPU optimization careers, scalable AI inference systems, OpenAI research engineering jobs, machine learning production engineer, Scaling
