OpenAI

Inference Technical Lead, Sora - San Francisco, California

Full-time | Posted: Feb 10, 2026

Job Description

Inference Technical Lead, Sora at OpenAI - San Francisco, CA

Join OpenAI's pioneering Sora team as an Inference Technical Lead and drive the future of multimodal AI inference. This senior-level role in San Francisco offers a unique opportunity to optimize GPU inference for cutting-edge video generation models. If you have deep expertise in kernel-level optimization and scalable AI systems, apply now to shape the next generation of AI products.

Role Overview

The Inference Technical Lead position on the Sora team at OpenAI represents a critical leadership role in advancing multimodal AI capabilities. Sora, OpenAI's groundbreaking text-to-video model, requires world-class inference infrastructure to deliver reliable, high-performance experiences to millions of users worldwide. As the technical lead for inference engineering, you'll spearhead initiatives that directly impact model serving efficiency, system scalability, and overall product reliability.

This hybrid position based in San Francisco combines deep systems engineering with AI research collaboration. You'll work alongside world-class researchers to design inference-friendly architectures while building production-grade serving infrastructure. The efficiency and reliability you deliver will free the broader organization to focus on higher-leverage work, making this role foundational to OpenAI's scaling objectives.

Key focus areas include GPU kernel optimization, data movement efficiency, low-level performance tuning, and distributed inference systems. Success in this role requires both breadth (understanding full-stack AI infrastructure) and depth (mastery of GPU programming and inference optimization techniques). You'll thrive if you excel at navigating technical ambiguity, driving complex projects to completion, and communicating effectively across research, engineering, and product teams.

OpenAI's mission-driven culture emphasizes safety, reliability, and broad societal benefit. Your work will directly contribute to safely deploying powerful multimodal capabilities while maintaining the highest standards of performance and user experience.

Key Responsibilities

As Inference Technical Lead, you'll own the end-to-end optimization of Sora's inference pipeline. Core responsibilities include:

  • Lead comprehensive engineering efforts to enhance model serving efficiency and inference performance across GPU clusters
  • Develop and optimize custom CUDA kernels to maximize throughput for multimodal workloads
  • Analyze and eliminate data movement bottlenecks between GPU memory hierarchies
  • Design scalable serving infrastructure supporting millions of daily inference requests
  • Collaborate with researchers to influence model architecture decisions for optimal inference characteristics
  • Implement advanced optimization techniques including quantization, pruning, and knowledge distillation
  • Build monitoring and alerting systems for production inference reliability
  • Profile complex inference workloads using NVIDIA tools (Nsight, DCGM) and custom instrumentation (a minimal instrumentation sketch appears below)
  • Mentor engineers on GPU optimization best practices and inference system design
  • Drive technical roadmaps for next-generation inference infrastructure
  • Partner with product teams to define SLOs and performance targets for user-facing features
  • Conduct A/B experiments comparing optimization strategies across diverse workloads
  • Contribute to open-source inference optimization tools when appropriate

Expect to spend roughly 40% of your time on hands-on optimization, 30% on infrastructure design, 20% on cross-team collaboration, and 10% on strategic planning.
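
To give a concrete flavor of the profiling work described above, here is a minimal sketch of custom instrumentation built on torch.profiler. The toy model, tensor shapes, and iteration counts are illustrative placeholders, not Sora's actual serving stack.

```python
# Minimal GPU inference profiling sketch (assumes a CUDA-capable machine).
import torch
from torch.profiler import profile, ProfilerActivity

# Toy stand-in for a real model; shapes chosen only for illustration.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
).cuda().eval()

x = torch.randn(32, 4096, device="cuda")

with torch.inference_mode():
    # Warm up so one-time CUDA context and allocator costs don't skew results.
    for _ in range(3):
        model(x)
    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        record_shapes=True,
    ) as prof:
        for _ in range(10):
            model(x)

# Rank ops by total GPU time to see which kernels dominate the budget;
# the resulting traces can be inspected alongside Nsight captures.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```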

Qualifications

We're seeking candidates with exceptional depth in GPU inference optimization and proven leadership in production AI systems:

  • 7+ years of experience in systems programming, with 4+ years focused on AI/ML inference optimization
  • Deep expertise in CUDA programming, GPU kernel development, and low-level optimization
  • Proven track record shipping high-throughput inference systems serving real-world workloads
  • Strong understanding of GPU memory hierarchies, data movement, and bandwidth optimization
  • Experience with TensorRT, ONNX Runtime, or TVM for optimized model deployment (a minimal ONNX Runtime sketch appears below)
  • Familiarity with PyTorch/TensorFlow deployment patterns and serving frameworks (Triton, KServe)
  • Demonstrated ability to lead complex technical projects through ambiguity to successful completion
  • Experience collaborating with AI researchers to optimize model architectures for inference
  • Proficiency with performance profiling tools (perf, VTune, Nsight Systems/Compute)
  • Strong communication skills for cross-functional technical leadership

Bonus: Experience with multimodal models, video processing pipelines, or large-scale distributed training/inference systems.
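
To ground the deployment-pattern bullet above, here is a hedged sketch of the basic ONNX Runtime serving pattern. The model path and input shape are hypothetical; a production video-generation pipeline would layer batching, scheduling, and custom kernels on top.

```python
# Minimal ONNX Runtime inference sketch ("model.onnx" is a placeholder path).
import numpy as np
import onnxruntime as ort

# Prefer the CUDA execution provider, falling back to CPU if it is absent.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
batch = np.random.randn(8, 3, 224, 224).astype(np.float32)  # illustrative shape

# Passing None for the output names returns every model output.
outputs = session.run(None, {input_name: batch})
print(outputs[0].shape)
```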

Salary & Benefits

Compensation Range: $350,000 - $550,000 USD base salary + equity + benefits (total compensation depends on experience and location).

OpenAI offers one of the most comprehensive benefits packages in tech:

  • Industry-leading medical, dental, and vision coverage with 100% of premiums covered
  • Hybrid SF work model (3 days/week in office) with relocation support
  • Unlimited PTO with manager approval and generous parental leave
  • 401(k) matching and employee stock purchase opportunities
  • Professional development budget ($10K+/year) for conferences/courses
  • Wellness benefits including mental health support and gym reimbursement
  • Daily catered meals, snacks, and comprehensive commuter benefits
  • Volunteer time off and charitable donation matching programs

Why Join OpenAI?

OpenAI represents humanity's best hope for safely harnessing artificial general intelligence. The Sora team specifically pioneers multimodal capabilities that will redefine creative workflows worldwide. As Inference Technical Lead, you'll:

  • Work on bleeding-edge video generation models serving millions of users
  • Directly impact OpenAI's ability to scale safely and reliably
  • Collaborate with Turing Award winners and leading AI researchers daily
  • Shape the commercial deployment of foundation models at global scale
  • Enjoy San Francisco's vibrant tech ecosystem with hybrid flexibility

OpenAI's culture emphasizes impact, ownership, and rapid iteration. We hire exceptional people and give them outsized responsibility from day one.

How to Apply

Ready to optimize the future of AI inference? Submit your resume, LinkedIn, and a brief note explaining:

  1. Your most impactful inference optimization project (quantitative results required)
  2. Experience with GPU kernel development or low-level optimization
  3. Why you're excited about multimodal AI and OpenAI's mission

Application deadline: Rolling until filled. US work authorization required.

Join us in San Francisco to build inference infrastructure that powers the next era of AI creativity!

Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

$367,500 - $605,000 USD per year

Source: AI estimate

* This is an estimated range derived from market data and may vary with experience and qualifications.

Skills Required

  • GPU Inference Optimization (intermediate)
  • Model Serving Efficiency (intermediate)
  • Kernel-Level Programming (intermediate)
  • Data Movement Optimization (intermediate)
  • Low-Level Performance Tuning (intermediate)
  • CUDA Programming (intermediate)
  • PyTorch Optimization (intermediate)
  • TensorRT Deployment (intermediate)
  • Distributed Systems Scaling (intermediate)
  • Multimodal AI Systems (intermediate)
  • Inference Latency Reduction (intermediate)
  • Throughput Maximization (intermediate)
  • Systems Programming (intermediate)
  • AI Infrastructure Design (intermediate)
  • Performance Profiling (intermediate)
  • GPU Kernel Development (intermediate)
  • Model Quantization (intermediate)
  • Batch Inference Optimization (intermediate)
  • Real-Time AI Serving (intermediate)
  • Scalable AI Deployment (intermediate)

Required Qualifications

  • Deep expertise in model performance optimization at the inference layer
  • Strong background in kernel-level systems programming and data movement
  • Proven experience in low-level performance tuning for GPU workloads
  • Hands-on experience with CUDA, TensorRT, or similar GPU acceleration frameworks
  • Track record of optimizing inference throughput and latency in production AI systems
  • Familiarity with PyTorch, TensorFlow, or JAX for model deployment
  • Experience designing and building scalable serving infrastructure for AI models
  • Ability to navigate technical ambiguity and set strategic direction
  • Collaboration experience with research and product teams in AI/ML environments
  • Excitement for scaling multimodal AI systems handling real-world workloads
  • Bachelor's or advanced degree in Computer Science, Electrical Engineering, or a related field
  • 5+ years of experience in systems engineering or AI infrastructure

Responsibilities

  • Lead engineering efforts to improve model serving efficiency for Sora
  • Optimize inference performance through kernel-level modifications and tuning
  • Drive data movement optimizations to enhance system throughput and reliability
  • Design and implement critical serving infrastructure for multimodal models
  • Partner with research teams to develop inference-friendly model architectures
  • Profile and analyze performance bottlenecks in GPU inference pipelines
  • Build scalable systems supporting high-volume AI inference requests
  • Implement quantization, pruning, and other model optimization techniques (a minimal quantization sketch follows this list)
  • Develop custom CUDA kernels for specialized inference workloads
  • Collaborate with product teams to ensure reliable model performance at scale
  • Monitor and improve inference system reliability under production loads
  • Contribute to technical roadmaps for Sora's inference infrastructure
  • Mentor junior engineers on performance optimization best practices
  • Conduct experiments to benchmark optimization strategies
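
As a hedged illustration of the quantization work named above, the sketch below uses PyTorch's built-in dynamic quantization on a toy module. Real multimodal serving would need calibration, accuracy validation, and GPU-friendly formats; this shows only the basic mechanics.

```python
# Dynamic int8 quantization sketch (toy model; CPU execution path).
import torch

# Stand-in for a real float32 serving model.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).eval()

# Convert Linear weights to int8; activations are quantized on the fly.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(4, 1024)
with torch.inference_mode():
    print(qmodel(x).shape)  # torch.Size([4, 1024])
```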

Benefits

  • Competitive salary with equity package
  • Comprehensive medical, dental, and vision insurance
  • Hybrid work model with 3 days in office per week
  • Relocation assistance for new employees to San Francisco
  • Generous paid time off and flexible vacation policy
  • 401(k) retirement savings plan with company match
  • Parental leave and family planning benefits
  • Mental health and wellness programs
  • Professional development stipend for conferences and courses
  • Free lunches and snacks in the office
  • Gym membership reimbursement
  • Commuter benefits and public transit subsidies
  • Employee stock purchase plan opportunities
  • Volunteer time off and charitable donation matching
