OpenAI

Inference Technical Lead, Sora - San Francisco, California

Full-time | Posted: Feb 10, 2026

Job Description

Inference Technical Lead, Sora at OpenAI - San Francisco, CA

Join OpenAI's pioneering Sora team as an Inference Technical Lead and drive the future of multimodal AI inference. This senior-level role in San Francisco offers a unique opportunity to optimize GPU inference for cutting-edge video generation models. If you have deep expertise in kernel-level optimization and scalable AI systems, apply now to shape the next generation of AI products.

Role Overview

The Inference Technical Lead position on the Sora team at OpenAI represents a critical leadership role in advancing multimodal AI capabilities. Sora, OpenAI's groundbreaking text-to-video model, requires world-class inference infrastructure to deliver reliable, high-performance experiences to millions of users worldwide. As the technical lead for inference engineering, you'll spearhead initiatives that directly impact model serving efficiency, system scalability, and overall product reliability.

This hybrid position based in San Francisco combines deep systems engineering with AI research collaboration. You'll work alongside world-class researchers to design inference-friendly architectures while building production-grade serving infrastructure. The efficiency and reliability you deliver will free the broader organization to focus on higher-leverage work, making this role foundational to OpenAI's scaling objectives.

Key focus areas include GPU kernel optimization, data movement efficiency, low-level performance tuning, and distributed inference systems. Success in this role requires both breadth (understanding full-stack AI infrastructure) and depth (mastery of GPU programming and inference optimization techniques). You'll thrive if you excel at navigating technical ambiguity, driving complex projects to completion, and communicating effectively across research, engineering, and product teams.

OpenAI's mission-driven culture emphasizes safety, reliability, and broad societal benefit. Your work will directly contribute to safely deploying powerful multimodal capabilities while maintaining the highest standards of performance and user experience.

Key Responsibilities

As Inference Technical Lead, you'll own the end-to-end optimization of Sora's inference pipeline. Core responsibilities include:

  • Lead comprehensive engineering efforts to enhance model serving efficiency and inference performance across GPU clusters
  • Develop and optimize custom CUDA kernels to maximize throughput for multimodal workloads
  • Analyze and eliminate data movement bottlenecks between GPU memory hierarchies
  • Design scalable serving infrastructure supporting millions of daily inference requests
  • Collaborate with researchers to influence model architecture decisions for optimal inference characteristics
  • Implement advanced optimization techniques including quantization, pruning, and knowledge distillation
  • Build monitoring and alerting systems for production inference reliability
  • Profile complex inference workloads using NVIDIA tools (Nsight, DCGM) and custom instrumentation (a minimal instrumentation sketch appears below)
  • Mentor engineers on GPU optimization best practices and inference system design
  • Drive technical roadmaps for next-generation inference infrastructure
  • Partner with product teams to define SLOs and performance targets for user-facing features
  • Conduct A/B experiments comparing optimization strategies across diverse workloads
  • Contribute to open-source inference optimization tools when appropriate

Expect to spend roughly 40% of your time on hands-on optimization, 30% on infrastructure design, 20% on cross-team collaboration, and 10% on strategic planning.
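
To give a concrete flavor of the profiling work described above, here is a minimal sketch of custom instrumentation built on torch.profiler. The toy model, tensor shapes, and iteration counts are illustrative placeholders, not Sora's actual serving stack.

```python
# Minimal GPU inference profiling sketch (assumes a CUDA-capable machine).
import torch
from torch.profiler import profile, ProfilerActivity

# Toy stand-in for a real model; shapes chosen only for illustration.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
).cuda().eval()

x = torch.randn(32, 4096, device="cuda")

with torch.inference_mode():
    # Warm up so one-time CUDA context and allocator costs don't skew results.
    for _ in range(3):
        model(x)
    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        record_shapes=True,
    ) as prof:
        for _ in range(10):
            model(x)

# Rank ops by total GPU time to see which kernels dominate the budget;
# the resulting traces can be inspected alongside Nsight captures.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```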

Qualifications

We're seeking candidates with exceptional depth in GPU inference optimization and proven leadership in production AI systems:

  • 7+ years of experience in systems programming, with 4+ years focused on AI/ML inference optimization
  • Deep expertise in CUDA programming, GPU kernel development, and low-level optimization
  • Proven track record shipping high-throughput inference systems serving real-world workloads
  • Strong understanding of GPU memory hierarchies, data movement, and bandwidth optimization
  • Experience with TensorRT, ONNX Runtime, or TVM for optimized model deployment (a minimal ONNX Runtime sketch appears below)
  • Familiarity with PyTorch/TensorFlow deployment patterns and serving frameworks (Triton, KServe)
  • Demonstrated ability to lead complex technical projects through ambiguity to successful completion
  • Experience collaborating with AI researchers to optimize model architectures for inference
  • Proficiency with performance profiling tools (perf, VTune, Nsight Systems/Compute)
  • Strong communication skills for cross-functional technical leadership

Bonus: Experience with multimodal models, video processing pipelines, or large-scale distributed training/inference systems.
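
To ground the deployment-pattern bullet above, here is a hedged sketch of the basic ONNX Runtime serving pattern. The model path and input shape are hypothetical; a production video-generation pipeline would layer batching, scheduling, and custom kernels on top.

```python
# Minimal ONNX Runtime inference sketch ("model.onnx" is a placeholder path).
import numpy as np
import onnxruntime as ort

# Prefer the CUDA execution provider, falling back to CPU if it is absent.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
batch = np.random.randn(8, 3, 224, 224).astype(np.float32)  # illustrative shape

# Passing None for the output names returns every model output.
outputs = session.run(None, {input_name: batch})
print(outputs[0].shape)
```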

Salary & Benefits

Compensation Range: $350,000 - $550,000 USD base salary + equity + benefits (total compensation depends on experience and location).

OpenAI offers one of the most comprehensive benefits packages in tech:

  • Industry-leading medical, dental, and vision coverage with 100% of premiums covered
  • Hybrid SF work model (3 days/week in office) with relocation support
  • Unlimited PTO with manager approval and generous parental leave
  • 401(k) matching and employee stock purchase opportunities
  • Professional development budget ($10K+/year) for conferences/courses
  • Wellness benefits including mental health support and gym reimbursement
  • Daily catered meals, snacks, and comprehensive commuter benefits
  • Volunteer time off and charitable donation matching programs

Why Join OpenAI?

OpenAI represents humanity's best hope for safely harnessing artificial general intelligence. The Sora team specifically pioneers multimodal capabilities that will redefine creative workflows worldwide. As Inference Technical Lead, you'll:

  • Work on bleeding-edge video generation models serving millions of users
  • Directly impact OpenAI's ability to scale safely and reliably
  • Collaborate with Turing Award winners and leading AI researchers daily
  • Shape the commercial deployment of foundation models at global scale
  • Enjoy San Francisco's vibrant tech ecosystem with hybrid flexibility

OpenAI's culture emphasizes impact, ownership, and rapid iteration. We hire exceptional people and give them outsized responsibility from day one.

How to Apply

Ready to optimize the future of AI inference? Submit your resume, LinkedIn, and a brief note explaining:

  1. Your most impactful inference optimization project (quantitative results required)
  2. Experience with GPU kernel development or low-level optimization
  3. Why you're excited about multimodal AI and OpenAI's mission

Application deadline: Rolling until filled. US work authorization required.

Join us in San Francisco to build inference infrastructure that powers the next era of AI creativity!

Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

$367,500 - $605,000 USD per year

Source: AI estimate

* This is an estimated range derived from market data and may vary with experience and qualifications.

Skills Required

  • GPU Inference Optimization (intermediate)
  • Model Serving Efficiency (intermediate)
  • Kernel-Level Programming (intermediate)
  • Data Movement Optimization (intermediate)
  • Low-Level Performance Tuning (intermediate)
  • CUDA Programming (intermediate)
  • PyTorch Optimization (intermediate)
  • TensorRT Deployment (intermediate)
  • Distributed Systems Scaling (intermediate)
  • Multimodal AI Systems (intermediate)
  • Inference Latency Reduction (intermediate)
  • Throughput Maximization (intermediate)
  • Systems Programming (intermediate)
  • AI Infrastructure Design (intermediate)
  • Performance Profiling (intermediate)
  • GPU Kernel Development (intermediate)
  • Model Quantization (intermediate)
  • Batch Inference Optimization (intermediate)
  • Real-Time AI Serving (intermediate)
  • Scalable AI Deployment (intermediate)

Required Qualifications

  • Deep expertise in model performance optimization at the inference layer
  • Strong background in kernel-level systems programming and data movement
  • Proven experience in low-level performance tuning for GPU workloads
  • Hands-on experience with CUDA, TensorRT, or similar GPU acceleration frameworks
  • Track record of optimizing inference throughput and latency in production AI systems
  • Familiarity with PyTorch, TensorFlow, or JAX for model deployment
  • Experience designing and building scalable serving infrastructure for AI models
  • Ability to navigate technical ambiguity and set strategic direction
  • Collaboration experience with research and product teams in AI/ML environments
  • Excitement for scaling multimodal AI systems handling real-world workloads
  • Bachelor's or advanced degree in Computer Science, Electrical Engineering, or a related field
  • 5+ years of experience in systems engineering or AI infrastructure

Responsibilities

  • Lead engineering efforts to improve model serving efficiency for Sora
  • Optimize inference performance through kernel-level modifications and tuning
  • Drive data movement optimizations to enhance system throughput and reliability
  • Design and implement critical serving infrastructure for multimodal models
  • Partner with research teams to develop inference-friendly model architectures
  • Profile and analyze performance bottlenecks in GPU inference pipelines
  • Build scalable systems supporting high-volume AI inference requests
  • Implement quantization, pruning, and other model optimization techniques (a minimal quantization sketch follows this list)
  • Develop custom CUDA kernels for specialized inference workloads
  • Collaborate with product teams to ensure reliable model performance at scale
  • Monitor and improve inference system reliability under production loads
  • Contribute to technical roadmaps for Sora's inference infrastructure
  • Mentor junior engineers on performance optimization best practices
  • Conduct experiments to benchmark optimization strategies
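
As a hedged illustration of the quantization work named above, the sketch below uses PyTorch's built-in dynamic quantization on a toy module. Real multimodal serving would need calibration, accuracy validation, and GPU-friendly formats; this shows only the basic mechanics.

```python
# Dynamic int8 quantization sketch (toy model; CPU execution path).
import torch

# Stand-in for a real float32 serving model.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).eval()

# Convert Linear weights to int8; activations are quantized on the fly.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(4, 1024)
with torch.inference_mode():
    print(qmodel(x).shape)  # torch.Size([4, 1024])
```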

Benefits

  • Competitive salary with equity package
  • Comprehensive medical, dental, and vision insurance
  • Hybrid work model with 3 days in office per week
  • Relocation assistance for new employees to San Francisco
  • Generous paid time off and flexible vacation policy
  • 401(k) retirement savings plan with company match
  • Parental leave and family planning benefits
  • Mental health and wellness programs
  • Professional development stipend for conferences and courses
  • Free lunches and snacks in the office
  • Gym membership reimbursement
  • Commuter benefits and public transit subsidies
  • Employee stock purchase plan opportunities
  • Volunteer time off and charitable donation matching
