
Software Engineer, Inference - Multi Modal

OpenAI - San Francisco, California

Full-time | Posted: Feb 10, 2026

Job Description

Software Engineer, Inference - Multi Modal at OpenAI: Join the Future of AI in San Francisco

Are you passionate about pushing the boundaries of artificial intelligence? OpenAI is hiring a Software Engineer, Inference - Multi Modal to build the infrastructure powering our most advanced multimodal models. Located in the heart of San Francisco, California, this role offers a unique opportunity to work on cutting-edge AI systems that handle images, audio, and beyond. If you're experienced in LLM inference, GPU optimization, and scalable ML systems, apply now to shape the next generation of AI deployment.

Role Overview

OpenAI's Inference team is at the forefront of deploying our flagship models, including the GPT series, 4o Image Generation, and Whisper, across diverse platforms. As a Software Engineer on the Multi Modal Inference team, you'll develop high-performance infrastructure for serving real-time audio, image, and multimodal workloads at massive scale. This small, fast-moving team collaborates directly with world-class researchers and product teams to bring experimental AI capabilities into production.

Multimodal inference represents the future of AI interaction—think generating speech from text, understanding complex images, and enabling natural human-AI conversations that transcend text-only interfaces. Your work will ensure these models are reliable, performant, and accessible to millions of developers worldwide. Based in San Francisco, you'll thrive in an environment that values innovation, rapid iteration, and cross-functional collaboration.

This isn't just engineering; it's pioneering the infrastructure that makes advanced AI ubiquitous. With OpenAI's mission to benefit all of humanity, your contributions will have global impact.

Key Responsibilities

In this high-impact role, you'll tackle challenging problems at the intersection of ML systems, distributed computing, and real-time performance. Key responsibilities include:

  • Designing and implementing inference infrastructure optimized for large-scale multimodal models handling image, audio, and text inputs.
  • Optimizing end-to-end systems for ultra-low latency and high-throughput delivery of complex multimodal outputs.
  • Building tools and workflows that seamlessly transition experimental research prototypes into robust production services.
  • Collaborating intimately with AI researchers training frontier models and product engineers defining novel user interactions.
  • Driving system-level improvements in GPU utilization, tensor parallelism, pipeline parallelism, and custom hardware abstractions.
  • Developing scalable data pipelines for heterogeneous model architectures and diverse input/output formats.
  • Implementing monitoring, alerting, and auto-scaling systems for production inference clusters.
  • Contributing to open-source inference tooling and developer platforms that power the AI ecosystem.
  • Troubleshooting live production issues across distributed GPU clusters under heavy load.
  • Partnering with infrastructure teams to deploy on cutting-edge hardware like H100s, TPUs, and custom accelerators.
  • Creating APIs and SDKs that enable developers to easily integrate multimodal capabilities into applications.
  • Staying ahead of evolving research by incorporating new techniques such as speculative decoding and mixture-of-experts routing (a toy sketch of speculative decoding follows this list).
  • Conducting performance profiling and benchmarking across diverse model sizes and workloads.
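
For illustration only (not part of the official posting): a toy sketch of the speculative-decoding idea referenced above, in which a cheap draft model proposes a block of tokens and the large target model verifies them in a single forward pass. The greedy acceptance rule and the model interfaces here are simplified assumptions; production systems resample from the target distribution at the first rejected token.

    # Toy speculative-decoding sketch. `target` and `draft` are assumed to be
    # callables mapping a (1, seq) token tensor to (1, seq, vocab) logits.
    import torch

    def speculative_step(target, draft, tokens: torch.Tensor, k: int = 4) -> torch.Tensor:
        # 1. The draft model proposes k tokens autoregressively (cheap per step).
        proposal = tokens.clone()
        for _ in range(k):
            logits = draft(proposal)
            next_tok = logits[:, -1].argmax(dim=-1, keepdim=True)
            proposal = torch.cat([proposal, next_tok], dim=1)

        # 2. The target model scores the entire proposal in ONE forward pass;
        #    target_preds[:, j] is its greedy choice for position j + 1.
        target_preds = target(proposal)[:, :-1].argmax(dim=-1)

        # 3. Accept the longest prefix on which draft and target agree
        #    (greedy variant; a full implementation would append the
        #    target's own token at the first disagreement).
        n = tokens.shape[1]
        accepted = tokens
        for i in range(n, proposal.shape[1]):
            if proposal[0, i] != target_preds[0, i - 1]:
                break
            accepted = proposal[:, : i + 1]
        return accepted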

Expect to own projects end-to-end, from initial design through production deployment and ongoing optimization.

Qualifications

We're seeking engineers who excel in ambiguous, fast-paced environments and have a track record of shipping production ML systems. You might thrive if you:

  • Have 5+ years of experience building and scaling inference systems for LLMs, vision models, or multimodal AI.
  • Bring deep expertise in GPU-based ML workloads, including memory bandwidth, compute utilization, and KV cache dynamics.
  • Have optimized complex data pipelines for images, audio spectrograms, and tokenized multimodal inputs.
  • Are familiar with state-of-the-art inference frameworks such as vLLM, TensorRT-LLM, SGLang, or custom serving stacks (see the sketch after this list).
  • Have strong systems programming skills in Python, C++, or CUDA for performance-critical components.
  • Have a proven ability to collaborate with research scientists on bleeding-edge model architectures.
  • Are comfortable owning distributed systems spanning networking, storage, and orchestration layers.
  • Have experience with containerization (Docker/Kubernetes) and cloud infrastructure (AWS/GCP).
  • Bonus: production experience with diffusion models, audio codecs, or real-time speech systems.
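
For a concrete flavor of the serving frameworks named above, here is a minimal offline-inference sketch using vLLM's public Python API. The model name and sampling settings are illustrative placeholders chosen for this example, not requirements of the role.

    # Minimal vLLM offline-inference example (model name is a placeholder).
    from vllm import LLM, SamplingParams

    prompts = ["Summarize why KV-cache reuse matters for inference latency."]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # tensor_parallel_size shards the model weights across GPUs, the
    # tensor-parallel serving axis mentioned in the responsibilities.
    llm = LLM(model="facebook/opt-125m", tensor_parallel_size=1)

    for output in llm.generate(prompts, sampling_params):
        print(output.outputs[0].text)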

Salary & Benefits

OpenAI offers competitive compensation packages for Software Engineers in Inference roles, typically ranging from $250,000 to $450,000 base salary plus significant equity. Total compensation can exceed $700K for top performers, including bonuses and stock options in our rapidly growing company.

Comprehensive benefits include:

  • Premium health, dental, vision coverage with low employee premiums
  • 401(k) with generous company match
  • Unlimited PTO and flexible work policies
  • Parental leave (16+ weeks) and fertility benefits
  • Professional growth stipend ($3K+/year)
  • Wellness programs including mental health support
  • Daily catered meals and fully stocked kitchens
  • Commuter benefits and gym memberships
  • Employee resource groups and volunteer opportunities

This package reflects OpenAI's commitment to attracting world-class talent building safe AGI.

Why Join OpenAI?

OpenAI isn't just another tech company—it's the leading force in artificial general intelligence research and deployment. Our models power applications used by millions daily, from ChatGPT to DALL·E and beyond. Joining the Inference team means:

  • Working with the smartest researchers and engineers in AI
  • Access to unlimited compute resources and latest hardware
  • Direct impact on products serving 100M+ weekly users
  • Equity in a company valued at $150B+ with explosive growth
  • Culture emphasizing safety, rapid iteration, and mission alignment
  • San Francisco location in vibrant SoMa tech hub

We're building AI that benefits humanity, and your expertise in multimodal inference will accelerate that mission.

How to Apply

Ready to power the next era of multimodal AI? Submit your resume and a brief note on your most impactful inference project. Our hiring process includes:

  1. 30-minute recruiter screen
  2. Technical deep-dive with engineering team
  3. Systems design interview focused on ML infra
  4. Live coding challenge with multimodal scenarios
  5. Final interviews with research and leadership

OpenAI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. Apply today to join the AI revolution!


Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

$262,500 - $495,000 USD / year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Multimodal Model Inference (intermediate)
  • GPU Optimization (intermediate)
  • Tensor Parallelism (intermediate)
  • vLLM (intermediate)
  • TensorRT-LLM (intermediate)
  • Distributed Computing (intermediate)
  • Low-Latency Systems (intermediate)
  • High-Throughput Inference (intermediate)
  • Image Generation Models (intermediate)
  • Audio Synthesis (intermediate)
  • ML Inference Infrastructure (intermediate)
  • CUDA Programming (intermediate)
  • Model Serving (intermediate)
  • Real-Time AI Processing (intermediate)
  • Hardware Abstraction (intermediate)
  • LLM Deployment (intermediate)
  • PyTorch (intermediate)
  • JAX (intermediate)
  • Kubernetes (intermediate)
  • Docker (intermediate)

Required Qualifications

  • Experience building and scaling inference systems for LLMs or multimodal models
  • Hands-on work with GPU-based ML workloads and performance dynamics of large models
  • Deep understanding of complex data handling for images, audio, and non-text modalities
  • Familiarity with inference tooling like vLLM, TensorRT-LLM, or custom model-parallel systems
  • Comfort with systems spanning networking, distributed compute, and high-throughput data
  • Proven ability to own problems end-to-end in ambiguous, fast-moving environments
  • Experience collaborating closely with research teams on experimental ML workflows
  • Strong skills in optimizing for high-throughput and low-latency delivery
  • Knowledge of GPU utilization, tensor parallelism, and hardware abstraction layers
  • Background in transitioning experimental research into reliable production services
  • Experience with real-time audio and image processing pipelines
  • Proficiency in Python, C++, or similar for performance-critical systems

Responsibilities

  • Design and implement scalable inference infrastructure for multimodal AI models
  • Optimize systems for high-throughput, low-latency image and audio processing
  • Build reliable production services for real-time multimodal workloads
  • Collaborate with researchers to deploy next-generation multimodal models
  • Develop GPU optimization strategies including tensor parallelism techniques
  • Create hardware abstraction layers for diverse ML inference hardware
  • Enable seamless transition of experimental research workflows to production
  • Partner with product teams to define new interaction modalities beyond text
  • Implement high-performance data pipelines for heterogeneous model inputs/outputs
  • Monitor and improve system reliability, availability, and scalability in production
  • Contribute to developer tools and APIs for multimodal model serving
  • Troubleshoot and resolve performance bottlenecks in live inference systems
  • Work cross-functionally with infra, research, and product engineering teams

Benefits

  • Competitive salary with equity in a high-growth AI company
  • Comprehensive health, dental, and vision insurance coverage
  • 401(k) matching and retirement savings plans
  • Unlimited PTO and flexible vacation policy
  • Generous parental leave and family planning benefits
  • Remote-friendly work environment with SF headquarters
  • Professional development stipend for conferences and courses
  • Mental health support and wellness programs
  • Free lunches, snacks, and meals at the office
  • Commuter benefits and transportation allowances
  • Employee stock purchase plan opportunities
  • Volunteer time off and charitable giving matching
  • Cutting-edge hardware including the latest GPUs for personal projects
