
Software Engineer, Inference - Multi Modal

OpenAI - San Francisco, California

Full-time | Posted: Feb 10, 2026

Job Description

Software Engineer, Inference - Multi Modal at OpenAI: Join the Future of AI in San Francisco

Are you passionate about pushing the boundaries of artificial intelligence? OpenAI is hiring a Software Engineer, Inference - Multi Modal to build the infrastructure powering our most advanced multimodal models. Located in the heart of San Francisco, California, this role offers a unique opportunity to work on cutting-edge AI systems that handle images, audio, and beyond. If you're experienced in LLM inference, GPU optimization, and scalable ML systems, apply now to shape the next generation of AI deployment.

Role Overview

OpenAI's Inference team is at the forefront of deploying our flagship models, including the GPT series, 4o Image Generation, and Whisper, across diverse platforms. As a Software Engineer on the Multi Modal Inference team, you'll develop high-performance infrastructure for serving real-time audio, image, and multimodal workloads at massive scale. This small, fast-moving team collaborates directly with world-class researchers and product teams to bring experimental AI capabilities into production.

Multimodal inference represents the future of AI interaction—think generating speech from text, understanding complex images, and enabling natural human-AI conversations that transcend text-only interfaces. Your work will ensure these models are reliable, performant, and accessible to millions of developers worldwide. Based in San Francisco, you'll thrive in an environment that values innovation, rapid iteration, and cross-functional collaboration.

This isn't just engineering; it's pioneering the infrastructure that makes advanced AI ubiquitous. With OpenAI's mission to benefit all of humanity, your contributions will have global impact.

Key Responsibilities

In this high-impact role, you'll tackle challenging problems at the intersection of ML systems, distributed computing, and real-time performance. Key responsibilities include:

  • Designing and implementing inference infrastructure optimized for large-scale multimodal models handling image, audio, and text inputs.
  • Optimizing end-to-end systems for ultra-low latency and high-throughput delivery of complex multimodal outputs.
  • Building tools and workflows that seamlessly transition experimental research prototypes into robust production services.
  • Collaborating intimately with AI researchers training frontier models and product engineers defining novel user interactions.
  • Driving system-level improvements in GPU utilization, tensor parallelism, pipeline parallelism, and custom hardware abstractions.
  • Developing scalable data pipelines for heterogeneous model architectures and diverse input/output formats.
  • Implementing monitoring, alerting, and auto-scaling systems for production inference clusters.
  • Contributing to open-source inference tooling and developer platforms that power the AI ecosystem.
  • Troubleshooting live production issues across distributed GPU clusters under heavy load.
  • Partnering with infrastructure teams to deploy on cutting-edge hardware like H100s, TPUs, and custom accelerators.
  • Creating APIs and SDKs that enable developers to easily integrate multimodal capabilities into applications.
  • Staying ahead of evolving research by incorporating new techniques such as speculative decoding and mixture-of-experts routing (a toy sketch of speculative decoding follows this list).
  • Conducting performance profiling and benchmarking across diverse model sizes and workloads.
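
For illustration only (not part of the official posting): a toy sketch of the speculative-decoding idea referenced above, in which a cheap draft model proposes a block of tokens and the large target model verifies them in a single forward pass. The greedy acceptance rule and the model interfaces here are simplified assumptions; production systems resample from the target distribution at the first rejected token.

    # Toy speculative-decoding sketch. `target` and `draft` are assumed to be
    # callables mapping a (1, seq) token tensor to (1, seq, vocab) logits.
    import torch

    def speculative_step(target, draft, tokens: torch.Tensor, k: int = 4) -> torch.Tensor:
        # 1. The draft model proposes k tokens autoregressively (cheap per step).
        proposal = tokens.clone()
        for _ in range(k):
            logits = draft(proposal)
            next_tok = logits[:, -1].argmax(dim=-1, keepdim=True)
            proposal = torch.cat([proposal, next_tok], dim=1)

        # 2. The target model scores the entire proposal in ONE forward pass;
        #    target_preds[:, j] is its greedy choice for position j + 1.
        target_preds = target(proposal)[:, :-1].argmax(dim=-1)

        # 3. Accept the longest prefix on which draft and target agree
        #    (greedy variant; a full implementation would append the
        #    target's own token at the first disagreement).
        n = tokens.shape[1]
        accepted = tokens
        for i in range(n, proposal.shape[1]):
            if proposal[0, i] != target_preds[0, i - 1]:
                break
            accepted = proposal[:, : i + 1]
        return accepted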

Expect to own projects end-to-end, from initial design through production deployment and ongoing optimization.

Qualifications

We're seeking engineers who excel in ambiguous, fast-paced environments and have a track record of shipping production ML systems. You might thrive if you:

  • Have 5+ years of experience building and scaling inference systems for LLMs, vision models, or multimodal AI.
  • Bring deep expertise in GPU-based ML workloads, including memory bandwidth, compute utilization, and KV cache dynamics.
  • Have optimized complex data pipelines for images, audio spectrograms, and tokenized multimodal inputs.
  • Are familiar with state-of-the-art inference frameworks such as vLLM, TensorRT-LLM, SGLang, or custom serving stacks (see the sketch after this list).
  • Have strong systems programming skills in Python, C++, or CUDA for performance-critical components.
  • Have a proven ability to collaborate with research scientists on bleeding-edge model architectures.
  • Are comfortable owning distributed systems spanning networking, storage, and orchestration layers.
  • Have experience with containerization (Docker/Kubernetes) and cloud infrastructure (AWS/GCP).
  • Bonus: production experience with diffusion models, audio codecs, or real-time speech systems.
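
For a concrete flavor of the serving frameworks named above, here is a minimal offline-inference sketch using vLLM's public Python API. The model name and sampling settings are illustrative placeholders chosen for this example, not requirements of the role.

    # Minimal vLLM offline-inference example (model name is a placeholder).
    from vllm import LLM, SamplingParams

    prompts = ["Summarize why KV-cache reuse matters for inference latency."]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # tensor_parallel_size shards the model weights across GPUs, the
    # tensor-parallel serving axis mentioned in the responsibilities.
    llm = LLM(model="facebook/opt-125m", tensor_parallel_size=1)

    for output in llm.generate(prompts, sampling_params):
        print(output.outputs[0].text)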

Salary & Benefits

OpenAI offers competitive compensation packages for Software Engineers in Inference roles, typically ranging from $250,000 to $450,000 base salary plus significant equity. Total compensation can exceed $700K for top performers, including bonuses and stock options in our rapidly growing company.

Comprehensive benefits include:

  • Premium health, dental, vision coverage with low employee premiums
  • 401(k) with generous company match
  • Unlimited PTO and flexible work policies
  • Parental leave (16+ weeks) and fertility benefits
  • Professional growth stipend ($3K+/year)
  • Wellness programs including mental health support
  • Daily catered meals and fully stocked kitchens
  • Commuter benefits and gym memberships
  • Employee resource groups and volunteer opportunities

This package reflects OpenAI's commitment to attracting world-class talent building safe AGI.

Why Join OpenAI?

OpenAI isn't just another tech company—it's the leading force in artificial general intelligence research and deployment. Our models power applications used by millions daily, from ChatGPT to DALL·E and beyond. Joining the Inference team means:

  • Working with the smartest researchers and engineers in AI
  • Access to unlimited compute resources and latest hardware
  • Direct impact on products serving 100M+ weekly users
  • Equity in a company valued at $150B+ with explosive growth
  • Culture emphasizing safety, rapid iteration, and mission alignment
  • San Francisco location in vibrant SoMa tech hub

We're building AI that benefits humanity, and your expertise in multimodal inference will accelerate that mission.

How to Apply

Ready to power the next era of multimodal AI? Submit your resume and a brief note on your most impactful inference project. Our hiring process includes:

  1. 30-minute recruiter screen
  2. Technical deep-dive with engineering team
  3. Systems design interview focused on ML infra
  4. Live coding challenge with multimodal scenarios
  5. Final interviews with research and leadership

OpenAI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. Apply today to join the AI revolution!


Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

$262,500 - $495,000 USD / year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Multimodal Model Inference (intermediate)
  • GPU Optimization (intermediate)
  • Tensor Parallelism (intermediate)
  • vLLM (intermediate)
  • TensorRT-LLM (intermediate)
  • Distributed Computing (intermediate)
  • Low-Latency Systems (intermediate)
  • High-Throughput Inference (intermediate)
  • Image Generation Models (intermediate)
  • Audio Synthesis (intermediate)
  • ML Inference Infrastructure (intermediate)
  • CUDA Programming (intermediate)
  • Model Serving (intermediate)
  • Real-Time AI Processing (intermediate)
  • Hardware Abstraction (intermediate)
  • LLM Deployment (intermediate)
  • PyTorch (intermediate)
  • JAX (intermediate)
  • Kubernetes (intermediate)
  • Docker (intermediate)

Required Qualifications

  • Experience building and scaling inference systems for LLMs or multimodal models
  • Hands-on work with GPU-based ML workloads and performance dynamics of large models
  • Deep understanding of complex data handling for images, audio, and non-text modalities
  • Familiarity with inference tooling like vLLM, TensorRT-LLM, or custom model-parallel systems
  • Comfort with systems spanning networking, distributed compute, and high-throughput data
  • Proven ability to own problems end-to-end in ambiguous, fast-moving environments
  • Experience collaborating closely with research teams on experimental ML workflows
  • Strong skills in optimizing for high-throughput and low-latency delivery
  • Knowledge of GPU utilization, tensor parallelism, and hardware abstraction layers
  • Background in transitioning experimental research into reliable production services
  • Experience with real-time audio and image processing pipelines
  • Proficiency in Python, C++, or similar for performance-critical systems

Responsibilities

  • Design and implement scalable inference infrastructure for multimodal AI models
  • Optimize systems for high-throughput, low-latency image and audio processing
  • Build reliable production services for real-time multimodal workloads
  • Collaborate with researchers to deploy next-generation multimodal models
  • Develop GPU optimization strategies including tensor parallelism techniques
  • Create hardware abstraction layers for diverse ML inference hardware
  • Enable seamless transition of experimental research workflows to production
  • Partner with product teams to define new interaction modalities beyond text
  • Implement high-performance data pipelines for heterogeneous model inputs/outputs
  • Monitor and improve system reliability, availability, and scalability in production
  • Contribute to developer tools and APIs for multimodal model serving
  • Troubleshoot and resolve performance bottlenecks in live inference systems
  • Work cross-functionally with infra, research, and product engineering teams

Benefits

  • Competitive salary with equity in a high-growth AI company
  • Comprehensive health, dental, and vision insurance coverage
  • 401(k) matching and retirement savings plans
  • Unlimited PTO and flexible vacation policy
  • Generous parental leave and family planning benefits
  • Remote-friendly work environment with SF headquarters
  • Professional development stipend for conferences and courses
  • Mental health support and wellness programs
  • Free lunches, snacks, and meals at the office
  • Commuter benefits and transportation allowances
  • Employee stock purchase plan opportunities
  • Volunteer time off and charitable giving matching
  • Cutting-edge hardware including the latest GPUs for personal projects
