Software Engineer, Fleet Infrastructure Careers at OpenAI - San Francisco, California | Apply Now!

OpenAI

Full-time | Posted: Feb 10, 2026

Job Description

Software Engineer, Fleet Infrastructure at OpenAI - San Francisco, CA

Join OpenAI's mission to ensure AGI benefits all of humanity by building the world's largest GPU fleet infrastructure. This senior-level Software Engineer role focuses on creating reliable, scalable systems that power cutting-edge AI model training and deployment.

Role Overview

The Fleet Infrastructure team at OpenAI operates one of the world's most advanced GPU computing platforms. As a Software Engineer on this team, you'll design, build, and maintain systems that maximize GPU utilization while supporting complex AI research workflows. From job scheduling and Kubernetes automation to high-performance snapshot delivery, your work directly enables OpenAI's breakthroughs in artificial general intelligence.

This is not just infrastructure engineering—it's mission-critical systems engineering at unprecedented scale. You'll work with massive GPU clusters running the latest AI models, collaborating with world-class researchers and hardware experts. The role demands both deep technical expertise and the ability to deliver reliable systems under tight timelines in a fast-moving organization.

Based in San Francisco with a hybrid model (3 days/week in office), this position offers relocation assistance and the chance to shape infrastructure that powers the future of AI.

Key Responsibilities

Your day-to-day will span the full stack of fleet infrastructure engineering:

  • Design and implement advanced job scheduling systems that maximize GPU utilization across thousands of nodes
  • Build push-button automation for Kubernetes cluster provisioning, scaling, and upgrades
  • Develop user-friendly quota management interfaces for research teams
  • Optimize high-performance snapshot delivery systems from blob storage to hardware caching
  • Create robust CI/CD pipelines for infrastructure deployments
  • Interface directly with AI researchers to understand and support complex training workloads
  • Collaborate with hardware engineers to optimize GPU fleet performance
  • Maintain 99.99%+ reliability across hyperscale infrastructure
  • Build service frameworks that streamline model deployment workflows
  • Develop monitoring, alerting, and observability systems for fleet health
  • Automate maintenance procedures to minimize operational overhead
  • Troubleshoot production issues across distributed GPU clusters
  • Work cross-functionally with product, business, and infrastructure teams

Qualifications

We're looking for engineers who excel in hyperscale environments and thrive on complex technical challenges:

  • 5+ years of experience building and operating large-scale compute infrastructure
  • Expertise in Kubernetes at massive scale (1000+ node clusters)
  • Strong programming skills (Python, Go, C++ preferred)
  • Deep experience with public cloud platforms, especially Microsoft Azure
  • Proven success building distributed job scheduling systems
  • Experience with infrastructure-as-code and GitOps practices
  • Familiarity with AI/ML workloads and training infrastructure
  • Execution-focused mindset with rigorous attention to user needs
  • Ability to collaborate effectively across engineering, research, and business teams
  • Comfort working in fast-paced environments with tight timelines

Bonus Points: Experience with GPU-optimized networking, RDMA, InfiniBand, or NCCL; prior work on AI training infrastructure; contributions to open-source infrastructure projects.

Salary & Benefits

Compensation Range: $250,000 - $450,000 base salary + equity + benefits (Total compensation depends on experience and location)

OpenAI offers one of the most competitive compensation packages in tech, including:

  • Industry-leading base salary and equity ownership
  • Comprehensive health benefits (medical, dental, vision)
  • Hybrid work model with 3 days/week in our San Francisco office
  • Full relocation assistance including housing support
  • Generous parental leave (16 weeks fully paid)
  • Unlimited vacation with manager approval
  • Mental health benefits and employee assistance programs
  • Professional development budget ($3,000/year)
  • Fitness reimbursements and gym memberships
  • Daily catered meals and fully stocked kitchens
  • Commuter benefits and subsidized parking
  • 401(k) with generous company match

Why Join OpenAI?

OpenAI isn't just building AI—we're building the infrastructure that makes AGI possible. The Fleet Infrastructure team sits at the heart of our technical organization, powering every major model release and research breakthrough.

Impact at Scale: Your systems will run the world's largest GPU fleet, training models that push the boundaries of human knowledge.

Cutting-Edge Challenges: Work on problems no one else has solved at this scale—massive GPU orchestration, exabyte-scale data movement, sub-second model startup times.

World-Class Team: Collaborate with PhD researchers, hardware experts, and infrastructure engineers who wrote the book on distributed systems.

Mission-Driven Culture: We're dedicated to safe AGI development that benefits all humanity. Your work directly advances this mission.

San Francisco HQ: Join our vibrant office in the heart of the world's AI capital, with easy access to top talent and research institutions.

How to Apply

Ready to build the infrastructure powering the next generation of AI? Here's what to expect:

  1. Submit Application: Upload your resume and a brief note about why you're excited about fleet infrastructure at OpenAI
  2. Technical Screen: 45-minute conversation about your experience with distributed systems
  3. Technical Deep Dive: Live coding and system design interviews focused on GPU infrastructure challenges
  4. Team Interviews: Meet your future teammates and discuss real fleet engineering problems
  5. Offer: Competitive compensation package tailored to your experience

Timeline: Most candidates hear back within 1 week. Full process takes 2-4 weeks.

OpenAI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

262,500 - 495,000 USD per year

Source: AI estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Kubernetes (intermediate)
  • GPU Cluster Management (intermediate)
  • Job Scheduling Systems (intermediate)
  • Azure Cloud Infrastructure (intermediate)
  • CI/CD Pipelines (intermediate)
  • Distributed Systems (intermediate)
  • Python Programming (intermediate)
  • Go Programming (intermediate)
  • Infrastructure as Code (intermediate)
  • Container Orchestration (intermediate)
  • High-Performance Computing (intermediate)
  • Blob Storage Optimization (intermediate)
  • Cluster Provisioning (intermediate)
  • AI/ML Workloads (intermediate)
  • Service Frameworks (intermediate)
  • Automation Engineering (intermediate)
  • Research Workflow Support (intermediate)
  • Snapshot Delivery Systems (intermediate)

Required Qualifications

  • 5+ years of experience with hyperscale compute systems
  • Strong programming skills in Python, Go, or C++
  • Hands-on experience working in public clouds, especially Azure
  • Deep expertise in Kubernetes cluster management and orchestration
  • Proven track record building job scheduling and quota systems
  • Experience with push-button automation for cluster provisioning
  • Familiarity with high-performance snapshot delivery systems
  • Execution-focused mentality with a rigorous focus on user requirements
  • Understanding of AI/ML training and deployment workloads
  • Experience interfacing with research and product teams
  • Strong collaboration skills with hardware and infrastructure teams
  • Ability to thrive in fast-paced, high-stakes environments

Responsibilities

  • Design and implement job scheduling systems for GPU workloads
  • Build and operate Kubernetes cluster provisioning automation
  • Develop user-friendly quota management systems for researchers
  • Optimize snapshot delivery for fast model startup times
  • Create CI/CD pipelines for infrastructure deployments
  • Interface with researchers to understand AI training requirements
  • Collaborate with hardware teams on GPU fleet optimization
  • Maintain high utilization rates across massive GPU clusters
  • Build service frameworks supporting research workflows
  • Develop monitoring and alerting systems for fleet reliability
  • Automate cluster upgrades and maintenance procedures
  • Ensure low-maintenance platform operations at scale
  • Work cross-functionally with product and business teams
  • Troubleshoot and resolve production issues in real-time

Benefits

  • Competitive salary with equity package
  • Comprehensive medical, dental, and vision insurance
  • Hybrid work model (3 days in office per week)
  • Relocation assistance for new employees
  • Generous parental leave policy
  • Unlimited PTO with encouragement to disconnect
  • Mental health and wellness benefits
  • Professional development stipend
  • Gym membership and fitness reimbursements
  • Catered meals and snacks in office
  • Commuter benefits and parking
  • 401(k) matching program
  • Employee referral bonuses
  • Volunteer time off program
