RESUME AND JOB

Software Engineer, Fleet Infrastructure Careers at OpenAI - San Francisco, California | Apply Now!

OpenAI

Full-time | Posted: Feb 10, 2026

Job Description

Software Engineer, Fleet Infrastructure at OpenAI - San Francisco, CA

Join OpenAI's mission to ensure AGI benefits all of humanity by building the world's largest GPU fleet infrastructure. This senior-level Software Engineer role focuses on creating reliable, scalable systems that power cutting-edge AI model training and deployment.

Role Overview

The Fleet Infrastructure team at OpenAI operates one of the world's most advanced GPU computing platforms. As a Software Engineer on this team, you'll design, build, and maintain systems that maximize GPU utilization while supporting complex AI research workflows. From job scheduling and Kubernetes automation to high-performance snapshot delivery, your work directly enables OpenAI's breakthroughs in artificial general intelligence.

This is not just infrastructure engineering—it's mission-critical systems engineering at unprecedented scale. You'll work with massive GPU clusters running the latest AI models, collaborating with world-class researchers and hardware experts. The role demands both deep technical expertise and the ability to deliver reliable systems under tight timelines in a fast-moving organization.

Based in San Francisco with a hybrid model (3 days/week in office), this position offers relocation assistance and the chance to shape infrastructure that powers the future of AI.

Key Responsibilities

Your day-to-day will span the full stack of fleet infrastructure engineering:

Design and implement advanced job scheduling systems that maximize GPU utilization across thousands of nodes
Build push-button automation for Kubernetes cluster provisioning, scaling, and upgrades
Develop user-friendly quota management interfaces for research teams
Optimize high-performance snapshot delivery systems from blob storage to hardware caching
Create robust CI/CD pipelines for infrastructure deployments
Interface directly with AI researchers to understand and support complex training workloads
Collaborate with hardware engineers to optimize GPU fleet performance
Maintain 99.99%+ reliability across hyperscale infrastructure
Build service frameworks that streamline model deployment workflows
Develop monitoring, alerting, and observability systems for fleet health
Automate maintenance procedures to minimize operational overhead
Troubleshoot production issues across distributed GPU clusters
Work cross-functionally with product, business, and infrastructure teams

Qualifications

We're looking for engineers who excel in hyperscale environments and thrive on complex technical challenges:

5+ years of experience building and operating large-scale compute infrastructure
Expertise in Kubernetes at massive scale (1000+ node clusters)
Strong programming skills (Python, Go, C++ preferred)
Deep experience with public cloud platforms, especially Microsoft Azure
Proven success building distributed job scheduling systems
Experience with infrastructure-as-code and GitOps practices
Familiarity with AI/ML workloads and training infrastructure
Execution-focused mindset with rigorous attention to user needs
Ability to collaborate effectively across engineering, research, and business teams
Comfort working in fast-paced environments with tight timelines

Bonus Points: Experience with GPU-optimized networking, RDMA, InfiniBand, or NCCL; prior work on AI training infrastructure; contributions to open-source infrastructure projects.

Salary & Benefits

Compensation Range: $250,000 - $450,000 base salary + equity + benefits (Total compensation depends on experience and location)

OpenAI offers one of the most competitive compensation packages in tech, including:

Industry-leading base salary and equity ownership
Comprehensive health benefits (medical, dental, vision)
Hybrid work model with 3 days/week in our San Francisco office
Full relocation assistance including housing support
Generous parental leave (16 weeks fully paid)
Unlimited vacation with manager approval
Mental health benefits and employee assistance programs
Professional development budget ($3,000/year)
Fitness reimbursements and gym memberships
Daily catered meals and fully stocked kitchens
Commuter benefits and subsidized parking
401(k) with generous company match

Why Join OpenAI?

OpenAI isn't just building AI—we're building the infrastructure that makes AGI possible. The Fleet Infrastructure team sits at the heart of our technical organization, powering every major model release and research breakthrough.

Impact at Scale: Your systems will run the world's largest GPU fleet, training models that push the boundaries of human knowledge.

Cutting-Edge Challenges: Work on problems no one else has solved at this scale—massive GPU orchestration, exabyte-scale data movement, sub-second model startup times.

World-Class Team: Collaborate with PhD researchers, hardware experts, and infrastructure engineers who wrote the book on distributed systems.

Mission-Driven Culture: We're dedicated to safe AGI development that benefits all humanity. Your work directly advances this mission.

San Francisco HQ: Join our vibrant office in the heart of the world's AI capital, with easy access to top talent and research institutions.

How to Apply

Ready to build the infrastructure powering the next generation of AI? Here's what to expect:

Submit Application: Upload your resume and a brief note about why you're excited about fleet infrastructure at OpenAI
Technical Screen: 45-minute conversation about your experience with distributed systems
Technical Deep Dive: Live coding and system design interviews focused on GPU infrastructure challenges
Team Interviews: Meet your future teammates and discuss real fleet engineering problems
Offer: Competitive compensation package tailored to your experience

Timeline: Most candidates hear back within 1 week. Full process takes 2-4 weeks.

OpenAI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Locations

San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

262,500 - 495,000 USD per year

Source: AI estimate

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

Kubernetes (intermediate)
GPU Cluster Management (intermediate)
Job Scheduling Systems (intermediate)
Azure Cloud Infrastructure (intermediate)
CI/CD Pipelines (intermediate)
Distributed Systems (intermediate)
Python Programming (intermediate)
Go Programming (intermediate)
Infrastructure as Code (intermediate)
Container Orchestration (intermediate)
High-Performance Computing (intermediate)
Blob Storage Optimization (intermediate)
Cluster Provisioning (intermediate)
AI/ML Workloads (intermediate)
Service Frameworks (intermediate)
Automation Engineering (intermediate)
Research Workflow Support (intermediate)
Snapshot Delivery Systems (intermediate)

Required Qualifications

5+ years of experience with hyperscale compute systems
Strong programming skills in Python, Go, or C++
Hands-on experience working in public clouds, especially Azure
Deep expertise in Kubernetes cluster management and orchestration
Proven track record building job scheduling and quota systems
Experience with push-button automation for cluster provisioning
Familiarity with high-performance snapshot delivery systems
Execution-focused mentality with rigorous attention to user requirements
Understanding of AI/ML training and deployment workloads
Experience interfacing with research and product teams
Strong collaboration skills with hardware and infrastructure teams
Ability to thrive in fast-paced, high-stakes environments

Responsibilities

Design and implement job scheduling systems for GPU workloads
Build and operate Kubernetes cluster provisioning automation
Develop user-friendly quota management systems for researchers
Optimize snapshot delivery for fast model startup times
Create CI/CD pipelines for infrastructure deployments
Interface with researchers to understand AI training requirements
Collaborate with hardware teams on GPU fleet optimization
Maintain high utilization rates across massive GPU clusters
Build service frameworks supporting research workflows
Develop monitoring and alerting systems for fleet reliability
Automate cluster upgrades and maintenance procedures
Ensure low-maintenance platform operations at scale
Work cross-functionally with product and business teams
Troubleshoot and resolve production issues in real-time

Benefits

Competitive salary with equity package
Comprehensive medical, dental, and vision insurance
Hybrid work model (3 days in office per week)
Relocation assistance for new employees
Generous parental leave policy
Unlimited PTO with encouragement to disconnect
Mental health and wellness benefits
Professional development stipend
Gym membership and fitness reimbursements
Catered meals and snacks in office
Commuter benefits and parking
401(k) matching program
Employee referral bonuses
Volunteer time off program
