
Software Engineer, Agent Infrastructure Careers at OpenAI - San Francisco, California | Apply Now!

OpenAI

Full-time | Posted: Feb 10, 2026

Job Description

Software Engineer, Agent Infrastructure at OpenAI - San Francisco, CA

Join OpenAI's Agent Infrastructure team and build the future of AI agents. This senior-level Software Engineer role offers a unique opportunity to work on cutting-edge systems that power the training and deployment of highly capable AI agents reaching hundreds of millions of users worldwide.

Role Overview

The Agent Infrastructure team at OpenAI is at the forefront of AI innovation, creating scalable systems that enable researchers to train agentic models in environments that mimic real-world software engineering workflows. Our custom-built platforms handle extreme compute scales, providing flexible workspaces where AI agents can execute code, debug issues, and develop software autonomously.

In parallel, we maintain OpenAI's core production platform that powers agentic products like Codex, Operator, tool use in ChatGPT, and upcoming agentic offerings. As a Software Engineer on this team, you'll bridge research and product engineering, scaling novel infrastructure from research prototypes to global production systems serving massive user bases.

This hybrid role (3 days/week in office) is based in San Francisco, CA or New York City, NY, with full relocation support. You'll tackle the most challenging problems in AI infrastructure: building container orchestration that surpasses Kubernetes, optimizing globally distributed compute clusters, and creating APIs that serve petabyte-scale training workloads.

OpenAI seeks engineers with proven experience scaling ML infrastructure who thrive on ambiguous, high-impact problems at the intersection of systems engineering, AI research, and product deployment.

Key Responsibilities

Your day-to-day will involve hands-on engineering across the full stack of agent infrastructure:

  • Architect and implement custom container orchestration platforms that scale beyond Kubernetes limits for massive AI training clusters
  • Design, develop, and maintain high-performance FastAPI and gRPC APIs serving as the core interface for agentic systems
  • Leverage Terraform to provision complex, multi-region infrastructure supporting both research experimentation and production workloads
  • Partner directly with AI researchers to build and optimize novel training environments for agentic model development
  • Push compute clusters to extreme performance limits, identifying and eliminating bottlenecks in distributed training pipelines
  • Scale prototype agent capabilities from research demos to production systems handling millions of concurrent users
  • Build flexible execution environments that emulate diverse real-world scenarios for agent training and evaluation
  • Maintain and evolve OpenAI's production agent platform powering products like ChatGPT tool use and future agentic offerings
  • Debug complex issues across virtualization layers, networking stacks, and application runtimes in high-scale environments
  • Collaborate cross-functionally with product engineering to integrate agent infrastructure into consumer-facing products
  • Optimize system performance across globally distributed clusters serving hundreds of millions of daily interactions
  • Drive infrastructure evolution to support increasingly complex agentic capabilities and training paradigms
  • Contribute to platform reliability ensuring 99.99% uptime for mission-critical AI agent services

Qualifications

We're looking for senior engineers with deep expertise in ML infrastructure and systems at scale:

  • 5+ years of experience building large-scale machine learning training infrastructure
  • Demonstrated success scaling systems from prototype to million-scale production deployments
  • Expertise optimizing distributed systems performance and eliminating training bottlenecks
  • Strong proficiency with Terraform, cloud platforms, and infrastructure-as-code practices
  • Hands-on experience with FastAPI, gRPC, and high-throughput API development
  • Deep knowledge of container orchestration, virtualization, and compute cluster management
  • Proven track record collaborating with researchers on novel AI infrastructure projects
  • Exceptional debugging skills across complex, distributed systems stacks
  • Strong Python proficiency and systems programming experience
  • Passion for solving ambiguous problems at massive technical scale
  • Experience with production ML serving systems and agentic/AI deployment platforms

Salary & Benefits

Estimated Total Compensation: $350,000 - $550,000 USD annually (including base salary, equity, and bonuses). Exact compensation depends on experience and location.

Comprehensive Benefits Package:

  • Industry-leading health, dental, vision coverage
  • 401(k) matching and retirement planning
  • Unlimited PTO with flexible vacation policy
  • Hybrid work model (3 days/week in SF/NYC offices)
  • Full relocation assistance including housing support
  • Daily catered meals, snacks, and premium office amenities
  • Professional development stipend and conference budget
  • Generous parental leave and family benefits
  • Wellness programs including mental health support
  • Gym memberships and commuter benefits

Why Join OpenAI?

OpenAI is building safe AGI that benefits all of humanity. Our Agent Infrastructure team solves problems no one else has attempted at scales that redefine what's possible in AI. You'll work with brilliant researchers and engineers on systems that power products used by hundreds of millions worldwide.

Unlike traditional tech companies, every engineer at OpenAI has direct impact on our most ambitious projects. Our hybrid model balances collaboration with flexibility, and our SF/NYC offices offer world-class facilities. Join us to shape the future of agentic AI and work on infrastructure that enables the world's most capable models.

How to Apply

Ready to build the infrastructure powering tomorrow's AI agents? Submit your resume and a brief note about your most impactful infrastructure project. We're moving quickly but hiring thoughtfully - top candidates can expect responses within 48 hours.

Locations: San Francisco, CA | New York City, NY
Hybrid: 3 days/week in office
Visa sponsorship: Available for exceptional candidates

Locations

  • San Francisco, California, United States
  • New York City, New York, United States

Salary

Estimated Salary Range (high confidence)

$367,500 - $605,000 USD per year

Source: AI estimate

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Machine Learning Infrastructure (intermediate)
  • Container Orchestration (intermediate)
  • Kubernetes (intermediate)
  • Terraform (intermediate)
  • FastAPI (intermediate)
  • gRPC (intermediate)
  • Cloud Computing (intermediate)
  • Distributed Systems (intermediate)
  • Performance Optimization (intermediate)
  • Infrastructure as Code (intermediate)
  • AI Training Systems (intermediate)
  • Scalable APIs (intermediate)
  • Compute Cluster Management (intermediate)
  • Virtualization (intermediate)
  • High-Performance Computing (intermediate)
  • Python (intermediate)
  • DevOps (intermediate)
  • Microservices (intermediate)
  • AWS/GCP/Azure (intermediate)
  • System Debugging (intermediate)

Required Qualifications

  • Deep experience building large-scale machine learning infrastructure for AI training
  • Proven track record of scaling systems from 0-to-1 through 1,000,000x deployment
  • Expertise in identifying and resolving bottlenecks in distributed training environments
  • Strong proficiency with infrastructure-as-code tools like Terraform for complex deployments
  • Hands-on experience developing and maintaining FastAPI and gRPC APIs for high-throughput services
  • Familiarity with container orchestration platforms beyond standard Kubernetes
  • Ability to optimize performance in globally distributed, high-scale compute clusters
  • Experience collaborating closely with AI researchers on novel training infrastructure
  • Deep knowledge of cloud platforms (AWS, GCP, Azure) and virtualization technologies
  • Strong problem-solving skills for ambiguous challenges at the intersection of infrastructure and AI
  • Proficiency in Python and systems programming for performance-critical applications
  • Track record of pushing compute clusters to extreme limits in production environments
  • Experience with production deployment platforms for agentic AI products

Responsibilities

  • Develop and scale novel container orchestration platform exceeding Kubernetes capabilities
  • Build and maintain FastAPI and gRPC APIs serving agentic infrastructure in training and production
  • Use Terraform to provision and evolve complex infrastructure for research and production environments
  • Collaborate with research teams to design and optimize systems for experimental AI training runs
  • Push massive compute clusters to their performance limits for agentic model training
  • Identify and eliminate bottlenecks in large-scale distributed training systems
  • Integrate infrastructure with OpenAI products like Codex, Operator, and ChatGPT tool use
  • Scale new agentic capabilities from prototype to global production deployment
  • Debug and optimize agent execution environments mimicking human SWE workflows
  • Build flexible training environments emulating diverse real-world agent scenarios
  • Maintain core production platform powering hundreds of millions of agent interactions
  • Engineer high-performance systems for novel AI agent use cases at extreme scale
  • Work cross-functionally with product teams to launch future agentic products

Benefits

  • Competitive salary with equity package in a leading AI company
  • Comprehensive health, dental, and vision insurance coverage
  • 401(k) matching and retirement savings plans
  • Generous paid time off and flexible vacation policy
  • Hybrid work model with 3 days in office per week
  • Full relocation assistance for new employees to SF or NYC
  • Catered meals, snacks, and beverages daily in office
  • State-of-the-art office facilities in San Francisco and NYC
  • Learning stipend for professional development and conferences
  • Mental health support and wellness programs
  • Parental leave and family planning benefits
  • Gym membership reimbursement and fitness programs
  • Commuter benefits and transportation assistance
  • Volunteer time off and charitable giving matching
  • Cutting-edge AI projects impacting billions worldwide
