
Software Engineer, Agent Infrastructure Careers at OpenAI - San Francisco, California | Apply Now!

OpenAI

Full-time | Posted: Feb 10, 2026

Job Description

Software Engineer, Agent Infrastructure at OpenAI - San Francisco, CA

Join OpenAI's Agent Infrastructure team and build the future of AI agents. This senior-level Software Engineer role offers a unique opportunity to work on cutting-edge systems that power the training and deployment of highly capable AI agents reaching hundreds of millions of users worldwide.

Role Overview

The Agent Infrastructure team at OpenAI is at the forefront of AI innovation, creating scalable systems that enable researchers to train agentic models in environments that mimic real-world software engineering workflows. Our custom-built platforms handle extreme compute scales, providing flexible workspaces where AI agents can execute code, debug issues, and develop software autonomously.

In parallel, we maintain OpenAI's core production platform that powers agentic products like Codex, Operator, tool use in ChatGPT, and upcoming agentic offerings. As a Software Engineer on this team, you'll bridge research and product engineering, scaling novel infrastructure from research prototypes to global production systems serving massive user bases.

This hybrid role (3 days/week in office) is based in San Francisco, CA or New York City, NY, with full relocation support. You'll tackle some of the most challenging problems in AI infrastructure: building container orchestration that surpasses Kubernetes, optimizing globally distributed compute clusters, and creating APIs that serve petabyte-scale training workloads.

OpenAI seeks engineers with proven experience scaling ML infrastructure who thrive on ambiguous, high-impact problems at the intersection of systems engineering, AI research, and product deployment.

Key Responsibilities

Your day-to-day will involve hands-on engineering across the full stack of agent infrastructure:

Architect and implement custom container orchestration platforms that scale beyond Kubernetes limits for massive AI training clusters
Design, develop, and maintain high-performance FastAPI and gRPC APIs serving as the core interface for agentic systems
Leverage Terraform to provision complex, multi-region infrastructure supporting both research experimentation and production workloads
Partner directly with AI researchers to build and optimize novel training environments for agentic model development
Push compute clusters to extreme performance limits, identifying and eliminating bottlenecks in distributed training pipelines
Scale prototype agent capabilities from research demos to production systems handling millions of concurrent users
Build flexible execution environments that emulate diverse real-world scenarios for agent training and evaluation
Maintain and evolve OpenAI's production agent platform powering products like ChatGPT tool use and future agentic offerings
Debug complex issues across virtualization layers, networking stacks, and application runtimes in high-scale environments
Collaborate cross-functionally with product engineering to integrate agent infrastructure into consumer-facing products
Optimize system performance across globally distributed clusters serving hundreds of millions of daily interactions
Drive infrastructure evolution to support increasingly complex agentic capabilities and training paradigms
Contribute to platform reliability, helping ensure 99.99% uptime for mission-critical AI agent services

Qualifications

We're looking for senior engineers with deep expertise in ML infrastructure and systems at scale:

5+ years of experience building large-scale machine learning training infrastructure
Demonstrated success scaling systems from prototype to million-scale production deployments
Expertise optimizing distributed systems performance and eliminating training bottlenecks
Strong proficiency with Terraform, cloud platforms, and infrastructure-as-code practices
Hands-on experience with FastAPI, gRPC, and high-throughput API development
Deep knowledge of container orchestration, virtualization, and compute cluster management
Proven track record collaborating with researchers on novel AI infrastructure projects
Exceptional debugging skills across complex, distributed systems stacks
Strong Python proficiency and systems programming experience
Passion for solving ambiguous problems at massive technical scale
Experience with production ML serving systems and agentic/AI deployment platforms

Salary & Benefits

Estimated Total Compensation: $350,000 - $550,000 USD annually (including base salary, equity, and bonuses). Exact compensation depends on experience and location.

Comprehensive Benefits Package:

Industry-leading health, dental, vision coverage
401(k) matching and retirement planning
Unlimited PTO with flexible vacation policy
Hybrid work model (3 days/week in SF/NYC offices)
Full relocation assistance including housing support
Daily catered meals, snacks, and premium office amenities
Professional development stipend and conference budget
Generous parental leave and family benefits
Wellness programs including mental health support
Gym memberships and commuter benefits

Why Join OpenAI?

OpenAI is building safe AGI that benefits all of humanity. Our Agent Infrastructure team solves problems no one else has attempted at scales that redefine what's possible in AI. You'll work with brilliant researchers and engineers on systems that power products used by hundreds of millions worldwide.

Unlike at traditional tech companies, every engineer at OpenAI has a direct impact on our most ambitious projects. Our hybrid model balances collaboration with flexibility, and our SF/NYC offices offer world-class facilities. Join us to shape the future of agentic AI and work on infrastructure that enables the world's most capable models.

How to Apply

Ready to build the infrastructure powering tomorrow's AI agents? Submit your resume and a brief note about your most impactful infrastructure project. We're moving quickly but hiring thoughtfully; top candidates can expect responses within 48 hours.

Locations: San Francisco, CA | New York City, NY
Hybrid: 3 days/week in office
Visa sponsorship: Available for exceptional candidates

Locations

San Francisco, California, United States
New York City, New York, United States

Salary

Estimated Salary Range (high confidence)

$367,500 - $605,000 USD annually

Source: AI estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

Machine Learning Infrastructure (intermediate)
Container Orchestration (intermediate)
Kubernetes (intermediate)
Terraform (intermediate)
FastAPI (intermediate)
gRPC (intermediate)
Cloud Computing (intermediate)
Distributed Systems (intermediate)
Performance Optimization (intermediate)
Infrastructure as Code (intermediate)
AI Training Systems (intermediate)
Scalable APIs (intermediate)
Compute Cluster Management (intermediate)
Virtualization (intermediate)
High-Performance Computing (intermediate)
Python (intermediate)
DevOps (intermediate)
Microservices (intermediate)
AWS/GCP/Azure (intermediate)
System Debugging (intermediate)

Required Qualifications

Deep experience building large-scale machine learning infrastructure for AI training at massive scale (experience)
Proven track record of scaling systems from 0 to 1 and then to 1,000,000x scale (experience)
Expertise in identifying and resolving bottlenecks in distributed training environments (experience)
Strong proficiency with infrastructure-as-code tools like Terraform for complex deployments (experience)
Hands-on experience developing and maintaining FastAPI and gRPC APIs for high-throughput services (experience)
Familiarity with container orchestration platforms beyond standard Kubernetes (experience)
Ability to optimize performance in globally distributed, high-scale compute clusters (experience)
Experience collaborating closely with AI researchers on novel training infrastructure (experience)
Deep knowledge of cloud platforms (AWS, GCP, Azure) and virtualization technologies (experience)
Strong problem-solving skills for ambiguous challenges at the intersection of infrastructure and AI (experience)
Proficiency in Python and systems programming for performance-critical applications (experience)
Track record of pushing compute clusters to extreme limits in production environments (experience)
Experience with production deployment platforms for AI agentic products (experience)

Responsibilities

Develop and scale a novel container orchestration platform that exceeds Kubernetes capabilities
Build and maintain FastAPI and gRPC APIs serving agentic infrastructure in training and production
Use Terraform to provision and evolve complex infrastructure for research and production environments
Collaborate with research teams to design and optimize systems for experimental AI training runs
Push massive compute clusters to their performance limits for agentic model training
Identify and eliminate bottlenecks in large-scale distributed training systems
Integrate infrastructure with OpenAI products like Codex, Operator, and ChatGPT tool use
Scale new agentic capabilities from prototype to global production deployment
Debug and optimize agent execution environments mimicking human SWE workflows
Build flexible training environments emulating diverse real-world agent scenarios
Maintain core production platform powering hundreds of millions of agent interactions
Engineer high-performance systems for novel AI agent use cases at extreme scale
Work cross-functionally with product teams to launch future agentic products

Benefits

Competitive salary with equity package in leading AI company
Comprehensive health, dental, and vision insurance coverage
401(k) matching and retirement savings plans
Generous paid time off and flexible vacation policy
Hybrid work model with 3 days in office per week
Full relocation assistance for new employees to SF or NYC
Catered meals, snacks, and beverages daily in office
State-of-the-art office facilities in San Francisco and NYC
Learning stipend for professional development and conferences
Mental health support and wellness programs
Parental leave and family planning benefits
Gym membership reimbursement and fitness programs
Commuter benefits and transportation assistance
Volunteer time off and charitable giving matching
Cutting-edge AI projects impacting billions worldwide
