
Software Engineer, Infrastructure - Analytics
OpenAI · San Francisco, California

Full-time · Posted: Feb 10, 2026

Job Description

Software Engineer, Infrastructure - Analytics at OpenAI: Build the Future of AI Research Infrastructure

Join OpenAI's Scaling team as a Software Engineer, Infrastructure - Analytics and become a key architect of the systems powering humanity's most advanced AI research. Located in the heart of San Francisco's tech innovation hub, this role offers the rare opportunity to work on mission-critical infrastructure that accelerates progress toward Artificial General Intelligence (AGI). Whether you're optimizing Kubernetes clusters for petabyte-scale analytics or building observability pipelines that give researchers unprecedented insights, your work will directly impact OpenAI's groundbreaking AI models.

Role Overview

The Scaling team at OpenAI isn't just another infrastructure group—they're the backbone enabling world-class AI research. This Software Engineer, Infrastructure - Analytics position is perfect for pragmatic generalists who excel in distributed systems and thrive in high-velocity environments. You'll design, build, and operate foundational backend services that power everything from real-time observability to large-scale data analytics for ML workflows.

Expect to work with cutting-edge technologies like Kafka for streaming, Spark and Trino for analytics, Apache Iceberg for data lakes, and Kubernetes for orchestration. This role spans the full stack—from low-level infrastructure components to researcher-facing applications—demanding versatility, strong systems thinking, and a passion for operational excellence. Based in San Francisco with a hybrid model (3 days in office), OpenAI also welcomes exceptional remote candidates across the United States.
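
To make that stack concrete, here is a minimal PySpark sketch, not taken from the posting, that streams events from a Kafka topic into an Apache Iceberg table so they can later be queried with Trino or Spark SQL. The topic, catalog, and table names are invented for illustration, and it assumes a Spark 3.x session with the Kafka connector and the Iceberg runtime configured.

    # Hypothetical sketch: stream run telemetry from Kafka into an Iceberg table.
    # Assumes the Kafka connector and Iceberg Spark runtime are on the classpath and
    # that an Iceberg catalog named "analytics" is configured; all names are invented.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("telemetry-ingest").getOrCreate()

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "kafka:9092")
        .option("subscribe", "ml-run-events")
        .load()
        .select(
            col("key").cast("string").alias("run_id"),
            col("value").cast("string").alias("payload"),
            col("timestamp"),
        )
    )

    query = (
        events.writeStream.format("iceberg")
        .outputMode("append")
        .option("checkpointLocation", "/checkpoints/ml-run-events")
        .toTable("analytics.ops.ml_run_events")
    )
    query.awaitTermination()

The same Iceberg table can then serve batch analytics through Trino or Spark SQL, which is the streaming-plus-batch pattern this role centers on.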

At OpenAI, infrastructure engineering means solving problems at the bleeding edge of AI scale. Your systems must handle exponentially growing workloads while remaining reliable and intuitive. If you love taming complex distributed systems and empowering brilliant researchers, this is your chance to make history.

Key Responsibilities

As a Software Engineer on the Scaling team, you'll wear many hats. Here's what your day-to-day will look like:

  • Architect scalable backend systems for ML research workflows, focusing on observability, analytics, and performance monitoring.
  • Build robust infrastructure supporting both streaming (Kafka) and batch (Spark) data processing at massive scale.
  • Develop internal tools and applications that streamline researcher workflows and boost productivity.
  • Debug and performance-tune services running on Kubernetes, implementing advanced observability with Prometheus, Grafana, and custom metrics (a small illustrative metrics sketch follows this list).
  • Create operational tooling for deployment, monitoring, and incident response using Terraform, Helm, and CI/CD pipelines.
  • Collaborate cross-functionally with ML engineers, researchers, and product teams to deliver production-ready systems.
  • Participate in on-call rotations, driving rapid incident resolution and post-mortem improvements for 99.99% uptime.
  • Optimize data pipelines with Trino/Presto query engines and Iceberg table formats for petabyte-scale analytics.
  • Implement reliability engineering practices including chaos testing, circuit breakers, and graceful degradation.
  • Contribute to OpenAI's culture of excellence by mentoring junior engineers and documenting scalable patterns.
  • Stay ahead of research scaling needs, proactively designing for 10x workload growth.
  • Leverage Python and Rust to build high-performance services that set new standards for AI infrastructure.
  • Drive performance engineering initiatives, reducing p99 latencies and resource costs across the stack.
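
As a rough illustration of the custom-metrics work mentioned above, and not OpenAI's actual tooling, the sketch below uses the open-source prometheus_client library to expose a request counter and a latency histogram that Prometheus can scrape and Grafana can chart. The metric names, labels, and port are invented for the example.

    # Illustrative only: expose custom service metrics for Prometheus to scrape
    # and Grafana to chart. Metric names, labels, and the port are invented.
    import random
    import time

    from prometheus_client import Counter, Histogram, start_http_server

    REQUESTS = Counter(
        "analytics_requests_total", "Total analytics API requests", ["endpoint", "status"]
    )
    LATENCY = Histogram(
        "analytics_request_latency_seconds", "Request latency in seconds", ["endpoint"]
    )

    def handle_query(endpoint: str) -> None:
        """Stand-in request handler that records outcome and latency."""
        start = time.perf_counter()
        status = "ok"
        try:
            time.sleep(random.uniform(0.01, 0.1))  # placeholder for real work
        except Exception:
            status = "error"
            raise
        finally:
            REQUESTS.labels(endpoint=endpoint, status=status).inc()
            LATENCY.labels(endpoint=endpoint).observe(time.perf_counter() - start)

    if __name__ == "__main__":
        start_http_server(9100)  # metrics served at http://localhost:9100/metrics
        while True:
            handle_query("/v1/runs")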

Qualifications

We're looking for versatile engineers who can hit the ground running in our fast-paced environment. You might thrive if you have:

  • 5+ years of professional software engineering experience with strong Python/Rust proficiency in large-scale codebases.
  • Deep expertise in distributed systems, data infrastructure (Kafka, Spark, Trino/Presto, Iceberg), and cloud-native architectures.
  • Hands-on Kubernetes operations experience including debugging, scaling, and observability implementation.
  • Proven track record with IaC tools like Terraform and Helm for managing complex deployments.
  • Experience across the stack: from kernel-level optimizations to full-stack application development.
  • Strong systems design skills with focus on reliability, performance, and developer experience.
  • Comfort in high-growth startups where requirements evolve rapidly and ownership is paramount.
  • Bonus: Experience in ML infrastructure, large language models, or research computing environments.

Salary & Benefits

OpenAI offers competitive compensation reflecting the role's impact. Total compensation for this senior infrastructure engineering role typically ranges from $220,000 - $380,000 USD annually, including base salary, equity, and performance bonuses. Exact figures depend on experience and location.

Beyond pay, enjoy:

  • Comprehensive health benefits with low premiums and excellent coverage
  • Unlimited vacation with a 'recharge guarantee'
  • Hybrid SF work model + full US remote option
  • Relocation support including housing and moving expenses
  • Generous parental leave (16+ weeks)
  • Learning stipends for conferences like KubeCon and Strange Loop
  • Daily catered meals and top-tier office perks
  • Equity in a company transforming humanity's future

Why Join OpenAI?

OpenAI isn't just building AI—we're ensuring AGI benefits all humanity. As a Software Engineer, Infrastructure - Analytics, you'll work alongside the world's top researchers on systems that power models like GPT-4 and beyond. This is rare ownership: your code will run in production for millions, scaling to unprecedented compute demands.

Our San Francisco headquarters fosters collaboration in a vibrant tech ecosystem, but our hybrid/remote model prioritizes results over seats. Join a team of ex-Google, Meta, and DeepMind engineers obsessed with excellence. OpenAI's mission-driven culture attracts diverse talent united by curiosity and impact.

How to Apply

Ready to accelerate AGI research? Submit your resume, GitHub/portfolio, and a brief note on your favorite distributed systems project. Our process includes technical screens, systems design, and team interviews. We're committed to diversity—no discrimination based on protected characteristics. OpenAI is an equal opportunity employer building AI for everyone.


Locations

  • San Francisco, California, United States
  • United States (Remote)

Salary

Estimated Salary Range (high confidence)

231,000 - 418,000 USD / year

Source: AI estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Python (intermediate)
  • Rust (intermediate)
  • Distributed Systems (intermediate)
  • Kubernetes (intermediate)
  • Kafka (intermediate)
  • Spark (intermediate)
  • Trino (intermediate)
  • Presto (intermediate)
  • Iceberg (intermediate)
  • Terraform (intermediate)
  • Helm (intermediate)
  • Backend Development (intermediate)
  • Data Processing (intermediate)
  • Observability (intermediate)
  • Performance Engineering (intermediate)
  • ML Workflows (intermediate)
  • Streaming Data (intermediate)
  • Batch Processing (intermediate)
  • Infrastructure as Code (intermediate)
  • On-Call Rotation (intermediate)
  • Microservices (intermediate)

Required Qualifications

  • Strong proficiency in Python and Rust for backend software development in large codebases
  • Hands-on experience with distributed systems and scalable data processing infrastructure
  • Deep knowledge of technologies like Kafka, Spark, Trino/Presto, and Apache Iceberg
  • Proven experience operating services in Kubernetes environments
  • Familiarity with infrastructure tools including Terraform and Helm
  • Ability to work across the full stack from low-level infrastructure to application logic
  • Track record of making pragmatic trade-offs to deliver quickly in fast-paced environments
  • Strong focus on building reliable, user-friendly systems for researchers and engineers
  • Experience debugging and optimizing performance of production services
  • Comfort with on-call rotations and responding to critical production incidents
  • Adaptability and curiosity in high-growth, rapidly changing organizations
  • Bachelor's or higher degree in Computer Science, Engineering, or related field preferred

Responsibilities

  • Design and build scalable backend systems supporting ML research workflows including observability and analytics
  • Develop reliable infrastructure for both streaming and batch data processing at massive scale
  • Create internal-facing tools and custom applications to empower research teams
  • Debug and optimize performance of services running on Kubernetes clusters
  • Build and maintain operational tooling for monitoring and alerting
  • Implement comprehensive observability solutions for distributed systems
  • Collaborate closely with ML engineers and researchers to understand and meet production needs
  • Participate in on-call rotation to ensure high system reliability and quick incident response
  • Improve system reliability through proactive monitoring and post-mortem analysis
  • Leverage infrastructure-as-code practices with Terraform and Helm for deployments
  • Integrate data pipelines using Kafka, Spark, Trino, and Iceberg for analytics workloads
  • Contribute to performance engineering initiatives across OpenAI's research infrastructure
  • Document systems and create developer guides for seamless team adoption
  • Stay ahead of scaling challenges as research workloads grow exponentially

Benefits

  • Competitive salary with equity package and performance bonuses
  • Comprehensive medical, dental, and vision insurance coverage
  • 401(k) retirement plan with generous company matching
  • Unlimited PTO policy with encouraged recharge periods
  • Hybrid work model: 3 days in office, 2 days remote flexibility
  • Full relocation assistance for new employees moving to San Francisco
  • Generous parental leave policy for primary and secondary caregivers
  • Fitness stipend and wellness programs including mental health support
  • Catered meals, snacks, and beverages in office daily
  • Learning and development stipend for conferences and courses
  • Commuter benefits and subsidized public transportation
  • Employee stock purchase plan with favorable terms
  • Volunteer time off and charitable donation matching
  • Cutting-edge hardware including latest MacBooks and multi-monitor setups


Tags & Categories

software engineer infrastructure openai, openai careers san francisco, distributed systems engineer jobs, kubernetes engineer openai, data infrastructure engineer, python rust backend developer, kafka spark trino jobs, ai research infrastructure careers, ml workflows engineer openai, san francisco tech jobs remote, observability engineer openai, performance engineering ai, terraform helm kubernetes jobs, backend software engineer agi, openai scaling team careers, remote infrastructure engineer us, analytics infrastructure openai, high growth startup engineering, production systems reliability jobs, iceberg presto data engineer, openai software engineer salary, Scaling


