
Software Engineer, Infrastructure - Analytics
OpenAI · San Francisco, California

Full-time · Posted: Feb 10, 2026

Job Description

Software Engineer, Infrastructure - Analytics at OpenAI: Build the Future of AI Research Infrastructure

Join OpenAI's Scaling team as a Software Engineer, Infrastructure - Analytics and become a key architect of the systems powering humanity's most advanced AI research. Located in the heart of San Francisco's tech innovation hub, this role offers the rare opportunity to work on mission-critical infrastructure that accelerates progress toward Artificial General Intelligence (AGI). Whether you're optimizing Kubernetes clusters for petabyte-scale analytics or building observability pipelines that give researchers unprecedented insights, your work will directly impact OpenAI's groundbreaking AI models.

Role Overview

The Scaling team at OpenAI isn't just another infrastructure group—they're the backbone enabling world-class AI research. This Software Engineer, Infrastructure - Analytics position is perfect for pragmatic generalists who excel in distributed systems and thrive in high-velocity environments. You'll design, build, and operate foundational backend services that power everything from real-time observability to large-scale data analytics for ML workflows.

Expect to work with cutting-edge technologies like Kafka for streaming, Spark and Trino for analytics, Apache Iceberg for data lakes, and Kubernetes for orchestration. This role spans the full stack—from low-level infrastructure components to researcher-facing applications—demanding versatility, strong systems thinking, and a passion for operational excellence. Based in San Francisco with a hybrid model (3 days in office), OpenAI also welcomes exceptional remote candidates across the United States.
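
To make that stack concrete, here is a minimal PySpark sketch, not taken from the posting, that streams events from a Kafka topic into an Apache Iceberg table so they can later be queried with Trino or Spark SQL. The topic, catalog, and table names are invented for illustration, and it assumes a Spark 3.x session with the Kafka connector and the Iceberg runtime configured.

    # Hypothetical sketch: stream run telemetry from Kafka into an Iceberg table.
    # Assumes the Kafka connector and Iceberg Spark runtime are on the classpath and
    # that an Iceberg catalog named "analytics" is configured; all names are invented.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("telemetry-ingest").getOrCreate()

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "kafka:9092")
        .option("subscribe", "ml-run-events")
        .load()
        .select(
            col("key").cast("string").alias("run_id"),
            col("value").cast("string").alias("payload"),
            col("timestamp"),
        )
    )

    query = (
        events.writeStream.format("iceberg")
        .outputMode("append")
        .option("checkpointLocation", "/checkpoints/ml-run-events")
        .toTable("analytics.ops.ml_run_events")
    )
    query.awaitTermination()

The same Iceberg table can then serve batch analytics through Trino or Spark SQL, which is the streaming-plus-batch pattern this role centers on.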

At OpenAI, infrastructure engineering means solving problems at the bleeding edge of AI scale. Your systems must handle exponentially growing workloads while remaining reliable and intuitive. If you love taming complex distributed systems and empowering brilliant researchers, this is your chance to make history.

Key Responsibilities

As a Software Engineer on the Scaling team, you'll wear many hats. Here's what your day-to-day will look like:

  • Architect scalable backend systems for ML research workflows, focusing on observability, analytics, and performance monitoring.
  • Build robust infrastructure supporting both streaming (Kafka) and batch (Spark) data processing at massive scale.
  • Develop internal tools and applications that streamline researcher workflows and boost productivity.
  • Debug and performance-tune services running on Kubernetes, implementing advanced observability with Prometheus, Grafana, and custom metrics (a small illustrative metrics sketch follows this list).
  • Create operational tooling for deployment, monitoring, and incident response using Terraform, Helm, and CI/CD pipelines.
  • Collaborate cross-functionally with ML engineers, researchers, and product teams to deliver production-ready systems.
  • Participate in on-call rotations, driving rapid incident resolution and post-mortem improvements for 99.99% uptime.
  • Optimize data pipelines with Trino/Presto query engines and Iceberg table formats for petabyte-scale analytics.
  • Implement reliability engineering practices including chaos testing, circuit breakers, and graceful degradation.
  • Contribute to OpenAI's culture of excellence by mentoring junior engineers and documenting scalable patterns.
  • Stay ahead of research scaling needs, proactively designing for 10x workload growth.
  • Leverage Python and Rust to build high-performance services that set new standards for AI infrastructure.
  • Drive performance engineering initiatives, reducing p99 latencies and resource costs across the stack.
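
As a rough illustration of the custom-metrics work mentioned above, and not OpenAI's actual tooling, the sketch below uses the open-source prometheus_client library to expose a request counter and a latency histogram that Prometheus can scrape and Grafana can chart. The metric names, labels, and port are invented for the example.

    # Illustrative only: expose custom service metrics for Prometheus to scrape
    # and Grafana to chart. Metric names, labels, and the port are invented.
    import random
    import time

    from prometheus_client import Counter, Histogram, start_http_server

    REQUESTS = Counter(
        "analytics_requests_total", "Total analytics API requests", ["endpoint", "status"]
    )
    LATENCY = Histogram(
        "analytics_request_latency_seconds", "Request latency in seconds", ["endpoint"]
    )

    def handle_query(endpoint: str) -> None:
        """Stand-in request handler that records outcome and latency."""
        start = time.perf_counter()
        status = "ok"
        try:
            time.sleep(random.uniform(0.01, 0.1))  # placeholder for real work
        except Exception:
            status = "error"
            raise
        finally:
            REQUESTS.labels(endpoint=endpoint, status=status).inc()
            LATENCY.labels(endpoint=endpoint).observe(time.perf_counter() - start)

    if __name__ == "__main__":
        start_http_server(9100)  # metrics served at http://localhost:9100/metrics
        while True:
            handle_query("/v1/runs")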

Qualifications

We're looking for versatile engineers who can hit the ground running in our fast-paced environment. You might thrive if you have:

  • 5+ years of professional software engineering experience with strong Python/Rust proficiency in large-scale codebases.
  • Deep expertise in distributed systems, data infrastructure (Kafka, Spark, Trino/Presto, Iceberg), and cloud-native architectures.
  • Hands-on Kubernetes operations experience including debugging, scaling, and observability implementation.
  • Proven track record with IaC tools like Terraform and Helm for managing complex deployments.
  • Experience across the stack: from kernel-level optimizations to full-stack application development.
  • Strong systems design skills with focus on reliability, performance, and developer experience.
  • Comfort in high-growth startups where requirements evolve rapidly and ownership is paramount.
  • Bonus: Experience in ML infrastructure, large language models, or research computing environments.

Salary & Benefits

OpenAI offers competitive compensation reflecting the role's impact. Total compensation for this senior infrastructure engineering role typically ranges from $220,000 - $380,000 USD annually, including base salary, equity, and performance bonuses. Exact figures depend on experience and location.

Beyond pay, enjoy:

  • Comprehensive health benefits with low premiums and excellent coverage
  • Unlimited vacation with a 'recharge guarantee'
  • Hybrid SF work model + full US remote option
  • Relocation support including housing and moving expenses
  • Generous parental leave (16+ weeks)
  • Learning stipends for conferences like KubeCon and Strange Loop
  • Daily catered meals and top-tier office perks
  • Equity in a company transforming humanity's future

Why Join OpenAI?

OpenAI isn't just building AI—we're ensuring AGI benefits all humanity. As a Software Engineer, Infrastructure - Analytics, you'll work alongside the world's top researchers on systems that power models like GPT-4 and beyond. This is rare ownership: your code will run in production for millions, scaling to unprecedented compute demands.

Our San Francisco headquarters fosters collaboration in a vibrant tech ecosystem, but our hybrid/remote model prioritizes results over seats. Join a team of ex-Google, Meta, and DeepMind engineers obsessed with excellence. OpenAI's mission-driven culture attracts diverse talent united by curiosity and impact.

How to Apply

Ready to accelerate AGI research? Submit your resume, GitHub/portfolio, and a brief note on your favorite distributed systems project. Our process includes technical screens, systems design, and team interviews. We're committed to diversity—no discrimination based on protected characteristics. OpenAI is an equal opportunity employer building AI for everyone.


Locations

  • San Francisco, California, United States
  • United States (Remote)

Salary

Estimated Salary Range (high confidence)

231,000 - 418,000 USD / year

Source: AI estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Python (intermediate)
  • Rust (intermediate)
  • Distributed Systems (intermediate)
  • Kubernetes (intermediate)
  • Kafka (intermediate)
  • Spark (intermediate)
  • Trino (intermediate)
  • Presto (intermediate)
  • Iceberg (intermediate)
  • Terraform (intermediate)
  • Helm (intermediate)
  • Backend Development (intermediate)
  • Data Processing (intermediate)
  • Observability (intermediate)
  • Performance Engineering (intermediate)
  • ML Workflows (intermediate)
  • Streaming Data (intermediate)
  • Batch Processing (intermediate)
  • Infrastructure as Code (intermediate)
  • On-Call Rotation (intermediate)
  • Microservices (intermediate)

Required Qualifications

  • Strong proficiency in Python and Rust for backend software development in large codebases
  • Hands-on experience with distributed systems and scalable data processing infrastructure
  • Deep knowledge of technologies like Kafka, Spark, Trino/Presto, and Apache Iceberg
  • Proven experience operating services in Kubernetes environments
  • Familiarity with infrastructure tools including Terraform and Helm
  • Ability to work across the full stack from low-level infrastructure to application logic
  • Track record of making pragmatic trade-offs to deliver quickly in fast-paced environments
  • Strong focus on building reliable, user-friendly systems for researchers and engineers
  • Experience debugging and optimizing performance of production services
  • Comfort with on-call rotations and responding to critical production incidents
  • Adaptability and curiosity in high-growth, rapidly changing organizations
  • Bachelor's or higher degree in Computer Science, Engineering, or related field preferred

Responsibilities

  • Design and build scalable backend systems supporting ML research workflows including observability and analytics
  • Develop reliable infrastructure for both streaming and batch data processing at massive scale
  • Create internal-facing tools and custom applications to empower research teams
  • Debug and optimize performance of services running on Kubernetes clusters
  • Build and maintain operational tooling for monitoring and alerting
  • Implement comprehensive observability solutions for distributed systems
  • Collaborate closely with ML engineers and researchers to understand and meet production needs
  • Participate in on-call rotation to ensure high system reliability and quick incident response
  • Improve system reliability through proactive monitoring and post-mortem analysis
  • Leverage infrastructure-as-code practices with Terraform and Helm for deployments
  • Integrate data pipelines using Kafka, Spark, Trino, and Iceberg for analytics workloads
  • Contribute to performance engineering initiatives across OpenAI's research infrastructure
  • Document systems and create developer guides for seamless team adoption
  • Stay ahead of scaling challenges as research workloads grow exponentially

Benefits

  • Competitive salary with equity package and performance bonuses
  • Comprehensive medical, dental, and vision insurance coverage
  • 401(k) retirement plan with generous company matching
  • Unlimited PTO policy with encouraged recharge periods
  • Hybrid work model: 3 days in office, 2 days remote flexibility
  • Full relocation assistance for new employees moving to San Francisco
  • Generous parental leave policy for primary and secondary caregivers
  • Fitness stipend and wellness programs including mental health support
  • Catered meals, snacks, and beverages in office daily
  • Learning and development stipend for conferences and courses
  • Commuter benefits and subsidized public transportation
  • Employee stock purchase plan with favorable terms
  • Volunteer time off and charitable donation matching
  • Cutting-edge hardware including latest MacBooks and multi-monitor setups


Tags & Categories

software engineer infrastructure openai, openai careers san francisco, distributed systems engineer jobs, kubernetes engineer openai, data infrastructure engineer, python rust backend developer, kafka spark trino jobs, ai research infrastructure careers, ml workflows engineer openai, san francisco tech jobs remote, observability engineer openai, performance engineering ai, terraform helm kubernetes jobs, backend software engineer agi, openai scaling team careers, remote infrastructure engineer us, analytics infrastructure openai, high growth startup engineering, production systems reliability jobs, iceberg presto data engineer, openai software engineer salary, Scaling


