Software Engineer, Frontier Systems Careers at OpenAI - San Francisco, California | Apply Now!

OpenAI

Full-time
Posted: Feb 10, 2026

Job Description

Software Engineer, Frontier Systems at OpenAI - San Francisco, CA

Join OpenAI's Frontier Systems team and build the backbone of the world's most powerful AI supercomputers. This senior-level Software Engineer role in San Francisco offers a chance to own critical infrastructure that powers cutting-edge model training. If you have 7+ years in software engineering, expertise in Python, shell scripting, and data analysis tools like SQL and Pandas, apply now to make a real impact on humanity's AI future.

Role Overview

The Frontier Systems team at OpenAI is at the forefront of AI infrastructure innovation. We design, build, launch, and maintain the largest supercomputers on the planet, purpose-built for training OpenAI's frontier AI models. As a Software Engineer on this team, you'll transform data center blueprints into reliable, high-performance systems capable of running uninterrupted training runs for the most advanced AI systems.

Your work directly supports OpenAI's mission to develop safe artificial general intelligence (AGI) that benefits all of humanity. Even a single hardware failure can cost weeks of compute time and millions in resources, so reliability is paramount. Engineers here are trusted with full ownership—from diagnosing complex system issues to deploying automation that scales across thousands of GPUs and nodes.

This isn't just ops work; it's systems engineering at the bleeding edge. You'll dive deep into root causes, build tools that prevent failures proactively, and collaborate with top AI researchers to keep training pipelines humming 24/7. No prior hardware experience is required: we'll teach you low-level details such as PCIe protocols, InfiniBand networking, and kernel tuning as you go.

Based in San Francisco, this role offers the excitement of working on infrastructure that pushes the boundaries of what's possible in AI compute.

Key Responsibilities

In this high-impact role, you'll take end-to-end ownership of the systems that power frontier model training. Here's what you'll do daily:

  • Own system health checks that ensure hyperscale supercomputers remain stable during multi-week training runs.
  • Lead investigations into hardware failures, analyzing petabytes of telemetry to pinpoint root causes.
  • Build Python-based automation to monitor and remediate issues across 10,000+ machines in real time.
  • Dig into noisy logs and metrics using SQL queries, PromQL, and Pandas for reproducible insights.
  • Develop shell scripts and tools for low-level hardware interactions, like power cycling nodes or diagnosing InfiniBand link flaps.
  • Optimize kernel parameters for peak GPU utilization and minimal latency in distributed training.
  • Create dashboards and visualizations for data center-wide health monitoring.
  • Collaborate with hardware vendors to resolve systemic issues at exascale.
  • Design failover logic to handle node failures without interrupting model training.
  • Scale monitoring systems as clusters grow from thousands to tens of thousands of GPUs.
  • Document failure modes and build preventive automation to eliminate them.
  • Support 24/7 on-call rotations with a focus on minimizing researcher downtime.
  • Contribute to open-source tools for large-scale systems management (where possible).

Expect to wear multiple hats: builder, detective, and optimizer—all while shipping code that runs the future of AI.

Qualifications

We're looking for senior engineers who thrive in ambiguity and scale. Required:

  • 7+ years of software engineering experience in production environments.
  • Expert Python and shell scripting for automation and tooling.
  • Proven ability to wrangle noisy data with SQL, PromQL, Pandas, or similar.
  • Track record of building reproducible analyses that drive decisions.
  • Balanced skills in software development and operational reliability.

Bonus points for:

  • Hands-on experience with hardware internals (PCIe, InfiniBand, networking).
  • Data center visualization and monitoring expertise.
  • Network ops, power management, or Linux kernel tuning.

Prior supercomputing or AI infra experience is a plus, but not required. Strong systems thinkers from cloud, HPC, or large-scale web services will excel here.

Salary & Benefits

Base salary for senior Software Engineers in San Francisco ranges from $250,000 to $450,000, plus equity and bonuses. OpenAI offers one of the best packages in tech:

  • Top-tier medical, dental, vision coverage.
  • 401(k) with 4%+ match.
  • Unlimited vacation and flexible hours.
  • 16+ weeks parental leave.
  • Wellness stipends and gym reimbursements.
  • Relocation support for SF move.
  • Stock options in a unicorn shaping AGI.

Full details shared during interviews.

Why Join OpenAI?

OpenAI isn't just another tech company—we're building AGI to benefit humanity. Your code will run on supercomputers training models that could solve climate change, cure diseases, and accelerate scientific discovery. Join a mission-driven team of the world's best, with a culture that values impact over bureaucracy.

The San Francisco HQ offers vibrant collaboration, stocked kitchens, and events. We are an equal opportunity employer committed to diversity.

How to Apply

Submit your resume and a note on why you're excited about Frontier Systems. Interviews include technical deep dives, systems debugging, and team fit. No agencies, please.

Apply now and power the next era of AI!

Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

262,500 - 495,000 USD per year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Python programming (intermediate)
  • Shell scripting (intermediate)
  • SQL querying (intermediate)
  • PromQL (intermediate)
  • Pandas data analysis (intermediate)
  • Linux systems administration (intermediate)
  • Hardware troubleshooting (intermediate)
  • System health monitoring (intermediate)
  • Automation scripting (intermediate)
  • Root cause analysis (intermediate)
  • Data center operations (intermediate)
  • Network protocols (PCIe, InfiniBand) (intermediate)
  • Power management (intermediate)
  • Kernel performance tuning (intermediate)
  • Reproducible analyses (intermediate)
  • Large-scale infrastructure (intermediate)
  • Visualization tools (intermediate)
  • Network operations (intermediate)
  • Hyperscale computing (intermediate)
  • AI model training support (intermediate)

Required Qualifications

  • 7+ years of industry experience in software engineering
  • Proficiency with Python and shell scripting for automation
  • High comfort level digging into noisy data using SQL, PromQL, and Pandas
  • Experience developing reproducible analyses for system diagnostics
  • Strong balance of building software and operationalizing infrastructure
  • Ability to own end-to-end system health checks for hyperscale supercomputers
  • Expertise in leading deep dives into hardware failures and system-level bugs
  • Comfort with low-level hardware details like PCIe, InfiniBand, and networking (bonus)
  • Experience with Linux tooling for power management and kernel performance tuning (bonus)
  • Familiarity with visualization of large data centers and networks (bonus)
  • Proven track record in network operations and tooling (bonus)
  • Passion for stabilizing systems during cutting-edge AI model training

Responsibilities

  • Own and continuously improve system health checks for hyperscale supercomputers
  • Lead deep-dive investigations into hardware failures at massive scale
  • Perform root cause analysis on system-level bugs impacting model training
  • Build and deploy automation tools to monitor thousands of machines
  • Develop scripts to automatically detect and fix issues without human intervention
  • Analyze noisy telemetry data using SQL, PromQL, and Python Pandas
  • Create reproducible analyses to document and prevent recurring failures
  • Collaborate with researchers to minimize disruptions during frontier model training
  • Optimize power management and stabilization across data center infrastructure
  • Tune Linux kernel performance for high-efficiency supercomputing workloads
  • Visualize and monitor large-scale data center networks and hardware states
  • Design and implement failover mechanisms for critical training infrastructure
  • Support the launch and scaling of the world's largest AI supercomputers
  • Integrate hardware protocols like PCIe and InfiniBand into monitoring systems

Benefits

  • Comprehensive health, dental, and vision insurance plans
  • 401(k) retirement savings with generous company matching
  • Unlimited PTO policy to promote work-life balance
  • Generous parental leave for new parents
  • Mental health support through partnered counseling services
  • Fitness reimbursement and wellness stipends
  • Fully stocked kitchens with healthy snacks and meals
  • Learning and development stipend for conferences and courses
  • Equity stock options in a high-growth AI company
  • Commuter benefits for San Francisco public transit
  • Relocation assistance for out-of-state candidates
  • Team offsites and social events to build camaraderie
  • Cutting-edge work on world's largest supercomputers
  • Mission-driven culture focused on safe AGI for humanity

