
Research Engineer, Frontier Evals & Environments Careers at OpenAI - San Francisco, California | Apply Now!

OpenAI

Full-time | Posted: Feb 10, 2026

Job Description

Research Engineer, Frontier Evals & Environments at OpenAI - San Francisco

Join OpenAI's Frontier Evals & Environments team and shape the future of safe AGI development. This senior-level Research Engineer role offers you the chance to build north-star evaluation environments that drive progress toward artificial general intelligence (AGI) and artificial superintelligence (ASI). Located in San Francisco, California, this position places you at the forefront of AI safety research.

Role Overview

The Frontier Evals & Environments team at OpenAI is responsible for creating ambitious benchmarks and evaluation frameworks that measure and steer frontier AI models. Our open-sourced evaluations like GDPval, SWE-bench Verified, MLE-bench, PaperBench, and SWE-Lancer have set industry standards. We've conducted frontier evaluations for groundbreaking models including GPT-4o, o1, o3, GPT-4.5, ChatGPT Agent, and GPT-5.

As a Research Engineer, you'll push the boundaries of AI capabilities measurement, owning end-to-end projects that influence training, safety, and launch decisions. This role calls for exceptional engineers who are passionate about AGI safety and rapid model progress. You'll see firsthand how OpenAI's models evolve and help steer them responsibly.

Working in our dynamic San Francisco office, you'll collaborate with top researchers to build self-improvement loops, RL environments, and scalable evaluation systems. This is your opportunity to make history in AI safety research.

Key Responsibilities

Create Ambitious RL Environments: Design reinforcement learning environments that rigorously test frontier models' limits and capabilities.
Measure Model Capabilities: Develop comprehensive frameworks to evaluate skills, behaviors, and emergent abilities in cutting-edge AI systems.
Innovate Evaluation Methodologies: Pioneer automatic exploration techniques to uncover hidden model behaviors and failure modes.
Steer Frontier Training: Influence training decisions for OpenAI's largest model runs, gaining early access to breakthrough capabilities.
Build Scalable Systems: Architect processes supporting continuous, high-throughput model evaluation at scale.
Implement Self-Improvement Loops: Create automated systems that enhance model understanding and iterative improvement.
Conduct Red-Teaming: Systematically identify vulnerabilities using creative, adversarial testing approaches.
Analyze Evaluation Data: Apply statistical rigor to interpret results and inform critical safety decisions.
Cross-Functional Collaboration: Partner with research, safety, and product teams to operationalize evaluations.
Open-Source Contributions: Publish frameworks that advance the broader AI safety ecosystem.
Monitor Production Systems: Implement observability for real-world model deployments.
Prototype Novel Benchmarks: Rapidly iterate on evaluation environments like SWE-bench and MLE-bench.
Drive Empirical Research: Lead studies spanning the full spectrum of AI capabilities measurement.

Qualifications

Required:

Deep passion for AGI/ASI measurement and AI safety research
Exceptional engineering skills with ML research engineering experience
Strong statistical analysis and experimental design capabilities
Creative problem-solving with a robust red-teaming mindset
Hands-on experience in ML research engineering, stochastic systems, LLM applications, or AI evaluations
Proven ability to deliver end-to-end projects in fast-paced environments

Preferred:

First-hand red-teaming experience with complex systems
Cross-functional collaboration success
Excellent technical communication skills

Candidates should thrive in ambiguity, demonstrate ownership, and possess the technical depth to tackle frontier AI challenges.

Salary & Benefits

Competitive Compensation: Total compensation for this senior Research Engineer role ranges from $320,000 to $480,000 USD annually, including base salary, equity, and performance bonuses. Exact figures depend on experience and qualifications.

Comprehensive Benefits Package:

Premium health, dental, and vision coverage
401(k) with generous company match
Unlimited PTO policy
Parental leave and family planning support
Relocation package for San Francisco
Weekly catered meals, snacks, and beverages
Onsite fitness center and wellness programs
Professional development stipend
Cutting-edge hardware and AI infrastructure
Equity in OpenAI - shape the future of AI

Why Join OpenAI?

OpenAI leads the world in developing safe AGI that benefits humanity. Our Frontier Evals team directly influences model training for GPT-4o, o1, GPT-5, and beyond. You'll work alongside brilliant minds, publish influential research, and see your evaluations shape AI's trajectory.

The San Francisco location offers unparalleled access to AI talent and the local innovation ecosystem. You'll experience rapid model progress firsthand and contribute to humanity's most important technical challenge. OpenAI provides resources, autonomy, and impact unmatched in industry.

Join a mission-driven culture prioritizing safety, rapid iteration, and bold ambition. Your work will define AI evaluation standards for generations.

How to Apply

Ready to push AI safety frontiers? Submit your resume, GitHub/portfolio, and a brief note explaining your fit for this Research Engineer role. Highlight relevant ML research, evaluation experience, or red-teaming projects.

OpenAI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. Applications are reviewed on a rolling basis, so apply early to join our frontier team.

This San Francisco Research Engineer position represents a rare opportunity to work on AGI safety at the world's leading AI lab. Apply now and help steer humanity's AI future.

Locations

San Francisco, California, United States

Salary

Estimated Salary Range (high confidence)

336,000 - 528,000 USD / year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

Reinforcement Learning (RL): intermediate
Machine Learning Research: intermediate
AI Model Evaluation: intermediate
LLM Applications: intermediate
Statistical Analysis: intermediate
Red-Teaming: intermediate
Stochastic Systems: intermediate
Observability & Monitoring: intermediate
Python Programming: intermediate
Scalable Systems Design: intermediate
AGI/ASI Measurement: intermediate
RL Environment Development: intermediate
Model Capabilities Testing: intermediate
Automated Evaluation Methodologies: intermediate
Cross-Functional Collaboration: intermediate
Frontier Model Training: intermediate
Self-Improvement Loops: intermediate
High-Performance Computing: intermediate

Required Qualifications

Passionate about AGI/ASI measurement and safety
Strong engineering skills with proven ML research experience
Expertise in statistical analysis and data interpretation
Red-teaming mindset with creative problem-solving abilities
Experience in ML research engineering or related technical domains
Proficiency in stochastic systems and probabilistic modeling
Hands-on experience with observability, monitoring, and debugging complex systems
Deep knowledge of LLM-enabled applications and AI evaluations
Ability to thrive in dynamic, fast-paced research environments
Proven track record of scoping and delivering end-to-end projects
Experience working with frontier AI models and large-scale training runs
Strong communication skills for cross-functional collaboration (preferred)

Responsibilities

Design and create ambitious RL environments to test frontier model limits
Develop comprehensive measurement frameworks for model capabilities, skills, and behaviors
Innovate new methodologies for automatic exploration of model behaviors
Contribute to steering training decisions for largest-scale model training runs
Build scalable systems and processes for continuous model evaluation
Implement self-improvement loops to automate model understanding and optimization
Conduct red-teaming exercises to identify model weaknesses and failure modes
Analyze evaluation results to inform safety, training, and deployment decisions
Collaborate with research, safety, and engineering teams on frontier evaluations
Open-source evaluation frameworks and contribute to public AI safety benchmarks
Monitor and observe model performance in production-like environments
Prototype novel evaluation environments like GDPval, SWE-bench, and MLE-bench
Drive empirical research on the full spectrum of AI capabilities measurement

Benefits

Competitive salary with equity in a leading AI company
Comprehensive health, dental, and vision insurance
401(k) matching and retirement planning support
Unlimited PTO with encouraged recharge periods
Generous parental leave policies
Relocation assistance for a move to San Francisco
Weekly catered meals and fully stocked kitchens
Onsite gym membership and wellness programs
Learning stipend for conferences and courses
Direct impact on AGI safety and development
Work with world-class researchers and engineers
Latest hardware and cutting-edge AI infrastructure
Flexible work hours in a fast-paced environment
Opportunities to publish groundbreaking research

