Staff Software Engineer, ML Training and Inference Infrastructure Careers at Rivian - Palo Alto, California | Apply Now!

Rivian

Full-time | Posted: Feb 4, 2025

Job Description

Staff Software Engineer, ML Training and Inference Infrastructure at Rivian

Role Overview

Join Rivian's Perception team as a Staff Software Engineer, ML Training and Inference Infrastructure in Palo Alto, California. Rivian is revolutionizing the automotive industry with emissions-free Electric Adventure Vehicles and cutting-edge self-driving technology. This role is at the heart of developing advanced machine learning algorithms that power safety-critical autonomous features in our category-defining vehicles.

As a key member of the team, you'll establish state-of-the-art ML infrastructure for training and inference of large autonomous driving models. Your expertise in optimizing NVIDIA GPU systems, PyTorch, and transformer architectures will directly impact vehicle performance and safety. If you have a passion for scaling deep learning workloads and accelerating inference on edge devices, this is your opportunity to shape the future of autonomous mobility at Rivian.

Key Responsibilities at Rivian

In this high-impact role, you'll tackle complex challenges in ML systems engineering. Here are the core responsibilities:

Optimize Deep Learning training workloads on large-scale NVIDIA GPU clusters for maximum efficiency.
Minimize model inference latency, including pre- and post-processing, on resource-constrained onboard systems.
Design, train, and deploy massive deep learning models leveraging petabytes of labeled and unlabeled driving data.
Accelerate transformer model training and inference using advanced optimization techniques.
Lead large-scale distributed training initiatives across multi-node GPU environments.
Profile and debug models to identify and eliminate performance bottlenecks.
Develop custom CUDA kernels and Triton ops for specialized acceleration needs.
Collaborate cross-functionally with perception, autonomy, and hardware teams to deliver production-ready systems.
Stay ahead of ML infrastructure trends to maintain Rivian's competitive edge in autonomous driving.

Qualifications & Requirements

Rivian seeks candidates with exceptional technical depth and proven impact in ML systems. Required qualifications include:

PhD in CS/CE/EE or equivalent industry experience building production ML systems.
Deep PyTorch expertise including advanced features and custom extensions.
Hands-on experience with training frameworks like PyTorch Lightning, Ray Train, or similar.
Transformer architecture mastery with proven acceleration techniques (FlashAttention, quantization, etc.).
Distributed training experience at scale (100+ GPUs) with frameworks like DeepSpeed or FSDP.
Proven model profiling skills using NVIDIA Nsight, PyTorch Profiler, and custom instrumentation.

Preferred skills include CUDA/Triton programming, NVIDIA TensorRT optimization, NCCL expertise, and edge computing experience with Jetson/Orin platforms.

Salary & Benefits

Salary Range: $228,000 - $285,000 USD (California-based applicants), determined by experience, skills, and other factors permitted by law.

Comprehensive health benefits: Medical, Rx, dental, vision coverage starting day one.
Family coverage: Includes spouses/domestic partners and children up to age 26.
Equal opportunity employer with ADA accommodations available.
Cutting-edge tech stack: Work with the latest NVIDIA hardware and ML frameworks.
Mission-driven culture: Contribute to sustainable transportation and outdoor preservation.

Why Join Rivian?

Rivian isn't just building electric vehicles—we're redefining adventure. Our team shares a deep love for the outdoors and commitment to protecting it through innovative technology. As a Staff Software Engineer, you'll:

Work on safety-critical autonomy systems that millions will trust.
Access world-class compute resources and petabytes of real-world driving data.
Collaborate with top talent from FAANG, Tesla, Waymo, and leading research institutions.
Enjoy a Palo Alto location in the heart of Silicon Valley innovation.
Make a tangible impact on emissions-free mobility and autonomous safety.

Rivian's diverse team constantly challenges conventional thinking, reframing old problems with fresh solutions. If you thrive in unknown territory and want to build the future of autonomous adventure vehicles, Rivian is your destination.

How to Apply

Ready to accelerate Rivian's autonomy revolution? Apply now for the Staff Software Engineer, ML Training and Inference Infrastructure position in Palo Alto, CA. Submit your resume highlighting PyTorch expertise, distributed training experience, and GPU optimization projects. Email candidateaccommodations@rivian.com for disability accommodations.

Don't miss this opportunity to join a mission-driven company at the forefront of EV autonomy. Positions like this fill quickly—apply today and start shaping the future of self-driving adventure vehicles!

Locations

Palo Alto, California, United States

Salary

228,000 - 285,000 USD / yearly

Estimated Salary Range (high confidence)

228,000 - 313,500 USD / yearly

Source: AI estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

PyTorch (intermediate)
Deep Learning (intermediate)
NVIDIA GPU Optimization (intermediate)
Distributed Training (intermediate)
Transformer Models (intermediate)
Model Inference (intermediate)
CUDA Programming (intermediate)
NVIDIA TensorRT (intermediate)
NCCL (intermediate)
Triton Inference Server (intermediate)
Large Scale ML Training (intermediate)
Edge Computing (intermediate)
Autonomous Driving ML (intermediate)
Model Profiling (intermediate)
PyTorch Lightning (intermediate)

Required Qualifications

PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent industry experience
Deep knowledge of PyTorch framework and its advanced features
Expertise in model training frameworks like PyTorch Lightning, Ray, or similar
In-depth understanding of transformer architecture and acceleration techniques
Proven experience with large-scale distributed training of deep learning models
Strong track record in profiling models and optimizing training/inference speed

Responsibilities

Optimize Deep Learning training workloads on large-scale NVIDIA GPU systems
Reduce model inference latency and pre/post-processing on onboard vehicle systems
Design, train, and deploy large-scale deep learning models for autonomous driving
Leverage vast labeled and unlabeled data for model development
Implement performance optimizations for transformer-based models
Conduct model profiling and detective work to boost training speed
Collaborate with Perception team on safety-critical self-driving features
Develop custom ops using CUDA or Triton for specialized acceleration
Scale distributed training infrastructure for production autonomy workloads

Benefits

Competitive salary range of $228,000 - $285,000 for California applicants
Robust medical, Rx, dental, and vision insurance packages
Coverage for full-time employees, spouses/domestic partners, and children up to age 26
Benefits effective on the first day of employment
Equal opportunity employment with comprehensive accommodations
Work on cutting-edge autonomous driving technology
Collaborative team environment with diverse backgrounds
Mission-driven culture focused on protecting the outdoors
Access to state-of-the-art GPU clusters and edge computing systems

Tags & Categories

ML Engineer, Deep Learning, Autonomous Driving, PyTorch, NVIDIA GPU, Distributed Training, Transformer Models, CUDA, Inference Optimization, Rivian Jobs, Staff Software Engineer ML Training Rivian, ML Inference Infrastructure jobs Palo Alto, PyTorch engineer autonomous driving, NVIDIA GPU optimization careers, Distributed training engineer Rivian, Transformer model acceleration jobs, Autonomous vehicle ML engineer, CUDA Triton developer Rivian, NVIDIA TensorRT jobs California, Deep learning infrastructure engineer, Self-driving car ML jobs, Large scale ML training careers, Edge computing inference engineer, Rivian perception team jobs, Staff ML engineer Palo Alto, Autonomy software engineer salary
