
Principal On-Device Model Inference Optimization Engineer

NVIDIA

Software and Technology Jobs

Full-time · Posted: Oct 15, 2025

Job Description

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing: an era in which our GPUs act as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

We are seeking a highly skilled Principal On-Device Model Inference Optimization Engineer to join our team and lead efforts to improve the performance and efficiency of the AI models enabling the next generation of autonomous vehicle technology at NVIDIA!

What you'll be doing:

  • Develop and implement strategies to optimize AI model inference for on-device deployment.
  • Employ techniques such as pruning, quantization, and knowledge distillation to minimize model size and computational demands.
  • Optimize performance-critical components using CUDA and C++.
  • Collaborate with cross-functional teams to align optimization efforts with hardware capabilities and deployment needs.
  • Benchmark inference performance, identify bottlenecks, and implement solutions.
  • Research and apply innovative methods for inference optimization.
  • Adapt models for diverse hardware platforms and operating systems with varying capabilities.
  • Create tools to validate the accuracy and latency of deployed models at scale with minimal friction.
  • Recommend and implement model architecture changes to improve the accuracy-latency trade-off.

What we need to see:

  • MSc or PhD in Computer Science, Engineering, or a related field, or equivalent experience.
  • 10+ years of proven experience specializing in model inference and optimization, with 15+ years of overall work experience in a relevant area.
  • Expertise in modern machine learning frameworks, particularly PyTorch, ONNX, and TensorRT.
  • Proven experience optimizing inference for transformer and convolutional architectures.
  • Strong programming proficiency in CUDA, Python, and C++.
  • In-depth knowledge of optimization techniques, including quantization, pruning, distillation, and hardware-aware neural architecture search.
  • Skill in building and deploying scalable, cloud-based inference systems.
  • Passion for developing efficient, production-ready solutions with a strong focus on code quality and performance.
  • Meticulous attention to detail, ensuring precision and reliability in safety-critical systems.
  • Strong collaboration and communication skills for working effectively across multidisciplinary teams.
  • A proactive, diligent mentality and a drive to tackle complex optimization challenges.

Ways to stand out from the crowd:

  • Publications or industry experience in optimizing and deploying model inference at scale.
  • Hands-on expertise with hardware-aware optimizations and accelerators such as GPUs, TPUs, or custom ASICs.
  • Active contributions to open-source projects focused on inference optimization or machine learning frameworks.
  • Experience designing and deploying inference pipelines for real-time or autonomous systems.
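The posting itself contains no code; purely as an illustration of the central technique it names, here is a minimal, framework-free sketch of symmetric per-tensor int8 quantization. This is a toy model of what production tools such as TensorRT or PyTorch's quantization APIs do; the function names are ours, not from any library.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map the largest |v| to the int8 limit 127."""
    max_abs = max(abs(v) for v in values)
    scale = (max_abs or 1.0) / 127.0  # guard against an all-zero tensor
    return [max(-128, min(127, round(v / scale))) for v in values], scale

def dequantize_int8(codes, scale):
    """Recover approximate float values from the int8 codes."""
    return [c * scale for c in codes]
```

Each dequantized value differs from the original by at most half a quantization step (scale / 2), which is the accuracy cost traded for a 4x memory reduction versus float32, plus cheaper integer arithmetic on hardware.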

Locations

  • Shanghai, China

Salary

Estimated Salary Range (medium confidence)

60,000,000 - 120,000,000 INR / yearly

Source: AI estimate

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • AI model inference optimization (intermediate)
  • pruning (intermediate)
  • quantization (intermediate)
  • knowledge distillation (intermediate)
  • CUDA (intermediate)
  • C++ (intermediate)
  • benchmarking (intermediate)
  • research and apply innovative methods (intermediate)
  • adapt models for diverse hardware platforms (intermediate)
  • create validation tools (intermediate)
  • recommend model architecture changes (intermediate)
  • collaborate with cross-functional teams (intermediate)
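The "pruning" skill above can likewise be illustrated with a toy, framework-free sketch of global magnitude pruning: zero out the smallest-magnitude fraction of weights. The helper below is hypothetical, not a library API; real workflows would use something like `torch.nn.utils.prune` and then fine-tune to recover accuracy.

```python
def magnitude_prune(weights, sparsity):
    """Zero out (at least) the smallest-magnitude fraction `sparsity` of weights.

    Ties at the threshold magnitude are all pruned, so the achieved
    sparsity can slightly exceed the requested fraction.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

For example, pruning `[0.1, -0.5, 0.05, 2.0]` at 50% sparsity zeroes the two smallest-magnitude entries (0.05 and 0.1) while keeping -0.5 and 2.0.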

Target Your Resume for "Principal On-Device Model Inference Optimization Engineer" at NVIDIA

Get personalized recommendations to optimize your resume specifically for Principal On-Device Model Inference Optimization Engineer. Takes only 15 seconds!

AI-powered keyword optimization
Skills matching & gap analysis
Experience alignment suggestions

Check Your ATS Score for "Principal On-Device Model Inference Optimization Engineer" at NVIDIA

Find out how well your resume matches this job's requirements. Get comprehensive analysis including ATS compatibility, keyword matching, skill gaps, and personalized recommendations.

ATS compatibility check
Keyword optimization analysis
Skill matching & gap identification
Format & readability score

Tags & Categories

China

Answer 10 quick questions to check your fit for Principal On-Device Model Inference Optimization Engineer @ NVIDIA.

Quiz Challenge
10 Questions
~2 Minutes
Instant Score

Related Books and Jobs

No related jobs found at the moment.
