Deep Learning Performance Architect - Perf Tools

NVIDIA

Engineering Jobs

Full-time
Posted: Sep 18, 2025

Job Description

We are looking for a first-class Deep Learning Performance Architect to help shape the performance analysis infrastructure for GPUs. We build cutting-edge analysis tools and visualization frameworks that empower engineers to optimize GPU performance for Deep Learning and HPC workloads, spanning pre-silicon architectural exploration to post-silicon validation and optimization. Your work will directly shape the tools that define how NVIDIA GPUs are analyzed, tuned, and scaled for next-gen AI systems, and will influence next-gen GPU architectures.

What you'll be doing:

  • Architect Performance Tooling: Develop infrastructure tools and libraries for GPU performance analysis, visualization, and automated workflows used across the GPU SW/HW development life cycle.
  • Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance performance debugging and profiling capabilities.
  • AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate performance optimization guidance, and improve the user experience of the profiling infrastructure.
  • Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to support performance studies, improve the CUDA software stack, and co-design performance-centric solutions for current and next-generation GPU architectures.

What we need to see:

  • BS or higher in Computer Science, Electronic Engineering, or a related field (or equivalent experience).
  • 4+ years of software development experience.
  • Strong software skills in design, coding (C++ and Python), analysis, and debugging of low-level programs.
  • Strong grasp of computer architecture (pipelines, memory hierarchies) and operating system fundamentals.
  • Experience with performance modeling, architecture simulation, profiling, and analysis.
  • Self-starter who thrives in dynamic environments and manages competing priorities effectively.

Ways to stand out from the crowd:

  • Experience building performance debugging and analysis tools on silicon and simulators.
  • Experience developing application snapshot and replay tools is a big plus.
  • Familiarity with the CUDA system software stack (e.g., CUDA Driver/Runtime APIs), CUDA kernel optimization, and GPU architecture.
  • Familiarity with GPU performance profiling tools such as Nsight Systems, Nsight Compute, and NVTX, or experience developing similar tools for other processors.
  • Practical experience or projects demonstrating AI/ML-based code generation, automated data analysis, or workflow assistants.

Locations

  • Shanghai, China

Salary

Estimated Salary Range (medium confidence)

30,000,000 - 60,000,000 INR / yearly

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • C++ (intermediate)
  • Python (intermediate)
  • GPU performance analysis (intermediate)
  • visualization frameworks (intermediate)
  • performance modeling (intermediate)
  • architecture simulation (intermediate)
  • profiling (intermediate)
  • analysis (intermediate)
  • computer architecture (intermediate)
  • pipelines (intermediate)
  • memory hierarchies (intermediate)
  • operating system fundamentals (intermediate)
  • CUDA (intermediate)
  • AI/ML (intermediate)
  • kernel development (intermediate)
  • system software (intermediate)
  • hardware architecture (intermediate)
  • debugging (intermediate)
  • low-level programming (intermediate)
  • analytical skills (intermediate)
  • software design (intermediate)
  • coding (intermediate)
  • self-starter (intermediate)
  • manages competing priorities (intermediate)
