Large Model Training Performance Optimization Engineer (Training Operators) (Shenzhen/Beijing/Shanghai/Hangzhou)

Tencent

full-time

Posted: December 8, 2025

Number of Vacancies: 1

Job Description

📋 Job Overview

Tencent is seeking a Large Model Training Performance Optimization Engineer specializing in training operators for locations in Shenzhen, Beijing, Shanghai, and Hangzhou. The role focuses on designing, implementing, and optimizing deep learning training operators using technologies like CUDA, CUTLASS, and Triton. Responsibilities include performance analysis, collaboration on distributed training systems, and tracking cutting-edge hardware and software advancements to enhance training efficiency.

📍 Location: Shenzhen, China

🏢 Business Unit: TEG

📄 Full Description

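1. Design, implement, and optimize operators for deep learning training (CUDA/CUTLASS/Triton);
2. For large-model training scenarios, carry out end-to-end performance analysis and tuning of operators, continually looking for headroom in throughput, latency, GPU memory utilization, and other metrics;
3. Participate in or lead the design and optimization of operators and communication schemes within a 3D-parallel training setup (Data / Tensor / Pipeline Parallel, etc.);
4. Work closely with the distributed training, systems, and model algorithm teams to improve the overall efficiency and stability of large-scale training jobs;
5. Track cutting-edge hardware architectures and system software (GPU architectures, networking, compilers, libraries, etc.) and turn the latest techniques into real performance gains.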

🛠️ Required Skills

  • CUDA
  • CUTLASS
  • Triton
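  • Design and optimization of deep learning training operators
  • Performance analysis and tuning
  • 3D-parallel training (Data / Tensor / Pipeline Parallel)
  • Collaboration on distributed training
  • Tracking GPU architectures and system software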

Salary

Estimated Salary Range (medium confidence)

300,000 - 800,000 CNY per year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

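  • CUDA (intermediate)
  • CUTLASS (intermediate)
  • Triton (intermediate)
  • Design and optimization of deep learning training operators (intermediate)
  • Performance analysis and tuning (intermediate)
  • 3D-parallel training (Data / Tensor / Pipeline Parallel) (intermediate)
  • Collaboration on distributed training (intermediate)
  • Tracking GPU architectures and system software (intermediate)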
