Yuanbao Large Model Training Engineer (元宝-大模型训练工程师)

Tencent

Software and Technology Jobs

Full-time · Posted: Nov 5, 2025

Job Description

📋 Job Overview

The Yuanbao Large Model Training Engineer at Tencent designs and optimizes LLM training frameworks to support high-performance training of large-scale models. The role works closely with the algorithm and platform teams to keep LLM infrastructure stable, high-performance, and scalable. Key focus areas are system design, performance tuning, stability assurance, and applying large-model innovations to business applications.

📍 Location: Beijing, China

🏢 Business Unit: CSIG

📄 Full Description

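1. Design and optimize LLM training frameworks to support high-performance LLM training. Work closely with the algorithm and platform teams to keep the LLM infrastructure stable, high-performance, and scalable, and drive the adoption and innovation of large AI model technology.
2. System design and optimization: design and build distributed training frameworks and, together with the platform team, support the training of large models with hundreds of billions of parameters.
3. Performance tuning and cost optimization: for large-model training jobs, optimize the distributed strategies of frameworks such as PyTorch and VERL to improve training efficiency.
4. Stability and reliability: design high-availability architectures that address risks such as training interruptions and data loss, keeping long-running training jobs stable.
5. Collaboration and delivery: work closely with the algorithm team to understand model requirements and provide infrastructure-level technical advice; drive customized development of open-source tools to fit business scenarios.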

🎯 Key Responsibilities

  • Design and optimize LLM training frameworks to support high-performance training, collaborating closely with algorithm and platform teams to ensure stable, high-performance, and scalable LLM infrastructure.
  • Design and build distributed training frameworks in collaboration with the platform team, supporting training of large models with hundreds of billions of parameters.
  • Perform performance tuning and cost optimization for large model training tasks, optimizing distributed strategies in frameworks like PyTorch/VERL to improve training efficiency.
  • Ensure stability and reliability by designing high-availability architectures, addressing risks such as training interruptions and data loss for long-cycle training tasks.
  • Collaborate with the algorithm team to understand model requirements, provide technical advice on infrastructure, and drive customized development of open-source tools to fit business scenarios.

🛠️ Required Skills

  • Expertise in designing and optimizing distributed training frameworks for LLMs.
  • Proficiency in performance tuning and optimization of frameworks like PyTorch or VERL.
  • Knowledge of high-availability architectures and stability measures for large-scale training.
  • Strong collaboration skills with algorithm and platform teams.
  • Experience bringing large AI model technology into practical applications and driving innovation.

Salary

Estimated Salary Range (medium confidence)

300,000 - 800,000 CNY per year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Expertise in designing and optimizing distributed training frameworks for LLMs (intermediate).
  • Proficiency in performance tuning and optimization of frameworks like PyTorch or VERL (intermediate).
  • Knowledge of high-availability architectures and stability measures for large-scale training (intermediate).
  • Strong collaboration skills with algorithm and platform teams (intermediate).
  • Experience bringing large AI model technology into practical applications and driving innovation (intermediate).
