AI Infra Reinforcement Learning Engineer

Tencent

Full-time | Posted: Dec 8, 2025

Job Description

📋 Job Overview

The AI Infra Reinforcement Learning Engineer at Tencent focuses on designing, developing, and optimizing RL training frameworks for large language models and agentic systems. The role involves building distributed training infrastructures to support scalable RL algorithms and addressing engineering challenges in RL workflows. Responsibilities include tool chain development, collaboration with algorithm teams, and integrating cutting-edge technologies in reinforcement learning and distributed training.

📍 Location: Shanghai, China

🏢 Business Unit: CSIG

📄 Full Description

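1. Design, develop, and optimize the performance of RL training frameworks for LLM RL and agentic RL, supporting the efficient deployment of large-scale RL algorithms (such as PPO, DQN, and GRPO);
2. Build a distributed training system and optimize asynchronous training and inference, partial rollout, data parallelism, model parallelism, and distributed storage and scheduling strategies for the Replay Buffer, improving GPU utilization and training throughput;
3. Design and implement an end-to-end toolchain for RL training, including environment wrapping, data preprocessing, model version management, training log monitoring, and metric visualization (TensorBoard/Weights & Biases);
4. Resolve engineering bottlenecks in RL training, such as sample transfer latency, GPU out-of-memory errors, and training instability (exploding/vanishing gradients), and provide engineering solutions;
5. Work closely with the RL algorithm team to understand algorithm requirements, iterate on the infrastructure, and support training needs across multiple scenarios;
6. Track frontier technologies in reinforcement learning and distributed training (such as VERL, rllm, Agentlightning, Ray, and Megatron-LM) and bring them into production systems.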
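
For candidates less familiar with GRPO, one of the algorithms named in item 1, its core step can be sketched roughly as follows. This is a minimal illustration, not Tencent's implementation: the function name `group_relative_advantages`, the `eps` parameter, and the choice of population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sampled response relative to the others for the same prompt.

    Hypothetical helper: GRPO-style training normalizes rewards within a
    group of responses instead of relying on a learned value function.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; frameworks differ
    # eps keeps the division finite when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four responses sampled for one prompt, with illustrative 0/1 rewards
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat the group average receive a positive advantage and the rest a negative one, which is why rollout throughput (many samples per prompt) matters for the infrastructure described above.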

🛠️ Required Skills

  • Proficiency in RL algorithms such as PPO, DQN, GRPO
  • Experience with distributed training systems including data parallelism, model parallelism, and Replay Buffer management
  • Skills in tool chain development for RL workflows: environment encapsulation, data preprocessing, model version management, training log monitoring, and visualization with TensorBoard or Weights & Biases
  • Ability to solve engineering issues like sample transmission delays, GPU memory overflow, and training stability (gradient explosion/vanishing)
  • Collaboration with algorithm teams to iterate infrastructure
  • Knowledge of frontier technologies in RL and distributed training such as VERL, rllm, Agentlightning, Ray, Megatron-LM
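
As an illustration of the training-stability item above, gradient explosion is commonly mitigated by clipping gradients to a maximum global norm. The sketch below uses plain Python lists in place of tensors; `clip_by_global_norm`, `max_norm`, and `eps` are hypothetical names chosen for this example, and a real training framework would apply its own tensor-based equivalent.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Rescale gradients so their combined L2 norm does not exceed max_norm.

    Hypothetical helper operating on a flat list of floats.
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm  # already within bounds, leave as-is
    scale = max_norm / (total_norm + eps)
    return [g * scale for g in grads], total_norm


# A gradient of norm 5 is scaled down to roughly norm 1
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Logging `norm_before` at each step is a cheap way to spot instability early, which ties in with the monitoring tools listed above.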

Salary

Estimated Salary Range (medium confidence)

CNY 300,000 - 800,000 per year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Proficiency in RL algorithms such as PPO, DQN, GRPO (intermediate)
  • Experience with distributed training systems including data parallelism, model parallelism, and Replay Buffer management (intermediate)
  • Skills in tool chain development for RL workflows: environment encapsulation, data preprocessing, model version management, training log monitoring, and visualization with TensorBoard or Weights & Biases (intermediate)
  • Ability to solve engineering issues like sample transmission delays, GPU memory overflow, and training stability (gradient explosion/vanishing) (intermediate)
  • Collaboration with algorithm teams to iterate infrastructure (intermediate)
  • Knowledge of frontier technologies in RL and distributed training such as VERL, rllm, Agentlightning, Ray, Megatron-LM (intermediate)
