AI Infra Reinforcement Learning Engineer (AI Infra强化学习工程师)

Tencent

Full-time | Posted: Dec 8, 2025

Job Description

📋 Job Overview

The AI Infra Reinforcement Learning Engineer at Tencent designs, develops, and optimizes RL training frameworks for large language models and agentic systems. The role involves building distributed training infrastructure for scalable RL algorithms and solving engineering challenges in RL workflows. Responsibilities include toolchain development, close collaboration with algorithm teams, and bringing cutting-edge reinforcement learning and distributed training technologies into production.

📍 Location: Shanghai, China

🏢 Business Unit: CSIG

📄 Full Description

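1. Design, develop, and optimize the performance of RL training frameworks for LLM RL and Agentic RL, enabling large-scale RL algorithms (e.g., PPO, DQN, GRPO) to be deployed efficiently;
2. Build the distributed training system, optimizing asynchronous training/inference, partial rollout, data parallelism, model parallelism, and distributed storage and scheduling strategies for the Replay Buffer, to raise GPU utilization and training throughput;
3. Design and implement an end-to-end RL training toolchain, covering environment wrapping, data preprocessing, model version management, training log monitoring, and metric visualization (TensorBoard/Weights & Biases);
4. Resolve engineering bottlenecks in RL training, such as sample transfer latency, GPU out-of-memory errors, and training instability (exploding/vanishing gradients), and deliver engineering solutions;
5. Work closely with the RL algorithm team to understand algorithm requirements, iterate on the infrastructure, and adapt it to training needs across multiple scenarios;
6. Keep up with cutting-edge technologies in reinforcement learning and distributed training (e.g., VERL, rllm, Agentlightning, Ray, Megatron-LM) and bring them into production systems.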
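GRPO, one of the algorithms named in this role, drops PPO's learned value baseline in favor of a group-relative one: several responses are sampled for the same prompt, and each response's reward is normalized against the rest of its group. The plain-Python sketch below shows only that advantage step. The function name, the `eps` term, and the use of population standard deviation are illustrative assumptions, not a description of Tencent's or any specific framework's implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each response's reward against the other responses to the same prompt.

    Hypothetical helper: A_i = (r_i - mean(r)) / (std(r) + eps), using the
    population standard deviation of the group. If every reward in the group
    is identical, all advantages are 0, so that group contributes no signal.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for G = 4 sampled responses to one prompt (1.0 = correct, 0.0 = wrong).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advs])  # [1.0, -1.0, -1.0, 1.0]
```

In a real training loop these advantages would be computed per prompt group over batched tensors and then fed into a clipped policy-gradient loss; the list-based version here only makes the normalization rule explicit.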

🛠️ Required Skills

Proficiency in RL algorithms such as PPO, DQN, and GRPO
Experience with distributed training systems, including data parallelism, model parallelism, and Replay Buffer management
Skills in toolchain development for RL workflows: environment wrapping, data preprocessing, model version management, training log monitoring, and visualization with TensorBoard or Weights & Biases
Ability to solve engineering issues such as sample transfer latency, GPU out-of-memory errors, and training instability (exploding/vanishing gradients)
Experience collaborating with algorithm teams to iterate on infrastructure
Knowledge of frontier technologies in RL and distributed training, such as VERL, rllm, Agentlightning, Ray, and Megatron-LM
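Exploding gradients, listed above as a training-stability issue, are commonly contained with global gradient-norm clipping: all gradients are rescaled together so their combined L2 norm stays at or below a threshold. The sketch below follows the same rescaling rule as PyTorch's `torch.nn.utils.clip_grad_norm_` (scale by `min(1, max_norm / (norm + eps))`), but uses plain Python lists as a hypothetical stand-in for framework tensors. The function name and the non-finite-norm handling are illustrative assumptions.

```python
import math


def clip_by_global_norm(
    grads: list[list[float]], max_norm: float, eps: float = 1e-6
) -> tuple[list[list[float]], float]:
    """Rescale all gradients jointly so their global L2 norm is at most max_norm.

    grads: hypothetical flattened gradient tensors, one list of floats each.
    Returns (clipped_grads, total_norm_before_clipping).
    """
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if not math.isfinite(total_norm):
        # A NaN/inf norm usually means the step has diverged; return the grads
        # unchanged so the caller can detect this and skip the optimizer step.
        return grads, total_norm
    # Never scale up: gradients already under the threshold are left as-is.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, total_norm


clipped, norm = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
print(norm)  # 5.0
```

Clipping by the global norm rather than per tensor keeps the relative direction of the full update intact, which is why it is the usual default in large-model training; logging the pre-clip norm also gives an early signal of instability.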

Salary

Estimated Salary Range (medium confidence)

300,000 - 800,000 CNY per year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

All six skills listed under Required Skills are rated at the intermediate level.
