Tencent Advertising - Algorithm Engineer - Reinforcement Learning Track

Tencent

Software and Technology Jobs

Full-time | Posted: Dec 1, 2025

Job Description

📋 Job Overview

The Algorithm Engineer position in Tencent Advertising focuses on developing and optimizing multi-objective reinforcement learning algorithms for advertising scenarios. Responsibilities include building improved frameworks for algorithms such as DQN, PPO, and SAC, analyzing performance bottlenecks with interpretability tools, and innovating state and reward mechanisms. The role also involves tracking advances in deep learning, computational advertising, and recommendation systems, and applying them to multi-objective ranking.

📍 Location: Shenzhen, China

🏢 Business Unit: CDG

📄 Full Description

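1. Development and tuning of multi-objective reinforcement learning algorithms. Build improved frameworks for algorithms such as DQN, PPO, and SAC based on business scenarios, and design hierarchical reinforcement learning architectures to address sparse, delayed rewards. Build a closed loop of offline simulation environments and online A/B tests, and design dynamic sliding-window evaluation mechanisms to quantify the effect of algorithm iterations;
2. Analysis and resolution of performance bottlenecks. Build interpretability analysis tools for reinforcement learning (such as SHAP values and attention heatmaps) to locate bottlenecks such as missing state representations, reward-function bias, and insufficient exploration. Design curriculum learning mechanisms that use progressively increasing difficulty to address policy degradation in sparse-reward scenarios;
3. Innovation in state and reward mechanisms. Build heterogeneous feature fusion models that integrate higher-order state representations such as real-time user behavior sequences (LSTM) and cross-scenario preference transfer (Meta Learning). Design composite reward functions that fuse dense rewards (click behavior) with sparse rewards (purchase behavior), and introduce reward shaping based on KL divergence;
4. Track the latest frontier technologies in deep learning, computational advertising, recommendation systems, DeepSeek, and related areas, and apply them to multi-objective ranking.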
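
For candidates unfamiliar with the terms, the composite reward in item 3 can be pictured with a minimal sketch. The posting does not say how the KL term is applied, so this example assumes one common form: a penalty on the divergence between the current policy and a reference policy. The function names, weights, and coefficient below are all hypothetical and are not taken from Tencent's system.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def composite_reward(clicked, purchased, policy_probs, reference_probs,
                     click_weight=0.1, purchase_weight=1.0, kl_coef=0.05):
    """Combine a dense click reward, a sparse purchase reward, and a KL penalty.

    All weights are illustrative assumptions, not values from the posting.
    """
    dense = click_weight * float(clicked)        # frequent signal: clicks
    sparse = purchase_weight * float(purchased)  # rare signal: purchases
    # Assumed shaping term: penalize drift away from a reference policy.
    penalty = kl_coef * kl_divergence(policy_probs, reference_probs)
    return dense + sparse - penalty


# A click without a purchase, with the policy equal to the reference policy,
# yields only the dense click reward.
reward = composite_reward(True, False, [0.5, 0.5], [0.5, 0.5])
```

In this form the dense term gives the agent a learning signal on most steps, while the penalty discourages the policy from moving far from the reference in a single update.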

🎯 Key Responsibilities

  • Develop and optimize multi-objective reinforcement learning algorithms, building improved frameworks for DQN, PPO, SAC, and designing hierarchical architectures for sparse delayed rewards
  • Build offline simulation environments, online A/B testing loops, and dynamic sliding window evaluation mechanisms to quantify algorithm iterations
  • Analyze and resolve performance bottlenecks using interpretability tools such as SHAP values and attention heatmaps, addressing issues such as missing state representations, reward-function bias, and insufficient exploration
  • Design curriculum learning mechanisms with progressive difficulty strategies to solve policy degradation in sparse reward scenarios
  • Innovate state and reward mechanisms by building heterogeneous feature fusion models that integrate real-time user behavior sequences (LSTM) and cross-scenario preference transfer (meta-learning)
  • Design composite reward functions combining dense (clicks) and sparse (purchases) rewards, incorporating KL-divergence-based reward shaping
  • Track the latest advances in deep learning, computational advertising, recommendation systems, and technologies such as DeepSeek, and apply them to multi-objective ranking
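
As a rough illustration of the sliding-window evaluation named in the responsibilities above, the sketch below tracks a metric over the most recent observations and compares a treatment arm with a control arm. The posting does not define what makes the window "dynamic"; here it is assumed to mean that the window size can be changed at run time. The class and function names are hypothetical.

```python
from collections import deque


class SlidingWindowMetric:
    """Mean of the most recent `window_size` observations of a metric."""

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.values = deque(maxlen=window_size)

    def add(self, value):
        # The deque drops the oldest value automatically once it is full.
        self.values.append(value)

    def resize(self, window_size):
        # Assumed "dynamic" behavior: keep the newest observations on resize.
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.values = deque(self.values, maxlen=window_size)

    def mean(self):
        return sum(self.values) / len(self.values) if self.values else 0.0


def relative_lift(treatment, control):
    """Relative change of the treatment mean over the control mean."""
    base = control.mean()
    return (treatment.mean() - base) / base if base else 0.0


control = SlidingWindowMetric(3)
treatment = SlidingWindowMetric(3)
for value in (1, 2, 3, 4):   # the first value falls out of the window
    control.add(value)
for value in (3, 3, 6):
    treatment.add(value)
lift = relative_lift(treatment, control)
```

A production evaluation would also need significance testing and traffic-split controls, which are outside the scope of this sketch.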

✅ Required Qualifications

  • Experience in reinforcement learning algorithm development
  • Knowledge of DQN, PPO, SAC algorithms
  • Ability to build offline simulation environments and online A/B testing loops
  • Skills in analyzing and resolving performance bottlenecks in RL

⭐ Preferred Qualifications

  • Expertise in designing hierarchical RL architectures for sparse delayed rewards
  • Familiarity with dynamic sliding window evaluation mechanisms
  • Experience with curriculum learning for sparse reward scenarios
  • Knowledge of meta-learning for cross-scenario preference transfer

🛠️ Required Skills

  • Reinforcement learning (DQN, PPO, SAC, hierarchical RL)
  • Deep learning and neural networks (LSTM, meta-learning)
  • Interpretability tools (SHAP, attention heatmaps)
  • A/B testing and simulation environments
  • Reward shaping and curriculum learning
  • Feature fusion and state representation
  • Knowledge of computational advertising and recommendation systems

Locations

  • Shenzhen, China

Salary

Estimated Salary Range (medium confidence)

300,000 - 800,000 CNY per year

Source: AI estimate

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Reinforcement learning (DQN, PPO, SAC, hierarchical RL): intermediate
  • Deep learning and neural networks (LSTM, meta-learning): intermediate
  • Interpretability tools (SHAP, attention heatmaps): intermediate
  • A/B testing and simulation environments: intermediate
  • Reward shaping and curriculum learning: intermediate
  • Feature fusion and state representation: intermediate
  • Knowledge of computational advertising and recommendation systems: intermediate
