RESUME AND JOB

Tencent Advertising - Algorithm Engineer - Reinforcement Learning Track

Tencent

Full-time

Posted: Dec 1, 2025

Job Description

📋 Job Overview

The Algorithm Engineer position at Tencent Advertising focuses on developing and tuning multi-objective reinforcement learning algorithms for advertising scenarios. Responsibilities include building improved frameworks for algorithms such as DQN, PPO, and SAC, diagnosing performance bottlenecks with interpretability tools, and designing new state representations and reward mechanisms. The role also involves tracking advances in deep learning, computational advertising, and recommendation systems and applying them to multi-objective ranking.

📍 Location: Shenzhen, China

🏢 Business Unit: CDG

📄 Full Description

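1. Multi-objective reinforcement learning algorithm development and tuning. Build improved frameworks for algorithms such as DQN, PPO, and SAC based on business scenarios, and design hierarchical reinforcement learning architectures to address sparse, delayed rewards. Set up a closed loop between offline simulation environments and online A/B testing, and design a dynamic sliding-window evaluation mechanism to quantify the effect of algorithm iterations.
2. Performance bottleneck analysis and breakthroughs. Build interpretability analysis tools for reinforcement learning (e.g., SHAP values, attention heatmaps) to locate bottlenecks such as missing state representations, reward function bias, and insufficient exploration. Design curriculum learning mechanisms that increase difficulty progressively to address policy degradation in sparse-reward scenarios.
3. Innovation in state and reward mechanisms. Build heterogeneous feature fusion models that integrate higher-order state representations such as real-time user behavior sequences (LSTM) and cross-scenario preference transfer (Meta Learning). Design composite reward functions that combine dense rewards (click behavior) with sparse rewards (purchase behavior), and introduce KL-divergence-based reward shaping.
4. Track the latest frontier technologies in deep learning, computational advertising, recommendation systems, DeepSeek, and related areas, and apply them to multi-objective ranking.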
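
Item 3 of the description mentions a composite reward that mixes dense click rewards with sparse purchase rewards and adds KL-divergence-based shaping. The posting does not define the formula, so what follows is only a minimal sketch of one possible reading: a weighted sum of the two signals minus a KL penalty that keeps the current policy close to a reference policy. The function names, the weights `w_click`, `w_purchase`, and `beta`, and the use of a reference policy are all assumptions made for illustration, not Tencent's method.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same action set."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing; eps guards against log(0) in q.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def composite_reward(
    clicked: bool,
    purchased: bool,
    policy_probs: Sequence[float],
    reference_probs: Sequence[float],
    w_click: float = 0.1,      # hypothetical weight for the dense signal
    w_purchase: float = 1.0,   # hypothetical weight for the sparse signal
    beta: float = 0.05,        # hypothetical strength of the KL penalty
) -> float:
    """Weighted click + purchase reward, minus a KL penalty (assumed form)."""
    dense = w_click * float(clicked)
    sparse = w_purchase * float(purchased)
    penalty = beta * kl_divergence(policy_probs, reference_probs)
    return dense + sparse - penalty
```

With identical policy and reference distributions the penalty vanishes, so a click without a purchase yields exactly the click weight. As the policy drifts from the reference, the same outcome is rewarded slightly less.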

🎯 Key Responsibilities

Develop and tune multi-objective reinforcement learning algorithms: build improved frameworks for DQN, PPO, and SAC, and design hierarchical architectures for sparse, delayed rewards
Build offline simulation environments, online A/B testing loops, and dynamic sliding-window evaluation mechanisms to quantify the effect of each algorithm iteration
Diagnose and resolve performance bottlenecks using interpretability tools such as SHAP values and attention heatmaps, addressing issues such as missing state representations, reward function bias, and insufficient exploration
Design curriculum learning mechanisms that raise difficulty progressively to address policy degradation in sparse-reward scenarios
Innovate on state and reward mechanisms by building heterogeneous feature fusion models that integrate LSTM-encoded real-time user behavior sequences and meta-learning for cross-scenario preference transfer
Design composite reward functions that combine dense rewards (clicks) with sparse rewards (purchases) and incorporate KL-divergence-based reward shaping
Track the latest advances in deep learning, computational advertising, recommendation systems, and technologies such as DeepSeek, and apply them to multi-objective ranking
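
The responsibilities mention a sliding-window evaluation mechanism for quantifying algorithm iterations, but the posting gives no details. As a rough illustration only, the sketch below keeps the relative lift of a treatment arm over a control arm for the most recent `window` periods and reports their mean. The class name, the lift definition, and the fixed window size are assumptions. The "dynamic" aspect named in the posting (for example, resizing the window) is not specified and is not modeled here.

```python
from collections import deque
from typing import Deque, Optional


class SlidingWindowLift:
    """Mean relative lift (treatment vs. control) over the last `window` periods."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        # deque with maxlen drops the oldest entry automatically when full.
        self._lifts: Deque[float] = deque(maxlen=window)

    def update(self, control_metric: float, treatment_metric: float) -> Optional[float]:
        """Record one period and return the current windowed mean lift."""
        if control_metric != 0:
            lift = (treatment_metric - control_metric) / control_metric
            self._lifts.append(lift)
        # Periods with a zero control metric are skipped (lift is undefined).
        return self.mean_lift()

    def mean_lift(self) -> Optional[float]:
        """Return the mean lift in the window, or None if nothing is recorded."""
        if not self._lifts:
            return None
        return sum(self._lifts) / len(self._lifts)
```

A typical use would call `update` once per reporting period with the control and treatment values of a metric such as conversion rate, then compare the returned mean against a decision threshold chosen by the team.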

✅ Required Qualifications

Experience in reinforcement learning algorithm development
Knowledge of DQN, PPO, SAC algorithms
Ability to build offline simulation environments and online A/B testing loops
Skill in diagnosing and resolving performance bottlenecks in RL

⭐ Preferred Qualifications

Expertise in designing hierarchical RL architectures for sparse delayed rewards
Familiarity with dynamic sliding window evaluation mechanisms
Experience with curriculum learning for sparse reward scenarios
Knowledge of meta-learning for cross-scenario preference transfer

🛠️ Required Skills

Reinforcement learning (DQN, PPO, SAC, hierarchical RL)
Deep learning and neural networks (LSTM, meta-learning)
Interpretability tools (SHAP, attention heatmaps)
A/B testing and simulation environments
Reward shaping and curriculum learning
Feature fusion and state representation
Knowledge of computational advertising and recommendation systems

Locations

Shenzhen, China

Salary

Estimated Salary Range (medium confidence)

300,000 - 800,000 CNY per year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

Reinforcement learning (DQN, PPO, SAC, hierarchical RL): intermediate
Deep learning and neural networks (LSTM, meta-learning): intermediate
Interpretability tools (SHAP, attention heatmaps): intermediate
A/B testing and simulation environments: intermediate
Reward shaping and curriculum learning: intermediate
Feature fusion and state representation: intermediate
Knowledge of computational advertising and recommendation systems: intermediate

Tags & Categories

Tencent, Shenzhen, China, CDG

