Multimodal Reinforcement Learning Post-Training Algorithm Expert

Tencent

Internship · Posted: Nov 25, 2025

Job Description

📋 Job Overview

Tencent's Technology Engineering Group (TEG) is seeking a Multimodal Reinforcement Learning Post-Training Algorithm Expert to bridge algorithm and framework teams in developing advanced AI technologies. The role involves co-designing algorithms and frameworks, optimizing training pipelines, resolving technical bottlenecks, and fostering cross-team collaboration. This position is based in Singapore and focuses on enhancing multimodal large models through innovative reinforcement learning techniques.

📍 Location: CapitaSky, Singapore

🏢 Business Unit: TEG

📄 Full Description

Business Unit
Technology Engineering Group (TEG) is responsible for supporting the company and its business groups with technology and operational platforms, as well as for building and operating R&D management and data centers. TEG provides users with a full range of customer services. As the operator of the largest network, device, and data center infrastructure in Asia, TEG also leads the Tencent Technology Committee in strengthening infrastructure R&D through internal and distributed open source collaboration, building new platforms, and supporting business innovation.

What the Role Entails
Algorithm-Framework Co-design: Act as a technical bridge between the algorithm and framework teams. Develop a deep understanding of the principles and evolving trends of post-training algorithms for multimodal large models (e.g., RLHF, DPO, Curriculum Reinforcement Learning), translate them into functional requirements for the underlying frameworks, and provide input on framework architecture design.
Training Pipeline Optimization and Evaluation: Lead or deeply participate in setting up, optimizing, and evaluating post-training pipelines (e.g., multimodal SFT, RLHF). Focus on training stability, efficiency, and generalization, and propose systematic improvements in areas such as cross-modal alignment, reward function design, and policy optimization.
Technical Research and Bottleneck Resolution: Proactively track cutting-edge advances in multimodal reinforcement learning post-training from academia and industry. Perform root cause analysis of training bottlenecks (e.g., insufficient OOD generalization, modality fusion conflicts) and work with the framework team to develop and implement solutions.
Cross-team Support and Knowledge Sharing: Collaborate efficiently with the framework development, hardware optimization, and business algorithm teams to ensure technical solutions are implemented. Produce high-quality technical documentation, design drafts, and experimental reports, and organize internal sharing sessions to raise the team's overall technical expertise.

Who We Look For
Education and Technical Background: A Master's degree or higher in Computer Science, Artificial Intelligence, Electronic Engineering, Automation, or a related field. A solid foundation in machine learning/deep learning, with a deep understanding of multimodal large models and the reinforcement learning post-training technology stack.
Core Algorithm and Engineering Skills:
Proficiency in Python programming and familiarity with deep learning frameworks such as PyTorch.
Deep understanding of model architectures such as Transformer and Diffusion.
Thorough comprehension of the principles, processes, and common challenges (e.g., training instability, reward hacking) of post-training algorithms such as SFT, RLHF, and DPO.
Strong engineering implementation and debugging skills, with the ability to rapidly validate algorithmic ideas and conduct rigorous experimental analysis for performance evaluation.
Framework Collaboration and System Perspective:
Familiarity with at least one mainstream large model training/inference framework (e.g., Megatron-LM, DeepSpeed, vLLM) and an understanding of its architectural design principles.
Ability to assess framework usability, scalability, and performance from an algorithmic perspective and propose improvements. Experience with post-training frameworks such as VERL or OpenRLHF is a plus.
Soft Skills: Excellent cross-team communication skills, with the ability to clearly translate requirements and articulate solutions between algorithm and engineering teams. A strong sense of responsibility, self-motivation, and passion for solving complex problems.

Equal Employment Opportunity at Tencent
As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.
Work Location: Singapore-CapitaSky

🎯 Key Responsibilities

Act as a technical bridge between the algorithm and framework teams, translating post-training algorithm principles (e.g., RLHF, DPO, Curriculum Reinforcement Learning) into functional requirements for frameworks
Lead or participate in the setup, optimization, and evaluation of post-training pipelines (e.g., multimodal SFT, RLHF), focusing on stability, efficiency, and generalization
Proactively track advancements in multimodal reinforcement learning post-training, perform root cause analysis for bottlenecks (e.g., OOD generalization, modality fusion conflicts), and collaborate on solutions
Collaborate with framework development, hardware optimization, and business algorithm teams; produce technical documentation and organize internal sharing sessions

✅ Required Qualifications

A Master's degree or higher in Computer Science, Artificial Intelligence, Electronic Engineering, Automation, or related fields
A solid foundation in machine learning/deep learning
Deep understanding of multimodal large models and the reinforcement learning post-training technology stack

⭐ Preferred Qualifications

Experience with post-training frameworks like VERL or OpenRLHF

🛠️ Required Skills

Proficiency in Python programming and familiarity with deep learning frameworks like PyTorch
Deep understanding of model architectures such as Transformer and Diffusion
Thorough comprehension of post-training algorithms like SFT, RLHF, and DPO, including principles, processes, and challenges (e.g., training instability, reward hacking)
Strong engineering implementation and debugging skills for validating ideas and conducting experimental analysis
Familiarity with at least one mainstream large model training/inference framework (e.g., Megatron-LM, DeepSpeed, vLLM) and understanding of their architectural principles
Ability to assess framework usability, scalability, and performance from an algorithmic perspective and propose improvements
Excellent cross-team communication skills to translate requirements and articulate solutions
Strong sense of responsibility, self-motivation, and passion for solving complex problems

🎁 Benefits

Equal opportunity employer fostering diverse voices and innovation
Environment where employees feel supported and inspired to achieve goals
Work location in Singapore-CapitaSky

Locations

CapitaSky, Singapore

Salary

Estimated Salary Range (medium confidence)

180,000 to 300,000 SGD per year

Source: AI estimate

* This is an estimated range based on market data and may vary based on experience and qualifications.
