Yuanbao Large Model Evaluation Product Manager

Tencent

Software and Technology Jobs

Full-time · Posted: Nov 7, 2025

Job Description

📋 Job Overview

The Product Manager for Yuanbao Large Model Evaluation is responsible for designing and building automated evaluation systems for large language models, covering general and specialized capabilities like reasoning, writing, speech, and VLM. The role involves researching benchmarks, analyzing model performance, optimizing based on user data, and collaborating across teams to integrate evaluation into product development and model iteration. This position drives standardization, reproducibility, and continuous improvement in model quality and user experience.

📍 Location: Shenzhen, China

🏢 Business Unit: CSIG

📄 Full Description

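1. Building the automated evaluation system: Design and build an automated evaluation system for large models, covering general capabilities and specialized capabilities (e.g., reasoning, writing, speech, VLM); build the evaluation metric system and the automated evaluation pipeline, and make the evaluation system standardized, modular, and extensible.
2. Benchmark research and adoption: Track frontier large model evaluation methods and benchmarks in China and abroad, and study their evaluation dimensions and automation mechanisms; reproduce and adapt high-quality evaluation sets, and customize evaluation tasks for business scenarios, so that evaluations are realistic and reproducible.
3. Model performance analysis and strategy optimization: Regularly run systematic evaluations and comparisons across model versions, produce detailed analysis reports, and identify model strengths and weaknesses; for specialized capabilities such as memory, writing, speech, and multimodal (VLM), design fine-grained evaluation metrics and analysis strategies to guide model iteration.
4. User data analysis and experience optimization: Continuously monitor and analyze real user interaction data to surface bad cases and typical problems in model behavior; work with algorithm and product teams to turn user-side problems into quantifiable evaluation metrics and optimization plans; establish a mechanism linking user experience feedback to the evaluation system, driving continuous improvement in model quality and closing the loop on experience optimization.
5. Cross-team collaboration and product planning: Work closely with algorithm, engineering, and product teams to define evaluation requirements and metric systems, and drive adoption of the automated evaluation system in real business and model R&D, forming a complete evaluation → analysis → optimization loop.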

🎯 Key Responsibilities

  • Design and build automated evaluation systems for large models, covering general and specialized capabilities (e.g., reasoning, writing, speech, VLM); construct evaluation metric systems and automated processes that are standardized, modular, and scalable.
  • Track frontier large model evaluation methods and benchmarks, both domestic and international, and research their evaluation dimensions and automation mechanisms; reproduce and adapt high-quality evaluation sets, and customize evaluation tasks for business scenarios to ensure realism and reproducibility.
  • Conduct systematic evaluations and comparisons of different model versions, produce detailed analysis reports, and identify model strengths and weaknesses; design fine-grained evaluation metrics and analysis strategies for specialized capabilities such as memory, writing, speech, and multimodal (VLM) to guide model iteration.
  • Continuously monitor and analyze real user interaction data to find bad cases and typical issues in model performance; collaborate with algorithm and product teams to turn user-side problems into quantifiable evaluation metrics and optimization solutions; link user experience feedback to the evaluation system to drive continuous model improvement and close the loop on user experience.
  • Collaborate closely with algorithm, engineering, and product teams to define evaluation requirements and metric systems, and drive adoption of automated evaluation systems in actual business and model R&D, forming a complete closed loop of evaluation → analysis → optimization.

Locations

  • Shenzhen, China

Salary

Estimated Salary Range (medium confidence)

300,000 - 600,000 CNY per year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Tags & Categories

Tencent, Shenzhen, China, CSIG
