Large Model Pipeline Data Engineer (Shenzhen/Beijing)

Tencent

Full-time. Posted: Nov 12, 2025

Job Description

📋 Job Overview

The role involves designing and implementing an efficient data processing platform for large model pre-training and post-training data pipelines at Tencent. Responsibilities include optimizing computation and storage to enhance platform capacity and performance, building scalable dataset management systems, and documenting best practices. This position is based in Shenzhen or Beijing, focusing on balancing usability and efficiency in handling massive data volumes.

📍 Location: Shenzhen, China

🏢 Business Unit: TEG

📄 Full Description

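1. Design and implement an efficient data processing platform for large model pre-training and post-training data pipelines. Within a single pipeline, use operator orchestration to build a pipeline platform that integrates data computation and storage and meets the needs of large model training; at the platform level, raise platform throughput through storage and computation optimization.
2. Computation: improve platform-level computational efficiency. Through sound system design and abstraction across massive data, tasks, and resources, and by merging and splitting the orchestratable operators, balance usability against computational performance. For hotspot operators, consider single-point optimization and shared common services to achieve platform-level performance gains.
3. Storage: build datasets that serve the whole of pre-training and post-training, and optimize management and access schemes for massive storage (object storage tiering, hot/cold tiering, caching strategies, data compression and columnar format optimization, read/write concurrency control, cost and lifecycle management).
4. Write technical documentation, best practices, and performance evaluation reports, driving the consolidation of capabilities and toolchain upgrades.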
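
As an illustration of the operator merging mentioned in item 2, here is a minimal Python sketch. It is not taken from the listing and does not describe Tencent's platform: the record shape, the three operators (`normalize_text`, `drop_short`, `add_length`), and the `fuse` helper are all hypothetical names chosen for the example. It assumes per-record operators that return `None` to drop a record, and compares running them one stage at a time with running them as a single merged operator.

```python
from typing import Callable, Iterable, List, Optional

Record = dict
# Assumption: an operator maps one record to one record, or None to drop it.
Operator = Callable[[Record], Optional[Record]]


def normalize_text(rec: Record) -> Optional[Record]:
    # Hypothetical operator: collapse runs of whitespace.
    return {**rec, "text": " ".join(rec["text"].split())}


def drop_short(rec: Record) -> Optional[Record]:
    # Hypothetical operator: filter out records under 10 characters.
    return rec if len(rec["text"]) >= 10 else None


def add_length(rec: Record) -> Optional[Record]:
    # Hypothetical operator: attach a derived field.
    return {**rec, "length": len(rec["text"])}


def run_unfused(records: Iterable[Record], ops: List[Operator]) -> List[Record]:
    # One full pass and one intermediate list per operator.
    current = list(records)
    for op in ops:
        current = [out for out in map(op, current) if out is not None]
    return current


def fuse(*ops: Operator) -> Operator:
    # Merge several operators into one that stops as soon as a record is dropped.
    def fused(rec: Record) -> Optional[Record]:
        current: Optional[Record] = rec
        for op in ops:
            current = op(current)
            if current is None:
                return None
        return current
    return fused


def run_fused(records: Iterable[Record], ops: List[Operator]) -> List[Record]:
    # A single pass over the data with no intermediate lists.
    merged = fuse(*ops)
    return [out for out in map(merged, records) if out is not None]


ops = [normalize_text, drop_short, add_length]
sample = [{"text": "  hello   large   model  "}, {"text": "tiny"}]
assert run_unfused(sample, ops) == run_fused(sample, ops)
```

The trade-off is the one the item names: separate operators are easier to compose and reuse, while the merged form avoids materializing intermediate results. Splitting is the reverse move, breaking a merged operator back into stages when a stage needs its own resources or caching.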

🎯 Key Responsibilities

  • Design and implement efficient data processing platforms for large model pre-training and post-training data pipelines, integrating data computation, storage, and orchestration.
  • Optimize platform-level computation efficiency through system design for massive data, tasks, and resources, including operator merging, splitting, and hotspot optimizations.
  • Build dataset services for pre-training and post-training, optimizing massive storage management and access schemes such as object storage tiering, hot/cold tiering, caching, compression, columnar formats, read/write concurrency control, and cost and lifecycle management.
  • Write technical documentation, best practices, and performance evaluation reports to promote capability accumulation and toolchain upgrades.

🛠️ Required Skills

  • Expertise in data pipeline design and implementation for large AI models
  • Skills in computation optimization, including operator orchestration and performance tuning
  • Knowledge of storage systems, including object storage, caching strategies, data compression, and concurrency control
  • Ability to document technical practices and conduct performance evaluations

Locations

  • Shenzhen, China

Salary

Estimated Salary Range (medium confidence)

300,000 - 600,000 CNY per year

Source: AI estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Expertise in data pipeline design and implementation for large AI models (intermediate)
  • Skills in computation optimization, including operator orchestration and performance tuning (intermediate)
  • Knowledge of storage systems, including object storage, caching strategies, data compression, and concurrency control (intermediate)
  • Ability to document technical practices and conduct performance evaluations (intermediate)
