Hunyuan Large Model SRE Operations Engineer (Beijing)

Tencent

Full-time | Posted: Dec 9, 2025

Job Description

📋 Job Overview

The role of SRE Operations Engineer for the Hunyuan Large Model in Beijing focuses on ensuring the stability and high availability of large model services under high concurrency and traffic. Responsibilities include designing monitoring and automation platforms, rapid fault resolution, and resource optimization to enhance efficiency and performance. The position involves proactive system analysis, cost management, and driving continuous improvements aligned with industry hardware trends.

📍 Location: Shenzhen, China

🏢 Business Unit: TEG

📄 Full Description

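1. Own the stability and high availability of large model services, ensuring the platform runs reliably under high concurrency and heavy traffic; design and build monitoring, alerting, and automated operations platforms so that problems are detected and resolved promptly.
2. Locate and fix faults quickly; develop and execute emergency response plans to ensure business continuity; take part in incident postmortems, analyze root causes, and propose improvements to prevent similar problems from recurring.
3. Develop and maintain automated operations platforms and tools to improve operational efficiency and reduce human error; optimize resource usage to raise utilization and improve system performance.
4. Analyze existing systems in depth to uncover shortcomings, use data to identify weak points, and drive optimizations through to implementation.
5. Own resource planning and management, ensuring resources are allocated sensibly and used efficiently; analyze resource costs, monitor and evaluate resource usage, and propose cost optimization plans; combine the industry hardware evolution roadmap with technical platform requirements to continually drive optimal configuration selection and iteration.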

🎯 Key Responsibilities

  • Ensure the stability and high availability of large model services under high concurrency and heavy traffic; design and implement monitoring, alerting, and automated operations platforms to detect and resolve issues promptly.
  • Locate and repair faults quickly; formulate and execute emergency plans to ensure business continuity; participate in postmortem analysis, identify root causes, and propose improvements to prevent recurrence.
  • Develop and maintain automated operations platforms and tools to improve efficiency and reduce human error; optimize resource usage to increase utilization and improve system performance.
  • Analyze existing systems in depth to find deficiencies, use data to identify weak points, and drive optimizations and improvements through to implementation.
  • Plan and manage resources to ensure reasonable allocation and efficient utilization; analyze resource costs, monitor and evaluate usage, and propose cost optimization plans; combine industry hardware evolution roadmaps with platform needs to drive optimal configuration selection and iteration.

Locations

  • Shenzhen, China

Salary

Estimated Salary Range (medium confidence)

300,000 - 600,000 CNY per year

Source: AI estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Tags & Categories

Tencent, Shenzhen, China, TEG
