
Research Scientist - Speech & Audio Understanding (Speech Generation)

Tencent

Full-time | Posted: Nov 5, 2025

Job Description

📋 Job Overview

Tencent is seeking a Research Scientist in Bellevue, Washington, specializing in Speech & Audio Understanding with a focus on Speech Generation, to advance its voice technologies. The role involves tracking cutting-edge research in speech generation, exploring multimodal foundation models that integrate text, speech, and vision, and leading R&D to improve model performance and applications. The position offers a base pay range of $122,500 to $229,700 per year, along with comprehensive benefits and opportunities for innovation in a diverse, supportive environment.

📍 Location: Bellevue, Washington, United States

🏢 Business Unit: TEG

📄 Full Description

What the Role Entails
Job Responsibilities:
1. Track the latest research in speech generation algorithms, explore next-generation paradigms for speech/audio generation, and push the boundaries of speech generation capabilities.  
2. Investigate cutting-edge multimodal voice foundation model technologies to enhance voice interaction experiences by integrating text, speech, and vision.  
3. Lead the technical R&D of voice foundation models, driving model performance improvements and innovative applications.  

Who We Look For
Job Requirements:
1. Master’s or Ph.D. in Computer Science, Artificial Intelligence, Electronic Engineering, Signal Processing, or related fields.  
2. Research or development experience in one or more of the following areas: voice foundation models, speech synthesis, speech recognition, audio generation, voice conversion, or speech codecs.
3. Familiarity with mainstream voice-enabled large models (e.g., GPT-4o, GLM-4-Voice, Qwen2.5-Omni, Voila). Prior project experience is preferred.
4. Proficient in deep learning frameworks (e.g., PyTorch). Experience with large-scale model training frameworks (Megatron/DeepSpeed) is a plus.
5. Solid understanding of large model architectures and principles. Experience in large-scale pretraining or post-training is preferred.  
Location State(s)
US-Washington-Bellevue
The expected base pay range for this position in the location(s) listed above is $122,500.00 to $229,700.00 per year. Actual pay may vary depending on job-related knowledge, skills, and experience.
Employees hired for this position may be eligible for a sign-on payment, relocation package, and restricted stock units, which will be evaluated on a case-by-case basis.
Subject to the terms and conditions of the plans in effect, hired applicants are also eligible for medical, dental, vision, life, and disability benefits, and for participation in the Company’s 401(k) plan. Employees are also eligible for 15 to 25 days of vacation per year (depending on the employee’s tenure), up to 13 holidays per calendar year, and up to 10 days of paid sick leave per year.
Your benefits may be adjusted to reflect your location, employment status, duration of employment with the company, and position level. Benefits may also be pro-rated for those who start working during the calendar year.

Equal Employment Opportunity at Tencent
As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.
Work Location: US-Washington-Bellevue

🎯 Key Responsibilities

  • Track the latest research in speech generation algorithms, explore next-generation paradigms for speech/audio generation, and push the boundaries of speech generation capabilities.
  • Investigate cutting-edge multimodal voice foundation model technologies to enhance voice interaction experiences by integrating text, speech, and vision.
  • Lead the technical R&D of voice foundation models, driving model performance improvements and innovative applications.

✅ Required Qualifications

  • Master’s or Ph.D. in Computer Science, Artificial Intelligence, Electronic Engineering, Signal Processing, or related fields.
  • Research or development experience in one or more of the following areas: voice foundation models, speech synthesis, speech recognition, audio generation, voice conversion, or speech codecs.
  • Familiarity with mainstream voice-enabled large models (e.g., GPT-4o, GLM-4-Voice, Qwen2.5-Omni, Voila).

⭐ Preferred Qualifications

  • Prior project experience with mainstream voice-enabled large models.
  • Experience with large-scale model training frameworks (Megatron/DeepSpeed).
  • Experience in large-scale pretraining or post-training.

🛠️ Required Skills

  • Proficient in deep learning frameworks (e.g., PyTorch).
  • Solid understanding of large model architectures and principles.

🎁 Benefits

  • Competitive base pay range of $122,500 to $229,700 per year.
  • Eligibility for sign-on payment, relocation package, and restricted stock units (case-by-case).
  • Medical, dental, vision, life, and disability benefits.
  • Participation in the Company’s 401(k) plan.
  • 15 to 25 days of vacation per year (depending on tenure).
  • Up to 13 days of holidays per year.
  • Up to 10 days of paid sick leave per year.

Locations

  • Bellevue, Washington, United States

Salary

122,500 to 229,700 USD per year
