
Vision Researcher – Multimodal Understanding & Generation in Foundation Models

Tencent

Software and Technology Jobs


Job Type: Internship
Posted: Oct 12, 2025

Job Description


📋 Job Overview

Tencent is seeking a Vision Researcher specializing in Multimodal Understanding & Generation for Foundation Models to drive innovative research in computer vision and multimodal AI. The role involves collaborating on native multimodal foundation models, exploring large-scale training for world representations, and contributing to open-source and product teams. This position is based in Bellevue, Washington, and requires expertise in cutting-edge AI technologies.

📍 Location: Bellevue, Washington, United States

🏢 Business Unit: TEG

📄 Full Description


What the Role Entails
Responsibilities:
1. Serve as a domain expert in computer vision and collaborate with researchers from other modalities to drive cutting-edge research in native multimodal foundation models, including novel architecture design and modeling for “2D + time” and “3D + time” scenarios.
2. Explore the training and design of large models for understanding and generating representations of the physical world, multimodal reasoning, and self-evolving continual learning.
3. Stay up to date with the latest advancements in academia and industry; actively participate in international conferences and workshops, and engage with leading global research teams.
4. Contribute impactful research outcomes to the open-source community or transfer technologies to internal product teams.
 

Who We Look For
Qualifications:
1. Master’s or Ph.D. degree in Computer Science, Artificial Intelligence, Computer Vision, Machine Learning, or a related field.
2. Proven multimodal research experience in relevant areas, familiarity with state-of-the-art technologies, and a strong publication record at top-tier conferences or journals such as CVPR, ICCV, ECCV, NeurIPS, ICLR, or ICML.
3. Proficiency with mainstream open-source tools and frameworks relevant to the field, and strong engineering skills to support research implementation; candidates with influential GitHub projects or contributions to high-impact open-source communities are preferred.
4. Strong team spirit and ability to collaborate across disciplines, excellent communication skills, intellectual curiosity, and a goal-oriented, problem-solving mindset.
Location State(s)
US-Washington-Bellevue
The expected base pay range for this position in the location(s) listed above is $122,500.00 to $229,700.00 per year. Actual pay may vary depending on job-related knowledge, skills, and experience.
Employees hired for this position may be eligible for a sign-on payment, relocation package, and restricted stock units, evaluated on a case-by-case basis.
Subject to the terms and conditions of the plans in effect, hired applicants are also eligible for medical, dental, vision, life, and disability benefits, and participation in the Company’s 401(k) plan. Employees are also eligible for 15 to 25 days of vacation per year (depending on tenure), up to 13 days of holidays throughout the calendar year, and up to 10 days of paid sick leave per year.
Your benefits may be adjusted to reflect your location, employment status, duration of employment with the company, and position level. Benefits may also be pro-rated for those who start working during the calendar year.

Equal Employment Opportunity at Tencent
As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.
Work Location: US-Washington-Bellevue



