Senior Machine Learning Systems Engineer (Training Optimization)

Canva

Full-time | Posted: Dec 16, 2025

Job Description

Location: Beijing, China

Team: Engineering

About the Role

At Canva, we're revolutionizing how the world creates through AI-powered design tools used by 170+ million people monthly. Join our CORE team within the Generative AI supergroup in Beijing as a Senior Machine Learning Systems Engineer (Training Optimization). You'll lead efforts to scale and optimize training systems for our large-scale multimodal and foundation models that power smart editing, AI video tools, and the next generation of creative intelligence. Sitting at the intersection of systems engineering and AI research, you'll push performance boundaries across compute, memory, and communication, directly shaping Canva's innovative product roadmap in a collaborative, design-focused culture.

Your role will involve designing distributed training infrastructure using cutting-edge frameworks like Megatron-LM, NVIDIA NeMo, FSDP, and Triton. You'll optimize every layer of the stack for maximum efficiency, develop custom CUDA/Triton kernels, and partner with research teams to translate algorithmic breakthroughs into production-scale systems. In our fast-paced, high-impact environment, you'll debug complex workflows, profile GPU utilization, and unlock new levels of scalability that enable faster iteration on features millions of creators rely on daily.

We're seeking a systems-first engineer passionate about creative AI who thrives in global collaboration. With deep expertise in LLMs and distributed training, you'll bring innovative solutions to our Beijing team while embracing Canva's inclusive culture of experimentation and user obsession. Enjoy hybrid flexibility, comprehensive benefits, and the chance to build foundational technologies that make design accessible to everyone. Join us in creating the future of visual communication.

Key Responsibilities

Design, implement, and optimize large-scale distributed training systems using Megatron-LM, NVIDIA NeMo, FSDP, and Triton
Improve GPU utilization and memory efficiency, and reduce communication overhead, for multimodal and foundation models
Partner closely with AI research and modeling teams to align systems with cutting-edge algorithmic needs
Evaluate and implement industry-leading best practices for scalable distributed training
Develop low-level optimizations including custom CUDA or Triton kernels
Debug, profile, and fine-tune training workflows to achieve breakthrough scalability
Collaborate globally within Canva's Generative AI supergroup to power creative intelligence features
Drive performance innovations that enable faster iteration on AI-assisted design tools
Mentor junior engineers and contribute to the technical roadmap for CORE team initiatives
Profile and optimize end-to-end ML pipelines from training to inference at massive scale

Required Qualifications

Strong background in LLMs, multimodal AI, or diffusion models
Proficiency in Python and familiarity with a systems programming language such as C++ or Rust
Deep knowledge of PyTorch or JAX, and libraries such as Megatron-LM, NeMo, or DeepSpeed
Hands-on experience with distributed training optimization techniques including FSDP/ZeRO, gradient checkpointing, and low-precision data types
Experience writing custom GPU kernels in CUDA or Triton
Excellent communication and problem-solving skills with full proficiency in English
Proven track record of scaling ML training systems in production environments

Preferred Qualifications

Experience optimizing training for multimodal foundation models
Familiarity with NVIDIA Triton Inference Server for deployment
Background in designing systems for creative AI applications
Contributions to open-source ML frameworks or distributed systems
Experience collaborating across research, engineering, and product teams

Required Skills

Distributed ML training systems
PyTorch/JAX expertise
Megatron-LM/NVIDIA NeMo
FSDP/ZeRO optimization
CUDA/Triton kernel development
GPU performance profiling
Memory optimization techniques
Multimodal AI models
LLM training at scale
Cross-functional collaboration
Problem-solving in complex systems
English communication proficiency
Python systems programming
Low-precision training (FP16/BF16)
Gradient checkpointing
Cluster management for ML workloads
Performance debugging
AI research partnership

Benefits

Hybrid work model in Beijing with flexible hours to support global collaboration
Canva+ subscription including premium features for personal and team use
Comprehensive health insurance and wellness programs tailored for China
Generous parental leave and family support benefits
Learning stipend for conferences, courses, and certifications in AI/ML
Stock options and performance bonuses in a high-growth tech company
Weekly catered lunches, snacks, and team-building events
Visa sponsorship and relocation support for international talent
Mental health support and 24/7 employee assistance programs

Canva is an equal opportunity employer.

Locations

Team Engineering, Global

Salary

Estimated Salary Range (medium confidence)

120,000 - 220,000 USD per year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

Distributed ML training systems (intermediate)
PyTorch/JAX expertise (intermediate)
Megatron-LM/NVIDIA NeMo (intermediate)
FSDP/ZeRO optimization (intermediate)
CUDA/Triton kernel development (intermediate)
GPU performance profiling (intermediate)
Memory optimization techniques (intermediate)
Multimodal AI models (intermediate)
LLM training at scale (intermediate)
Cross-functional collaboration (intermediate)
Problem-solving in complex systems (intermediate)
English communication proficiency (intermediate)
Python systems programming (intermediate)
Low-precision training (FP16/BF16) (intermediate)
Gradient checkpointing (intermediate)
Cluster management for ML workloads (intermediate)
Performance debugging (intermediate)
AI research partnership (intermediate)
