Senior Machine Learning Systems Engineer (Training Optimization)

Canva

Full-time

Posted: December 16, 2025

Number of Vacancies: 1

Job Description

Location: Beijing, China

Team: Engineering

About the Role

At Canva, we're revolutionizing how the world creates through AI-powered design tools used by more than 170 million people every month. Join our CORE team within the Generative AI supergroup in Beijing as a Senior Machine Learning Systems Engineer (Training Optimization). You'll lead efforts to scale and optimize the training systems behind our large-scale multimodal and foundation models, which power smart editing, AI video tools, and the next generation of creative intelligence. Sitting at the intersection of systems engineering and AI research, you'll push performance boundaries across compute, memory, and communication, directly shaping Canva's product roadmap in a collaborative, design-focused culture.

Your role will involve designing distributed training infrastructure with frameworks and tools such as Megatron-LM, NVIDIA NeMo, FSDP, and Triton. You'll optimize every layer of the stack for efficiency, develop custom CUDA and Triton kernels, and partner with research teams to turn algorithmic breakthroughs into production-scale systems. In our fast-paced, high-impact environment, you'll debug complex workflows, profile GPU utilization, and unlock the scalability that lets us iterate faster on features millions of creators rely on daily.

We're seeking a systems-first engineer who is passionate about creative AI and thrives in global collaboration. With deep expertise in LLMs and distributed training, you'll bring innovative solutions to our Beijing team while embracing Canva's inclusive culture of experimentation and user obsession. Enjoy hybrid flexibility, comprehensive benefits, and the chance to build foundational technologies that make design accessible to everyone. Join us in creating the future of visual communication.

Key Responsibilities

  • Design, implement, and optimize large-scale distributed training systems using Megatron-LM, NVIDIA NeMo, FSDP, and Triton
  • Improve GPU utilization and memory efficiency, and reduce communication overhead, for multimodal and foundation models
  • Partner closely with AI research and modeling teams to align systems with cutting-edge algorithmic needs
  • Evaluate and implement industry-leading best practices for scalable distributed training
  • Develop low-level optimizations including custom CUDA or Triton kernels
  • Debug, profile, and tune training workflows to achieve breakthrough scalability
  • Collaborate globally within Canva's Generative AI supergroup to power creative intelligence features
  • Drive performance innovations that enable faster iteration on AI-assisted design tools
  • Mentor junior engineers and contribute to the technical roadmap for CORE team initiatives
  • Profile and optimize end-to-end ML pipelines from training to inference at massive scale

Required Qualifications

  • Strong background in LLMs, multimodal AI, or diffusion models
  • Proficiency in Python and familiarity with a systems programming language such as C++ or Rust
  • Deep knowledge of PyTorch or JAX, and libraries such as Megatron-LM, NeMo, or DeepSpeed
  • Hands-on experience with distributed training optimization techniques including FSDP/ZeRO, gradient checkpointing, and low-precision data types
  • Experience writing custom GPU kernels in CUDA or Triton
  • Excellent communication and problem-solving skills with full proficiency in English
  • Proven track record of scaling ML training systems in production environments

Preferred Qualifications

  • Experience optimizing training for multimodal foundation models
  • Familiarity with NVIDIA Triton Inference Server for deployment
  • Background in designing systems for creative AI applications
  • Contributions to open-source ML frameworks or distributed systems
  • Experience collaborating across research, engineering, and product teams

Required Skills

  • Distributed ML training systems
  • PyTorch/JAX expertise
  • Megatron-LM/NVIDIA NeMo
  • FSDP/ZeRO optimization
  • CUDA/Triton kernel development
  • GPU performance profiling
  • Memory optimization techniques
  • Multimodal AI models
  • LLM training at scale
  • Cross-functional collaboration
  • Problem-solving in complex systems
  • English communication proficiency
  • Python systems programming
  • Low-precision training (FP16/BF16)
  • Gradient checkpointing
  • Cluster management for ML workloads
  • Performance debugging
  • AI research partnership

Benefits

  • Hybrid work model in Beijing with flexible hours to support global collaboration
  • Canva+ subscription including premium features for personal and team use
  • Comprehensive health insurance and wellness programs tailored for China
  • Generous parental leave and family support benefits
  • Learning stipend for conferences, courses, and certifications in AI/ML
  • Stock options and performance bonuses in a high-growth tech company
  • Weekly catered lunches, snacks, and team-building events
  • Visa sponsorship and relocation support for international talent
  • Mental health support and 24/7 employee assistance programs

Canva is an equal opportunity employer.

Locations

  • Beijing, China

Salary

Estimated Salary Range (medium confidence)

120,000 to 220,000 USD per year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.
