Senior Machine Learning Engineer - Evaluations (Design Generation)

Canva

full-time

Posted: December 16, 2025

Number of Vacancies: 1

Job Description

Location: Sydney, Australia

Team: Engineering

About the Role

Join Canva's Design Generation Platform team as a Senior Machine Learning Engineer - Evaluations, where you'll build the infrastructure powering quality monitoring for AI systems that generate millions of designs monthly from simple text prompts. Our 8-engineer team focuses on developer experience tooling, platform orchestration, and self-service capabilities, owning the plumbing that makes Design Generation observable, debuggable, and improvable. We live by a philosophy of platform-owned orchestration, not application logic, building reusable infrastructure that scales across research and engineering teams in Canva's innovative, collaborative culture.

As the Evaluation owner, you'll be the go-to expert bridging evaluation strategy, scalable infrastructure, and ecosystem integration. You'll guide teams on robust methodologies like LLM-as-Judge, 1st Party Quality Models, and user signals; build automated pipelines for continuous monitoring and production sampling; and integrate quality gates into CI/CD for regression-free deployments. You'll tackle subtle quality degradations in generative designs (off-brand visuals, suboptimal layouts, unpublished creations), using deep ML intuition to minimize false positives while enabling faster iteration in our design-first world. You'll curate benchmark datasets, optimize multi-dimensional scoring (brand adherence, visual appeal, layout quality, functional correctness), and define how Canva's evaluation tools compose coherently for Design Generation. You'll partner closely with researchers and engineers, fostering evidence-based decisions and self-service platforms that drive adoption.

At Canva, we celebrate diverse backgrounds, curiosity, and passion, so don't worry if you don't tick every box. If you're excited to shape the future of AI-driven design, apply and discover how you might be our perfect fit.

Experience the magic of Canva: equity to share our success, inclusive parental leave, Vibe & Thrive allowances, hybrid flexibility in Sydney, and a culture of connectivity fueling crazy big goals. Work on transformative AI that democratizes design for everyone.

Key Responsibilities

  • Understand and optimize existing evaluation systems including LLM-as-Judge frameworks, visual quality models, and multi-dimensional scoring for brand adherence, visual appeal, layout quality, and functional correctness
  • Design and implement automated evaluation pipelines that score generated designs at scale, balancing accuracy with computational cost
  • Define evaluation strategies for pre-deployment validation, continuous monitoring, A/B experiment analysis, and model comparison
  • Curate high-quality evaluation datasets and benchmark suites representing diverse use cases, edge cases, and quality dimensions
  • Integrate evaluation systems into continuous deployment pipelines with automated quality gates to catch regressions
  • Reduce evaluation cycle time to enable faster iteration on model improvements and experiments
  • Partner with research teams to understand and meet evaluation needs for new model architectures
  • Define the evaluation ecosystem strategy and integration points across Canva's tools for Design Generation
  • Guide teams on evaluation best practices, methodologies, and result interpretation
  • Build alerting systems, continuous monitoring, production sampling, and step-level evaluation harnesses

Required Qualifications

  • Strong ML engineering fundamentals with experience building and maintaining production ML systems at scale
  • Proven ability to build robust, scalable infrastructure for ML platforms
  • Deep understanding of distributed systems, observability patterns, and monitoring best practices
  • Python proficiency with production-quality coding standards, code reviews, and testing practices
  • Experience with data pipelines, time-series data, and statistical analysis for anomaly detection
  • SQL fluency for querying and analyzing large datasets across data warehouse and analytics systems
  • Track record of building self-service platforms or developer tooling with strong adoption

Preferred Qualifications

  • Experience evaluating Gen AI systems at scale, especially those with creative outputs
  • Background in design generation, visual quality models, or LLM-as-Judge frameworks
  • Familiarity with continuous deployment pipelines and automated quality gates
  • Experience curating evaluation datasets for diverse use cases and edge cases

Required Skills

  • ML engineering at production scale
  • Platform and infrastructure engineering
  • Distributed systems design
  • Observability and monitoring
  • Python production coding
  • Data pipelines and ETL
  • Time-series data analysis
  • Statistical anomaly detection
  • SQL for large-scale analytics
  • Self-service developer tooling
  • Gen AI evaluation methodologies
  • LLM-as-Judge frameworks
  • Visual quality models
  • Continuous deployment integration
  • Cross-team collaboration
  • Pragmatic architecture decisions
  • Evaluation dataset curation
  • Quality gate automation

Benefits

  • Equity packages to share in Canva's success
  • Inclusive parental leave policy supporting all parents and carers
  • Annual Vibe & Thrive allowance for wellbeing, social connection, and office setup
  • Hybrid work model combining collaboration and flexibility
  • Moments of magic, connectivity, and fun woven throughout Canva life
  • Opportunities to work on cutting-edge AI that powers millions of designs monthly
  • Collaborative culture emphasizing evidence-based decisions and developer experience
  • Global team environment fostering innovation in design generation

Canva is an equal opportunity employer.

Locations

  • Team Engineering, Global

Salary

Estimated Salary Range (high confidence)

210,000 - 320,000 USD per year

Source: AI estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.
