Research Engineer - Evaluations

Canva

Job Type: Internship

Posted: December 16, 2025

Number of Vacancies: 1

Job Description

Research Engineer - Evaluations

Location: Vienna, Austria

Team: Engineering

About the Role

At Canva, our mission is to empower the world to design through magical products fueled by cutting-edge AI. We're seeking a Research Engineer - Evaluations to build our next-generation evaluation system for generative AI models, ensuring they deliver truly helpful, human-aligned design outputs. Based in our Vienna, Austria hub - home to Canva's exciting European AI operations - you'll engineer sophisticated AI agents that automatically assess design quality, relevance, and alignment using Multimodal Large Language Models (MLLMs). This high-impact role creates rapid feedback loops that guide our design generation research, directly empowering millions of users worldwide. You'll focus on agentic evaluation systems, inference-time alignment techniques like prompt engineering, RAG, and in-context learning, plus rigorous model benchmarking frameworks. Primary responsibilities include designing scalable 'MLLM-as-a-Judge' infrastructure, analyzing failure modes for actionable insights, and collaborating with research scientists to integrate evaluations into our ML lifecycle. Working in our collaborative, innovative culture, you'll translate bleeding-edge research into production systems that make Canva's AI more intuitive and design-focused. Join a team that's reimagining AI for design in a hybrid Vienna environment that balances deep work with Canva's signature fun and connectivity. With equity, flexible leave, wellbeing allowances, and the chance to shape the future of creative AI, this role offers massive impact in a company obsessed with empowering everyone to design.
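
To make the 'MLLM-as-a-Judge' idea above concrete, here is a minimal, hedged Python sketch. Everything in it is illustrative: the call_mllm function, the rubric text, and the JSON verdict format are placeholder assumptions, not Canva's actual evaluation stack.

# Minimal, illustrative "MLLM-as-a-Judge" sketch.
# `call_mllm` is a hypothetical placeholder for a multimodal LLM endpoint;
# it is assumed to accept an image plus a text prompt and return text.
import json
from dataclasses import dataclass

RUBRIC = """You are a design-quality judge.
Rate the attached design from 1 (poor) to 5 (excellent) on:
- relevance to the brief
- visual hierarchy and layout
- alignment with the stated brand style
Return JSON: {"score": <int>, "failure_modes": [<str>, ...]}"""

@dataclass
class JudgeResult:
    score: int
    failure_modes: list[str]

def judge_design(image_bytes: bytes, brief: str, call_mllm) -> JudgeResult:
    """Score one generated design with a multimodal judge model."""
    prompt = f"{RUBRIC}\n\nDesign brief: {brief}"
    raw = call_mllm(image=image_bytes, prompt=prompt)  # hypothetical endpoint
    parsed = json.loads(raw)                           # assumes well-formed JSON
    return JudgeResult(score=int(parsed["score"]),
                       failure_modes=list(parsed["failure_modes"]))

Run over a batch of generated designs, the returned scores and failure_modes lists are the kind of signal the role would aggregate into rapid feedback loops for the generation models.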

Key Responsibilities

  • Design, build, and optimize infrastructure for 'MLLM-as-a-Judge' evaluation systems providing scalable automated feedback
  • Implement and experiment with inference-time alignment techniques including prompt engineering, RAG, and in-context learning (see the sketch after this list)
  • Establish and manage comprehensive benchmarking processes for foundation models on design-centric tasks
  • Analyze evaluation data to identify model failure modes and deliver actionable recommendations
  • Collaborate with research scientists and ML engineers to integrate agentic evaluation into the model development lifecycle
  • Engineer autonomous AI agents using Multimodal Large Language Models to assess generated design quality and human alignment
  • Translate latest research in LLM evaluation and agentic AI into production-ready engineering solutions
  • Build rigorous frameworks for systematic model benchmarking and analysis
  • Provide rapid feedback loops to guide the future of design generation at Canva
  • Optimize evaluation systems for speed and scalability across distributed environments
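
The in-context learning responsibility above can be sketched briefly as well: a handful of already-scored exemplars is prepended to the judge prompt so the model calibrates its ratings at inference time, with no fine-tuning. The Exemplar type, function name, and prompt layout below are assumptions for illustration only.

# Illustrative sketch of inference-time alignment via in-context learning.
from typing import NamedTuple

class Exemplar(NamedTuple):
    brief: str
    verdict: str  # e.g. '{"score": 2, "failure_modes": ["cluttered layout"]}'

def build_judge_prompt(rubric: str, exemplars: list[Exemplar], brief: str) -> str:
    """Assemble a few-shot judging prompt from a rubric and scored examples."""
    shots = "\n\n".join(
        f"Design brief: {ex.brief}\nVerdict: {ex.verdict}" for ex in exemplars
    )
    return f"{rubric}\n\n{shots}\n\nDesign brief: {brief}\nVerdict:"

In practice the returned prompt, together with the candidate design image, would go to the same kind of multimodal judge endpoint as in the earlier sketch; swapping exemplars is a cheap way to steer the judge without retraining.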

Required Qualifications

  • Strong understanding of generative AI models (e.g., Diffusion Models, GANs, Transformers) and their architectures
  • Practical experience creating data-driven evaluation methodologies for AI models
  • Experience managing or optimizing large-scale distributed model training across hundreds of GPUs
  • Solid understanding of machine learning with hands-on experience using PyTorch and code optimization for speed
  • Disciplined coding practices including experience with code reviews and pull requests
  • Experience working in cloud environments, ideally AWS
  • Proven ability to analyze complex data and provide actionable insights

Preferred Qualifications

  • Familiarity with evaluation libraries and frameworks
  • Experience building or working with agentic AI systems or multi-agent coordination
  • Knowledge of data visualization tools to communicate findings effectively
  • Background or interest in human-computer interaction, design principles, or AI ethics
  • Experience with multimodal large language models (MLLMs)

Required Skills

  • Generative AI models (Diffusion, GANs, Transformers)
  • PyTorch and ML code optimization
  • Distributed training across GPU clusters
  • Cloud environments (AWS preferred)
  • Data-driven evaluation methodologies
  • Inference-time alignment techniques (Prompt Engineering, RAG, ICL)
  • Agentic AI systems and MLLMs
  • Model benchmarking and analysis
  • Code reviews and disciplined engineering practices
  • Multimodal evaluation systems
  • Failure mode analysis
  • Cross-functional collaboration
  • Research-to-production translation
  • Scalable infrastructure engineering
  • Design quality assessment

Benefits

  • Equity packages to share in Canva's success
  • Inclusive parental leave policy supporting all parents and carers
  • Annual Vibe & Thrive allowance for wellbeing, social connection, and home office setup
  • Flexible leave options empowering personal recharge and growth
  • Hybrid work model balancing collaboration and flexibility
  • Part of Canva's innovative AI team redefining design generation
  • Opportunities to work on cutting-edge generative AI impacting millions of users
  • Rich culture of magic, connectivity, and fun woven throughout life at Canva

Canva is an equal opportunity employer.

Locations

  • Vienna, Austria (Engineering team)

Salary

Estimated Salary Range (medium confidence)

75,000 - 120,000 USD per year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.
