Principal Engineer, AI Model Lifecycle Careers at Crusoe - San Francisco, California | Apply Now!

Crusoe

Full-time | Posted: Feb 9, 2026

Job Description

Principal Engineer, AI Model Lifecycle at Crusoe

Crusoe is on a mission to accelerate the abundance of energy and intelligence. We’re building the engine that powers a world where people can create ambitiously with AI — without sacrificing scale, speed, or sustainability. Join us and be a part of the AI revolution with sustainable technology.

Role Overview

As a Principal Software Engineer for the Model Lifecycle team, you will play a critical role in building a managed platform for the entire application development lifecycle, focusing on Machine Learning models, including Large Language Models (LLMs). You will design, develop, and maintain systems for fine-tuning, training, and deploying LLMs, contributing to the core infrastructure that powers Crusoe's AI initiatives. This role offers significant ownership and the opportunity to shape the future of AI at Crusoe.

A Day in the Life

Here’s a glimpse of what your day might look like:

  • Start your day by reviewing the performance of the previous night's training runs and identifying any bottlenecks or areas for optimization.
  • Participate in a sprint planning meeting with the product and platform teams to discuss upcoming features and priorities for the model lifecycle platform.
  • Spend the morning designing and implementing new features for the fine-tuning system, focusing on multi-node orchestration and cost-efficient scaling.
  • Collaborate with data scientists to integrate new datasets and models into the platform, ensuring proper versioning and lineage tracking.
  • Attend a code review session to provide feedback on your colleagues' work and ensure code quality.
  • In the afternoon, focus on building out the agent execution infrastructure, working on integrating new tools and libraries.
  • Research and experiment with new technologies and techniques for improving the performance and efficiency of LLM training and inference.
  • End the day by documenting your progress and planning for the next day's tasks.

Why San Francisco?

San Francisco is a global hub for technology and innovation, offering a vibrant ecosystem of startups, established tech companies, and research institutions. Being located in San Francisco provides access to a wealth of talent, resources, and opportunities for professional growth. The city's culture of innovation and collaboration makes it an ideal place to build cutting-edge AI solutions. Additionally, Crusoe's presence in San Francisco allows for close collaboration with leading AI researchers and engineers.

Career Path

This Principal Engineer role offers a clear path for career advancement within Crusoe. You can grow into a leadership position, such as a Staff Engineer or Engineering Manager, where you will be responsible for leading a team of engineers and driving the technical direction of the model lifecycle platform. Alternatively, you can deepen your technical expertise and become a Principal Architect, focusing on the overall architecture and design of Crusoe's AI infrastructure.

Salary & Benefits

Crusoe offers a competitive salary and benefits package, including:

  • Industry competitive pay
  • Restricted Stock Units in a fast-growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • Paid Time Off (PTO)
  • Professional Development Opportunities
  • Employee Assistance Program (EAP)
  • Wellness Programs
  • Commuter Benefits

Crusoe Culture

At Crusoe, we foster a culture of innovation, collaboration, and sustainability. We are committed to building a diverse and inclusive workplace where everyone feels valued and respected. We encourage our employees to take ownership of their work and to continuously learn and grow. We are passionate about using technology to solve some of the world's most pressing challenges, and we are looking for talented individuals who share our vision.

How to Apply

If you are passionate about AI and want to make a real impact, we encourage you to apply for the Principal Engineer, AI Model Lifecycle position at Crusoe. To apply, please submit your resume and cover letter through our online application portal. We look forward to hearing from you!

Frequently Asked Questions (FAQ)

  1. What is Crusoe's mission?

    Crusoe's mission is to accelerate the abundance of energy and intelligence.

  2. What will I be working on in this role?

    You will be working on managing fine-tuning systems for large foundation models, implementing training pipelines for LLMs, developing reinforcement learning pipelines, building agent execution infrastructure, and creating dataset and experiment management systems.

  3. What qualifications are required for this role?

    You need an advanced degree in Computer Science or a related field, 10-15+ years of experience in AI, expertise in cloud-based services, and experience with Generative AI and AI infrastructure.

  4. What are some bonus points for this role?

    Proficiency in Golang or Python, contributions to open-source AI projects, experience with GPU systems, and experience with PyTorch and LLM training are all considered bonus points.

  5. What are the benefits of working at Crusoe?

    Crusoe offers competitive pay, restricted stock units, comprehensive health insurance, HSA contributions, paid parental leave, life insurance, disability insurance, and Teladoc services.

  6. What is the work environment like at Crusoe?

    Crusoe fosters a proactive and collaborative environment where employees can work autonomously and contribute to cutting-edge AI products.

  7. What opportunities for growth are available?

    You can grow into leadership positions or deepen your technical expertise, such as becoming a Staff Engineer, Engineering Manager, or Principal Architect.

  8. Where is the position located?

    The position is located in San Francisco, California.

  9. How do I apply for this position?

    You can apply by submitting your resume and cover letter through our online application portal.

  10. What type of projects will I be involved in?

    You will be involved in designing and building core systems from first principles, shaping the core abstractions and APIs of the system, and influencing long-term architectural decisions around training runtimes and model lifecycle management.

Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (medium confidence)

242,000 - 385,000 USD per year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Large Language Models (LLMs): intermediate
  • Generative AI: intermediate
  • Machine Learning (ML): intermediate
  • AI Infrastructure: intermediate
  • Model Training: intermediate
  • Inference: intermediate
  • Fine-tuning: intermediate
  • SFT: intermediate
  • PEFT: intermediate
  • LoRA: intermediate
  • Multi-node Orchestration: intermediate
  • Checkpointing: intermediate
  • Failure Recovery: intermediate
  • Scalability: intermediate
  • Reinforcement Learning: intermediate
  • Policy Optimization: intermediate
  • Reward Modeling: intermediate
  • Agent Execution Infrastructure: intermediate
  • Experiment Management: intermediate
  • Versioning: intermediate
  • Lineage: intermediate
  • Evaluation: intermediate
  • Reproducible Fine-tuning: intermediate
  • Cloud-based Services (Elastic Compute, Object Storage, VPN, Managed Database): intermediate
  • Golang: intermediate
  • Python: intermediate
  • PyTorch: intermediate
  • vLLM: intermediate
  • GPU Systems: intermediate
  • Performance Optimization: intermediate

Required Qualifications

  • Advanced degree in Computer Science, Engineering, or a related field.
  • 10-15+ years of industry experience driving impactful projects in the AI space.
  • Proven track record of delivering early-stage projects under tight deadlines.
  • Expertise in using cloud-based services such as elastic compute, object storage, virtual private networks, and managed databases.
  • Experience in Generative AI (Large Language Models, Multimodal).
  • Deep experience with AI infrastructure, including training and inference.
  • Proficiency in Golang or Python for large-scale, production-level services.
  • Experience working with PyTorch.
  • Contributions to open-source AI projects such as vLLM or similar frameworks.
  • Performance optimizations on GPU systems and inference frameworks.
  • Experience with training and fine-tuning LLMs.
  • Proactive and collaborative approach with the ability to work autonomously.
  • Strong communication and interpersonal skills.
  • Passion for building cutting-edge AI products and solving challenging technical problems.

Responsibilities

  • Manage fine-tuning systems for large foundation models (SFT, PEFT, LoRA, adapters), including multi-node orchestration, checkpointing, failure recovery, and cost-efficient scaling.
  • Implement and maintain end-to-end training pipelines for Large Language Models.
  • Develop distillation and reinforcement learning pipelines (e.g., preference optimization, policy optimization, reward modeling).
  • Design and implement agent execution infrastructure.
  • Build dataset, model, and experiment management systems, including versioning, lineage, evaluation, and reproducible fine-tuning at scale.
  • Work closely with product, business, and platform teams to shape the core abstractions and APIs of the system.
  • Influence long-term architectural decisions around training runtimes, scheduling, storage, and model lifecycle management.
  • Contribute to and engage with the open-source LLM ecosystem.
  • Design and build core systems from first principles.
  • Ensure the scalability, reliability, and security of the model lifecycle platform.
  • Collaborate with data scientists and ML engineers to productionize AI models.
  • Monitor and optimize the performance of training and inference pipelines.

Benefits

  • Industry competitive pay
  • Restricted Stock Units in a fast-growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance
  • Short-term and long-term disability
  • Teladoc
  • Paid Time Off (PTO)
  • Professional Development Opportunities
  • Employee Assistance Program (EAP)
  • Wellness Programs
  • Commuter Benefits

Tags & Categories

AI, Machine Learning, LLM, San Francisco, Principal Engineer, Generative AI, Cloud, AI Model Lifecycle, Large Language Models, AI Infrastructure, Model Training, Inference, Fine-tuning, California, Crusoe, Software Engineer, Deep Learning, PyTorch, TensorFlow, Cloud Computing, AWS, GCP, Azure, Model Deployment, MLOps, AI Platform, Open Source AI, vLLM, GoLang, Python, Green Tech, Engineering

