
Incident Manager Careers at Crusoe - San Francisco, CA | Apply Now!

Crusoe

Full-time | Posted: Feb 11, 2026

Job Description

Incident Manager at Crusoe: Powering the AI Revolution

Crusoe is on a mission to accelerate the abundance of energy and intelligence. We're building the infrastructure that powers a world where people can create ambitiously with AI – without compromising on scale, speed, or sustainability. As an Incident Manager at Crusoe, you'll be at the forefront of this revolution, ensuring the reliability and resilience of our cutting-edge cloud infrastructure.

Role Overview

The Incident Manager role is pivotal to upholding service reliability and customer trust. You will directly impact the company's success by minimizing downtime and resolving critical issues quickly. You'll lead the management of high-visibility incidents and customer escalations, ensuring that our responses to complex technical challenges are rapid and effective.

This role goes beyond immediate problem-solving. We're looking for someone to sharpen our incident management practices, ensure a superior customer experience during crises, and build robust preventative measures. You'll use data analytics to enhance system resilience and reliability, turning every incident into an opportunity to strengthen our product and processes.

A Day in the Life of an Incident Manager

Here’s a glimpse into what your typical day might look like:

  • Morning: Start by reviewing overnight incident reports and identifying any emerging trends. Collaborate with engineering teams to understand the root causes of recent issues and track the progress of ongoing resolutions.
  • Midday: Lead a high-priority incident bridge, coordinating efforts among technical teams to diagnose and resolve a critical system outage. Manage communication with stakeholders, providing timely updates on the incident's status.
  • Afternoon: Conduct a post-incident review, analyzing the timeline, impact, and contributing factors of a recent incident. Develop actionable recommendations to prevent similar incidents from occurring in the future.
  • Ongoing: Work on documentation and training to ensure consistency and accuracy for future scenarios.
  • Daily: Update runbooks with the latest fixes so that Customer Success Managers and Customer Support Engineers always have the most effective solutions at hand.

Why San Francisco?

San Francisco is a global hub for technology and innovation, making it the ideal location for Crusoe's headquarters. Being based here offers numerous advantages:

  • Access to Top Talent: San Francisco attracts some of the brightest minds in the industry, providing Crusoe with access to a deep pool of skilled professionals.
  • Collaboration and Networking: The city fosters a vibrant ecosystem of startups, established tech companies, and research institutions, creating opportunities for collaboration and networking.
  • Innovation Hub: San Francisco is at the forefront of technological advancements, allowing Crusoe to stay ahead of the curve and leverage the latest innovations.
  • Strategic Location: San Francisco provides easy access to key data centers and customer locations.

Career Path

The Incident Manager role at Crusoe offers a clear path for career growth and advancement. You could potentially move into roles such as:

  • Senior Incident Manager: Lead a team of Incident Managers and oversee the company's overall incident management strategy.
  • Principal Engineer, Reliability: Focus on proactively improving system reliability and resilience through data analysis, automation, and process optimization.
  • Director of Operations: Oversee all aspects of the company's operations, ensuring smooth and efficient service delivery.

Salary and Benefits

The estimated salary range for an Incident Manager in San Francisco is $130,000 - $180,000 per year. Crusoe offers a comprehensive benefits package, including:

  • Competitive salary and performance-based bonuses
  • Comprehensive health, dental, and vision insurance
  • Paid time off and holidays
  • 401(k) plan with company match
  • Stock options
  • Professional development opportunities
  • A supportive and collaborative work environment

Crusoe Culture

At Crusoe, we are driven by a shared passion for innovation and a commitment to sustainability. We foster a culture of collaboration, respect, and continuous learning. We believe in empowering our employees to make a real impact on the world.

How to Apply

If you are a highly motivated and experienced Incident Manager with a passion for technology and a desire to make a difference, we encourage you to apply. Please submit your resume and cover letter through our online application portal.

Frequently Asked Questions (FAQ)

  1. What is Crusoe's mission?

    Crusoe's mission is to accelerate the abundance of energy and intelligence.

  2. What technologies will I be working with?

    You will be working with Linux, virtualization, Kubernetes, TCP/IP, and Infrastructure-as-Code (IaC).

  3. What certifications are preferred?

    NVIDIA, Linux, and Kubernetes certifications are strongly preferred.

  4. What is the work environment like at Crusoe?

    Crusoe fosters a culture of collaboration, respect, and continuous learning.

  5. What are the career growth opportunities?

    You could potentially move into roles such as Senior Incident Manager, Principal Engineer (Reliability), or Director of Operations.

  6. What is the salary range for this position?

    The estimated salary range is $130,000 - $180,000 per year.

  7. What benefits does Crusoe offer?

    Crusoe offers a comprehensive benefits package, including health, dental, and vision insurance, paid time off, a 401(k) plan, and stock options.

  8. What kind of experience is Crusoe looking for?

    Crusoe is looking for someone with 4-5 years of customer-facing experience and 3-5+ years of experience in a team leadership role.

  9. Is programming experience required?

    No. Programming skills in one or more languages are a bonus but not required.

  10. How do I apply for this position?

    Please submit your resume and cover letter through our online application portal.

Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (medium confidence)

143,000 - 198,000 USD per year

Source: AI estimate

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Incident Management (intermediate)
  • Crisis Management (intermediate)
  • Data Analytics (intermediate)
  • System Resiliency (intermediate)
  • Linux (intermediate)
  • Virtualization (intermediate)
  • Kubernetes (intermediate)
  • TCP/IP (intermediate)
  • Infrastructure-as-Code (IaC) (intermediate)
  • Troubleshooting (intermediate)
  • Customer Support (intermediate)
  • Technical Documentation (intermediate)
  • Communication Skills (intermediate)
  • Problem Solving (intermediate)
  • Team Leadership (intermediate)
  • InfiniBand (intermediate)
  • Containerization (intermediate)
  • Distributed Training (intermediate)
  • Networking (intermediate)
  • Customer Escalation Management (intermediate)

Required Qualifications

  • 4-5 years of customer-facing experience
  • 3-5+ years of experience in a team leadership role
  • Strong technical experience with Linux, virtualization, and Kubernetes
  • Solid understanding of the TCP/IP stack
  • Experience with Infrastructure-as-Code (IaC) practices
  • Experience diagnosing and resolving complex technical issues
  • Experience developing and delivering training materials
  • Experience in incident response and management
  • Experience using data analytics to drive system resiliency
  • Ability to lead incident responses for high-visibility issues
  • Strong communication and interpersonal skills
  • Ability to work closely with internal engineering and product teams
  • NVIDIA, Linux, and Kubernetes certifications (strongly preferred)
  • Programming skills in one or more languages (bonus)
  • Proven ability to maintain customer trust during outages
  • Experience with InfiniBand, containerization, and distributed training

Responsibilities

  • Lead incident responses for high-visibility issues, ensuring minimal disruption to customer operations.
  • Manage communication and strategy to maintain customer trust during outages or critical failures.
  • Use data analytics to identify trends in incidents, translating these insights into actionable strategies for greater system resiliency and reliability.
  • Develop robust incident response strategies and designs.
  • Conduct in-depth post-incident reviews to ensure root causes are addressed and recurrences are prevented.
  • Diagnose and resolve complex technical issues related to InfiniBand, containerization, and distributed training.
  • Guide and assist customers in implementing and optimizing their HPC infrastructure to achieve maximum performance and efficiency.
  • Develop and deliver training materials, including internal training sessions, documentation, and knowledge base articles, to help customers use our solutions effectively.
  • Work closely with internal engineering and product teams to provide valuable customer feedback.
  • Act as a key technical resource, helping our Customer Support Engineers (CSEs) and Customer Success Managers (CSMs) understand and resolve complex product issues.
  • Spearhead the management of high-visibility incidents and customer escalations.
  • Ensure rapid and effective responses to complex technical challenges.

Benefits

  • Competitive salary and benefits package
  • Opportunity to work on cutting-edge technology in the AI and cloud infrastructure space
  • Be part of a company that is committed to sustainability and responsible innovation
  • Opportunity to drive meaningful innovation and make a tangible impact
  • Join a team that is setting the pace for responsible, transformative cloud infrastructure
  • Professional development and training opportunities
  • Collaborative and supportive work environment
  • Opportunity to work with a diverse and talented team
  • Health insurance
  • Dental insurance
  • Vision insurance
  • Paid time off
  • Stock options
  • 401(k) plan

