RESUME AND JOB

Senior Site Reliability Engineer Careers at Crusoe - San Francisco, California | Apply Now!

Crusoe

Full-time | Posted: Dec 5, 2025

Job Description

Senior Site Reliability Engineer at Crusoe: Powering the AI Revolution

Crusoe Energy Systems is on a mission to accelerate the abundance of energy and intelligence. We are building the infrastructure to power a world where people can create ambitiously with AI, without sacrificing scale, speed, or sustainability. As a Senior Site Reliability Engineer (SRE) at Crusoe, you'll be at the heart of this mission, ensuring the reliability, performance, and efficiency of our AI-optimized cloud platform.

Role Overview

As a Senior SRE focused on Operational Excellence, you will play a critical role in maintaining the stability and resilience of Crusoe's GPU cloud. You'll work closely with senior SREs, infrastructure engineers, and platform teams to improve reliability, reduce operational toil, and strengthen our incident management practices. This role is perfect for engineers who thrive in a fast-paced environment, enjoy solving operational problems, and are passionate about building a career in a high-growth technology company.

A Day in the Life

Here’s a glimpse into what your day might look like:

Morning: Start your day by reviewing monitoring dashboards and alerts to identify any potential issues. Participate in a daily stand-up meeting with the SRE team to discuss ongoing projects and address any urgent concerns.
Mid-day: Collaborate with the network team to troubleshoot a network latency issue affecting GPU performance. Work on automating a new monitoring tool using Prometheus and Grafana to improve observability of the cloud infrastructure.
Afternoon: Participate in a post-incident review to analyze the root cause of a recent service disruption and identify preventative measures. Work on developing a disaster recovery plan for a critical application.
Evening: On-call responsibilities may require you to respond to alerts and address any urgent issues that arise outside of regular business hours. Contribute to documentation and knowledge-sharing initiatives to improve the team's collective understanding of the infrastructure.

Why San Francisco?

San Francisco is a global hub for technology and innovation, offering unparalleled opportunities for career growth and networking. Crusoe's San Francisco office places you in the heart of this vibrant ecosystem, surrounded by talented engineers and cutting-edge companies. In addition, San Francisco offers a diverse range of cultural attractions, outdoor activities, and world-class dining experiences.

Career Path

Crusoe is committed to fostering the professional growth of its employees. As a Senior SRE, you'll have opportunities to develop your technical skills, leadership abilities, and strategic thinking. Potential career paths include:

Principal SRE: Lead critical reliability initiatives and mentor junior SREs.
SRE Manager: Manage a team of SREs and oversee the reliability of a specific platform or service.
Infrastructure Architect: Design and implement the next generation of Crusoe's cloud infrastructure.

Salary & Benefits

Crusoe offers a competitive salary and benefits package, including:

Competitive salary range of $170,000 - $250,000 per year.
Equity options in a rapidly growing company.
Comprehensive health, dental, and vision insurance.
Generous paid time off and holiday schedule.
401(k) retirement plan with company match.
Professional development opportunities.
Opportunities for growth and advancement within the company.
A collaborative and supportive work environment.
The chance to work on cutting-edge technology.
The chance to make a tangible impact on the future of AI and sustainable technology.
Company-sponsored events and team-building activities.
Wellness programs and resources.
Commuter benefits.
Flexible work arrangements.
Life insurance.
Disability insurance.
Employee assistance program.
Parental leave.
Pet-friendly office.

Crusoe Culture

At Crusoe, we value innovation, collaboration, and a commitment to sustainability. We are a team of passionate individuals who are dedicated to building a better future. We foster a culture of continuous learning, open communication, and mutual respect. We believe that everyone has a voice and that diverse perspectives lead to better solutions.

How to Apply

If you are a passionate and talented SRE looking for a challenging and rewarding opportunity, we encourage you to apply! Please submit your resume and a cover letter highlighting your relevant experience and qualifications through our online application portal.

Frequently Asked Questions (FAQ)

What is Crusoe's mission? Crusoe's mission is to accelerate the abundance of energy and intelligence.
What does a Site Reliability Engineer do at Crusoe? An SRE at Crusoe ensures the reliability, performance, and efficiency of our AI-optimized cloud platform.
What skills are important for this role? Important skills include cloud operations, SRE principles, incident management, Linux systems administration, and cloud platform knowledge.
What experience is required for this role? We require 5+ years of experience in cloud operations, SRE, or related roles.
What tools and technologies do SREs use at Crusoe? SREs use tools like Prometheus, Grafana, Terraform, Ansible, and scripting languages like Go and Python.
What is the career path for an SRE at Crusoe? Potential career paths include Principal SRE, SRE Manager, and Infrastructure Architect.
What benefits does Crusoe offer? Crusoe offers a competitive salary, equity options, comprehensive health insurance, and a generous paid time off policy.
What is the company culture like at Crusoe? Crusoe values innovation, collaboration, and a commitment to sustainability.
How does Crusoe contribute to sustainability? Crusoe utilizes stranded energy resources to power its cloud infrastructure, reducing waste and minimizing environmental impact.
What opportunities for professional development are available at Crusoe? Crusoe provides opportunities for training, mentorship, and participation in industry conferences.

Locations

San Francisco, California, United States

Salary

Estimated Salary Range (medium confidence)

187,000 - 275,000 USD per year

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

Site Reliability Engineering (SRE): intermediate
Cloud Operations: intermediate
Incident Management: intermediate
Problem Solving: intermediate
Linux Systems Administration: intermediate
Cloud Platforms (Kubernetes, AWS, GCP): intermediate
Monitoring and Alerting (Prometheus, Grafana): intermediate
Automation: intermediate
Terraform: intermediate
Ansible: intermediate
Scripting (Go, Python, C, C++): intermediate
Communication: intermediate
Collaboration: intermediate
Operational Excellence: intermediate
Disaster Recovery: intermediate
Performance Bottleneck Identification: intermediate
Unix Systems Administration: intermediate
Network Troubleshooting: intermediate

Required Qualifications

5+ years of experience in cloud operations, SRE, or related roles.
Experience working with GPU workloads, high-performance computing, or latency/throughput-sensitive systems.
Strong knowledge of Unix/Linux systems (kernel/user space) and networking.
Understanding of cloud platforms and infrastructure fundamentals (Kubernetes, AWS/GCP, virtualization, distributed systems).
Familiarity with incident management practices and operational frameworks (SRE/ITIL/etc.).
Experience with monitoring and alerting tools (Prometheus, Grafana) or a strong willingness to learn.
Familiarity with infrastructure-as-code and configuration management tools such as Terraform and Ansible.
Basic scripting and automation experience (Go, Python, C, C++).
Strong communication skills.
Ability to stay calm, focused, and effective in fast-moving or high-pressure situations.
A growth mindset with enthusiasm for operational excellence.
Bachelor's degree in Computer Science, Engineering, or a related field.

Responsibilities

Collaborate with cross-functional teams to define and refine availability metrics for Crusoe’s cloud infrastructure.
Establish, track, and improve Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
Assist in incident response by identifying, diagnosing, and resolving service disruptions.
Support post-incident processes through Root Cause Analysis (RCA) documentation.
Participate in post-incident reviews.
Build, operate, and monitor infrastructure health using Crusoe’s observability stack (Prometheus, Grafana, Alertmanager, OpenTelemetry).
Identify and communicate reliability risks and performance bottlenecks.
Identify early indicators of potential incidents that could impact service availability.
Develop automation and tooling to reduce operational toil.
Minimize manual intervention and enhance service recovery and self-healing capabilities.
Partner with compute, network, storage, and platform teams to improve service resilience.
Strengthen disaster recovery readiness.
Contribute to knowledge sharing and process improvements.
Develop operational best practices across the organization.
Participate in ongoing training, mentorship, and professional development.
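
For readers less familiar with the SLI/SLO work mentioned above, here is a minimal sketch of how an availability SLI and its error budget are commonly computed. It is illustrative only and does not describe Crusoe's actual tooling; the names SloReport and availability_report, and the request counts in the usage note, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float              # observed fraction of good requests
    error_budget: float     # allowed fraction of bad requests (1 - SLO target)
    budget_consumed: float  # share of the budget spent so far (1.0 = exhausted)


def availability_report(good: int, total: int, slo_target: float) -> SloReport:
    """Compare an observed availability SLI against an SLO target.

    Hypothetical helper: 'good' and 'total' are request counts over the
    SLO window, and 'slo_target' is a fraction such as 0.999.
    """
    if total <= 0 or not 0 <= good <= total:
        raise ValueError("need 0 <= good <= total and total > 0")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good / total
    error_budget = 1.0 - slo_target
    # Fraction of the allowed failures that has already occurred.
    budget_consumed = (1.0 - sli) / error_budget
    return SloReport(sli, error_budget, budget_consumed)
```

As a usage example, availability_report(999_500, 1_000_000, 0.999) describes a window in which 500 of one million requests failed against a 99.9% target, so roughly half of the error budget has been spent.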

Benefits

Competitive salary and equity options.
Comprehensive health, dental, and vision insurance.
Generous paid time off and holiday schedule.
401(k) retirement plan with company match.
Professional development opportunities.
Opportunities for growth and advancement within the company.
A collaborative and supportive work environment.
The chance to work on cutting-edge technology.
The chance to make a tangible impact on the future of AI and sustainable technology.
Company-sponsored events and team-building activities.
Wellness programs and resources.
Commuter benefits.
Flexible work arrangements.
Life insurance.
Disability insurance.
Employee assistance program.
Parental leave.
Pet-friendly office.

Tags & Categories

SRE, Cloud, San Francisco, GPU, AI, Kubernetes, Site Reliability Engineer, Cloud Operations, California, GPU Cloud, AI Infrastructure, AWS, GCP, Prometheus, Grafana, Terraform, Ansible, Linux, Incident Management, Disaster Recovery, Automation, Monitoring, Alerting, High Performance Computing, Stranded Energy, Sustainable Technology, Career, Crusoe Energy, Green Tech, Cloud Engineering

