Senior Software Engineer - Cloud Availability (Observability) Careers at Crusoe - San Francisco, California | Apply Now!

Crusoe

Full-time | Posted: Oct 1, 2025

Job Description

Senior Software Engineer - Cloud Availability Platform Engineering (Observability)

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI — without sacrificing scale, speed, or sustainability.

Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.

Role Overview

As a Senior Software Engineer on the Cloud Availability Platform Engineering team, you will be at the forefront of building and operating Crusoe’s next-generation observability platform. This platform is critical for ensuring reliability and performance, and for delivering actionable insights, across Crusoe’s global infrastructure and cloud platform. Your deep expertise in metrics, logging, and tracing will be instrumental in enabling engineers to understand the internal state of distributed systems.

A Day in the Life

On a typical day, you might:

  • Design and implement scalable observability systems across multi-datacenter Kubernetes environments.
  • Architect end-to-end telemetry pipelines, optimizing for ingestion, storage, querying, and visualization.
  • Extend monitoring and alerting capabilities using tools like Prometheus, Alertmanager, Thanos/Cortex, and Grafana.
  • Build scalable log collection and processing pipelines with Fluent Bit, Vector, Loki, or ELK/OpenSearch stacks.
  • Implement and integrate distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) with service meshes, load balancers, and APIs.
  • Define and drive the adoption of SLOs, SLIs, and error budgets across various services and teams.
  • Automate the provisioning and scaling of observability infrastructure using Kubernetes, Terraform, and custom tooling (Go, Python).
  • Ensure the reliability and cost efficiency of telemetry pipelines, supporting high-volume workloads such as AI/ML, HPC clusters, and GPU infrastructure.
  • Embed security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls.
  • Partner with engineering teams to integrate observability into applications, services, and infrastructure.
  • Mentor engineers and contribute to shaping Crusoe’s observability strategy and technical roadmap.

Why San Francisco?

San Francisco is a global hub for technology and innovation, making it an ideal location for Crusoe's operations. The city offers a vibrant ecosystem of talent, resources, and opportunities, allowing Crusoe to attract top-tier engineers and stay at the cutting edge of cloud infrastructure development. Furthermore, San Francisco's commitment to sustainability aligns with Crusoe's mission to accelerate the abundance of energy and intelligence in an environmentally responsible manner. Being in San Francisco provides Crusoe with access to a diverse range of perspectives and ideas, fostering a culture of creativity and problem-solving.

Career Path

At Crusoe, you will have the opportunity to grow your career in several directions. You could advance to a Principal Engineer role, leading significant technical initiatives and mentoring other engineers. Alternatively, you could move into a management position, overseeing a team of engineers and driving the strategic direction of the Cloud Availability Platform. Crusoe also supports career development through training programs, conferences, and internal mobility opportunities, allowing you to expand your skills and explore different areas of interest within the company.

Salary & Benefits

The estimated salary range for this position in San Francisco is $180,000 to $250,000 per year. Crusoe also offers a comprehensive benefits package, including:

  • Comprehensive health, dental, and vision insurance
  • Generous paid time off and holidays
  • 401(k) plan with company match
  • Stock options or equity grants
  • Professional development opportunities
  • Employee assistance program
  • Flexible work arrangements
  • Commuter benefits
  • Wellness programs
  • Company-sponsored events and activities

Crusoe Culture

Crusoe is committed to fostering a culture of innovation, collaboration, and sustainability. The company values its employees and provides a supportive and inclusive work environment. Crusoe encourages open communication, teamwork, and a passion for solving complex challenges. Employees are empowered to make a meaningful impact and contribute to the company's mission of accelerating the abundance of energy and intelligence in a responsible manner.

How to Apply

Interested candidates are encouraged to apply through the Crusoe careers page. Please submit your resume and a cover letter highlighting your relevant experience and qualifications. Be sure to emphasize your expertise in observability, Kubernetes, and distributed systems.

FAQ

  1. What is Crusoe's mission?

    Crusoe's mission is to accelerate the abundance of energy and intelligence.

  2. What technologies will I be working with?

    You will be working with technologies such as Kubernetes, Prometheus, Grafana, OpenTelemetry, Fluent Bit, Vector, Loki, ELK/OpenSearch, Go, Python, and Terraform.

  3. What are the key responsibilities of this role?

    Key responsibilities include designing and operating scalable observability systems, architecting telemetry pipelines, and partnering with engineering teams to embed observability into applications.

  4. What qualifications are required for this role?

    Qualifications include 7+ years of experience in infrastructure or platform engineering, deep expertise in observability systems, and strong programming skills in Go or Python.

  5. What is the career path for this role?

    You can advance to a Principal Engineer role or move into a management position within the Cloud Availability Platform team.

  6. What is the salary range for this position?

    The estimated salary range for this position in San Francisco is $180,000 to $250,000 per year.

  7. What benefits does Crusoe offer?

    Crusoe offers a comprehensive benefits package, including health, dental, and vision insurance, paid time off, 401(k) plan, and stock options.

  8. What is the company culture like at Crusoe?

    Crusoe fosters a culture of innovation, collaboration, and sustainability.

  9. How does Crusoe contribute to sustainability?

    Crusoe is committed to accelerating the abundance of energy and intelligence in an environmentally responsible manner.

  10. What is the impact of this role on Crusoe's mission?

    This role is critical for ensuring reliability and performance, and for delivering actionable insights, across Crusoe’s global infrastructure and cloud platform, enabling the company to deliver sustainable technology solutions.

Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (medium confidence)

198,000 - 275,000 USD per year

Source: AI estimate

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Observability (intermediate)
  • Metrics Systems (intermediate)
  • Prometheus (intermediate)
  • Thanos (intermediate)
  • Mimir (intermediate)
  • Cortex (intermediate)
  • Logging Pipelines (intermediate)
  • Fluent Bit (intermediate)
  • Vector (intermediate)
  • Loki (intermediate)
  • ELK (intermediate)
  • OpenSearch (intermediate)
  • Tracing Platforms (intermediate)
  • Jaeger (intermediate)
  • Tempo (intermediate)
  • OpenTelemetry (intermediate)
  • Kubernetes (intermediate)
  • Terraform (intermediate)
  • Go (intermediate)
  • Python (intermediate)
  • Distributed Systems (intermediate)
  • Performance Engineering (intermediate)
  • Debugging (intermediate)
  • Telemetry Pipelines (intermediate)
  • Security Best Practices (intermediate)
  • RBAC (intermediate)
  • TLS (intermediate)
  • Secret Management (intermediate)

Required Qualifications

  • 7+ years of experience in infrastructure or platform engineering
  • Experience with observability and monitoring systems
  • Deep expertise with metrics systems (Prometheus, Thanos, Mimir, Cortex)
  • Experience with logging pipelines (Fluent Bit, Vector, Loki, ELK/OpenSearch)
  • Experience with tracing platforms (Jaeger, Tempo, OpenTelemetry)
  • Strong programming skills in Go or Python
  • Experience running observability platforms on Kubernetes
  • Experience operating observability platforms at scale across multi-datacenter environments
  • Proven ability to design, optimize, and scale telemetry pipelines
  • Solid understanding of distributed systems
  • Knowledge of performance engineering
  • Ability to debug complex workloads
  • Strong collaboration skills
  • Ability to influence engineering teams to adopt observability best practices

Responsibilities

  • Designing and operating scalable observability systems across multi-datacenter Kubernetes environments
  • Architecting end-to-end telemetry pipelines, including ingestion, storage, querying, and visualization
  • Extending monitoring and alerting with Prometheus, Alertmanager, Thanos/Cortex, Grafana, and OpenTelemetry
  • Building scalable log collection and processing pipelines with Fluent Bit, Vector, Loki, or ELK/OpenSearch stacks
  • Implementing distributed tracing platforms (Tempo, Jaeger, OpenTelemetry)
  • Integrating with service meshes, load balancers, and APIs
  • Defining and driving adoption of SLOs, SLIs, and error budgets across services and teams
  • Automating provisioning and scaling of observability infrastructure with Kubernetes, Terraform, and custom tooling (Go, Python)
  • Ensuring reliability and cost efficiency of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure)
  • Embedding security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls
  • Partnering with engineering teams to embed observability into applications, services, and infrastructure
  • Mentoring engineers and shaping Crusoe’s observability strategy and technical roadmap

Benefits

  • Competitive salary
  • Comprehensive health, dental, and vision insurance
  • Generous paid time off and holidays
  • 401(k) plan with company match
  • Stock options or equity grants
  • Professional development opportunities
  • Employee assistance program
  • Flexible work arrangements
  • Commuter benefits
  • Wellness programs
  • Company-sponsored events and activities
  • Opportunity to work on cutting-edge technology
  • Collaborative and supportive work environment
  • Impactful work contributing to sustainable technology solutions
  • Career growth potential within a rapidly expanding company

Tags & Categories

Software Engineering, Cloud Computing, Observability, Kubernetes, San Francisco, Full-time, Senior Software Engineer, Cloud Availability, Monitoring, Prometheus, Grafana, OpenTelemetry, Fluent Bit, Vector, Loki, ELK, OpenSearch, Go, Python, Terraform, Distributed Systems, Telemetry, Cloud Infrastructure, AI/ML, HPC, GPU, California, Crusoe Energy, Sustainable Technology, High-Throughput Data, Green Tech, AI Infrastructure, Cloud, Engineering
