Senior Software Engineer - Cloud Availability (Observability) at Crusoe - San Francisco, California

Crusoe

Full-time | Posted: Oct 1, 2025

Job Description

Senior Software Engineer - Cloud Availability Platform Engineering (Observability)

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI — without sacrificing scale, speed, or sustainability.

Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.

Role Overview

As a Senior Software Engineer on the Cloud Availability Platform Engineering team, you will be at the forefront of building and operating Crusoe’s next-generation observability platform. This platform is critical to the reliability and performance of Crusoe’s global infrastructure and cloud platform, and it provides actionable insights into both. Your deep expertise in metrics, logging, and tracing will be instrumental in enabling engineers to understand the internal state of distributed systems.

A Day in the Life

On a typical day, you might:

Design and implement scalable observability systems across multi-datacenter Kubernetes environments.
Architect end-to-end telemetry pipelines, optimizing for ingestion, storage, querying, and visualization.
Extend monitoring and alerting capabilities using tools like Prometheus, Alertmanager, Thanos/Cortex, and Grafana.
Build scalable log collection and processing pipelines with Fluent Bit, Vector, Loki, or ELK/OpenSearch stacks.
Implement and integrate distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) with service meshes, load balancers, and APIs.
Define and drive the adoption of SLOs, SLIs, and error budgets across various services and teams.
Automate the provisioning and scaling of observability infrastructure using Kubernetes, Terraform, and custom tooling (Go, Python).
Ensure the reliability and cost efficiency of telemetry pipelines, supporting high-volume workloads such as AI/ML, HPC clusters, and GPU infrastructure.
Embed security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls.
Partner with engineering teams to integrate observability into applications, services, and infrastructure.
Mentor engineers and contribute to shaping Crusoe’s observability strategy and technical roadmap.

Why San Francisco?

San Francisco is a global hub for technology and innovation, making it an ideal location for Crusoe's operations. The city offers a vibrant ecosystem of talent, resources, and opportunities, allowing Crusoe to attract top-tier engineers and stay at the cutting edge of cloud infrastructure development. Furthermore, San Francisco's commitment to sustainability aligns with Crusoe's mission to accelerate the abundance of energy and intelligence in an environmentally responsible manner. Being in San Francisco provides Crusoe with access to a diverse range of perspectives and ideas, fostering a culture of creativity and problem-solving.

Career Path

At Crusoe, you will have the opportunity to grow your career in several directions. You could advance to a Principal Engineer role, leading significant technical initiatives and mentoring other engineers. Alternatively, you could move into a management position, overseeing a team of engineers and driving the strategic direction of the Cloud Availability Platform. Crusoe also supports career development through training programs, conferences, and internal mobility opportunities, allowing you to expand your skills and explore different areas of interest within the company.

Salary & Benefits

The estimated salary range for this position in San Francisco is $180,000 to $250,000 per year. Crusoe also offers a comprehensive benefits package, including:

Comprehensive health, dental, and vision insurance
Generous paid time off and holidays
401(k) plan with company match
Stock options or equity grants
Professional development opportunities
Employee assistance program
Flexible work arrangements
Commuter benefits
Wellness programs
Company-sponsored events and activities

Crusoe Culture

Crusoe is committed to fostering a culture of innovation, collaboration, and sustainability. The company values its employees and provides a supportive and inclusive work environment. Crusoe encourages open communication, teamwork, and a passion for solving complex challenges. Employees are empowered to make a meaningful impact and contribute to the company's mission of accelerating the abundance of energy and intelligence in a responsible manner.

How to Apply

Interested candidates are encouraged to apply through the Crusoe careers page. Please submit your resume and a cover letter highlighting your relevant experience and qualifications. Be sure to emphasize your expertise in observability, Kubernetes, and distributed systems.

FAQ

What is Crusoe's mission?

Crusoe's mission is to accelerate the abundance of energy and intelligence.

What technologies will I be working with?

You will be working with technologies such as Kubernetes, Prometheus, Grafana, OpenTelemetry, Fluent Bit, Vector, Loki, ELK/OpenSearch, Go, Python, and Terraform.

What are the key responsibilities of this role?

Key responsibilities include designing and operating scalable observability systems, architecting telemetry pipelines, and partnering with engineering teams to embed observability into applications.

What qualifications are required for this role?

Qualifications include 7+ years of experience in infrastructure or platform engineering, deep expertise in observability systems, and strong programming skills in Go or Python.

What is the career path for this role?

You can advance to a Principal Engineer role or move into a management position within the Cloud Availability Platform team.

What is the salary range for this position?

The estimated salary range for this position in San Francisco is $180,000 to $250,000 per year.

What benefits does Crusoe offer?

Crusoe offers a comprehensive benefits package, including health, dental, and vision insurance, paid time off, a 401(k) plan with company match, and stock options or equity grants.

What is the company culture like at Crusoe?

Crusoe fosters a culture of innovation, collaboration, and sustainability.

How does Crusoe contribute to sustainability?

Crusoe is committed to accelerating the abundance of energy and intelligence in an environmentally responsible manner.

What is the impact of this role on Crusoe's mission?

This role is critical to the reliability and performance of Crusoe’s global infrastructure and cloud platform and to the actionable insights the observability platform provides, enabling the company to deliver sustainable technology solutions.

Locations

San Francisco, California, United States

Salary

Estimated Salary Range (medium confidence)

198,000 - 275,000 USD per year

Source: AI estimate

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

Observability (intermediate)
Metrics Systems (intermediate)
Prometheus (intermediate)
Thanos (intermediate)
Mimir (intermediate)
Cortex (intermediate)
Logging Pipelines (intermediate)
Fluent Bit (intermediate)
Vector (intermediate)
Loki (intermediate)
ELK (intermediate)
OpenSearch (intermediate)
Tracing Platforms (intermediate)
Jaeger (intermediate)
Tempo (intermediate)
OpenTelemetry (intermediate)
Kubernetes (intermediate)
Terraform (intermediate)
Go (intermediate)
Python (intermediate)
Distributed Systems (intermediate)
Performance Engineering (intermediate)
Debugging (intermediate)
Telemetry Pipelines (intermediate)
Security Best Practices (intermediate)
RBAC (intermediate)
TLS (intermediate)
Secret Management (intermediate)

Required Qualifications

7+ years of experience in infrastructure or platform engineering
Experience with observability and monitoring systems
Deep expertise in metrics systems (Prometheus, Thanos, Mimir, Cortex)
Experience with logging pipelines (Fluent Bit, Vector, Loki, ELK/OpenSearch)
Experience with tracing platforms (Jaeger, Tempo, OpenTelemetry)
Strong programming skills in Go or Python
Experience running observability platforms on Kubernetes
Experience operating observability platforms at scale across multi-datacenter environments
Proven ability to design, optimize, and scale telemetry pipelines
Solid understanding of distributed systems
Knowledge of performance engineering
Ability to debug complex workloads
Strong collaboration skills
Ability to influence engineering teams to adopt observability best practices

Responsibilities

Designing and operating scalable observability systems across multi-datacenter Kubernetes environments
Architecting end-to-end telemetry pipelines, including ingestion, storage, querying, and visualization
Extending monitoring and alerting with Prometheus, Alertmanager, Thanos/Cortex, Grafana, and OpenTelemetry
Building scalable log collection and processing pipelines with Fluent Bit, Vector, Loki, or ELK/OpenSearch stacks
Implementing distributed tracing platforms (Tempo, Jaeger, OpenTelemetry)
Integrating with service meshes, load balancers, and APIs
Defining and driving adoption of SLOs, SLIs, and error budgets across services and teams
Automating provisioning and scaling of observability infrastructure with Kubernetes, Terraform, and custom tooling (Go, Python)
Ensuring reliability and cost efficiency of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure)
Embedding security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls
Partnering with engineering teams to embed observability into applications, services, and infrastructure
Mentoring engineers and shaping Crusoe’s observability strategy and technical roadmap

Benefits

Competitive salary
Comprehensive health, dental, and vision insurance
Generous paid time off and holidays
401(k) plan with company match
Stock options or equity grants
Professional development opportunities
Employee assistance program
Flexible work arrangements
Commuter benefits
Wellness programs
Company-sponsored events and activities
Opportunity to work on cutting-edge technology
Collaborative and supportive work environment
Impactful work contributing to sustainable technology solutions
Career growth potential within a rapidly expanding company



