Senior Software Engineer - Cloud Availability Platform (Observability) Careers at Crusoe - San Francisco, California | Apply Now!

Crusoe

Full-time | Posted: Oct 1, 2025

Job Description

Senior Software Engineer - Cloud Availability Platform (Observability)

Role Overview

Crusoe is seeking a highly skilled and experienced Senior Software Engineer to join our Cloud Availability Platform Engineering team. In this role, you will design, develop, and operate Crusoe’s next-generation observability stack. Your work will be critical to ensuring the reliability and performance of Crusoe’s global infrastructure and cloud platform, and to delivering actionable insights across it, enabling engineers to understand the internal state of distributed systems through metrics, logs, and traces. As a key member of the team, you'll contribute to the AI revolution with sustainable technology, driving meaningful innovation and setting the pace for responsible, transformative cloud infrastructure.

Day in the Life

A typical day for a Senior Software Engineer on the Cloud Availability Platform Engineering team might include:

Designing and implementing scalable observability solutions using technologies like Prometheus, Grafana, Loki, and OpenTelemetry.
Collaborating with other engineering teams to integrate observability into their applications and services.
Troubleshooting and resolving issues related to the observability platform.
Automating the provisioning and scaling of observability infrastructure.
Defining and driving adoption of SLOs, SLIs, and error budgets.
Mentoring junior engineers and sharing your expertise in observability.
Participating in code reviews and technical discussions.
Researching and evaluating new observability technologies.
Contributing to the development of Crusoe’s observability strategy.

Why San Francisco, California?

San Francisco is a global hub for technology and innovation, offering a vibrant and dynamic environment for software engineers. The city is home to a large number of tech companies, startups, and research institutions, providing ample opportunities for professional growth and networking. Additionally, San Francisco boasts a rich cultural scene, world-class restaurants, and stunning natural beauty, making it an attractive place to live and work.

Career Path

The career path for a Senior Software Engineer at Crusoe can lead to various opportunities, including:

Principal Engineer: Lead technical initiatives and provide guidance to other engineers.
Staff Engineer: Focus on solving complex technical challenges and driving innovation across the organization.
Engineering Manager: Lead and manage a team of engineers, overseeing their work and ensuring their success.
Architect: Design and implement the overall architecture of Crusoe’s cloud platform.

Salary & Benefits

Crusoe offers a competitive salary and benefits package, commensurate with experience and qualifications. The salary range for this position is estimated between $180,000 and $260,000 annually. Our benefits include:

Comprehensive health insurance (medical, dental, and vision)
Paid time off (vacation, sick leave, and holidays)
Stock options
401(k) plan with company match
Professional development opportunities
Employee assistance program

Crusoe Culture

At Crusoe, we foster a culture of innovation, collaboration, and continuous learning. We are passionate about our mission to accelerate the abundance of energy and intelligence and are committed to building a sustainable and responsible cloud infrastructure. We value teamwork, open communication, and a growth mindset. We encourage our employees to take ownership of their work and to contribute to the overall success of the company.

How to Apply

If you are interested in joining our team, please submit your resume and cover letter through our online application portal. We look forward to hearing from you!

FAQ

What is Crusoe's mission?

Crusoe's mission is to accelerate the abundance of energy and intelligence.

What technologies does Crusoe use for observability?

Crusoe uses a variety of technologies for observability, including Prometheus, Grafana, Loki, and OpenTelemetry.

What are the key responsibilities of this role?

The key responsibilities of this role include designing and operating scalable observability systems, architecting end-to-end telemetry pipelines, and partnering with engineering teams to embed observability into applications and services.

What qualifications are required for this role?

The qualifications for this role include 7+ years of experience in infrastructure or platform engineering, deep expertise with metrics systems, logging pipelines, and tracing platforms, and strong programming skills in Go or Python.

What is the salary range for this role?

The salary range for this role is estimated between $180,000 and $260,000 annually.

What are the benefits of working at Crusoe?

The benefits of working at Crusoe include comprehensive health insurance, paid time off, stock options, and a 401(k) plan with company match.

What is the company culture like at Crusoe?

Crusoe fosters a culture of innovation, collaboration, and continuous learning.

What are the career growth opportunities at Crusoe?

The career path for a Senior Software Engineer at Crusoe can lead to various opportunities, including Principal Engineer, Staff Engineer, Engineering Manager, and Architect.

Is this role remote eligible?

This role is based in San Francisco, CA.

Does Crusoe support open source?

Yes, contributions to open source observability projects (Prometheus, OpenTelemetry, Grafana, Loki, etc.) are considered a bonus.

Locations

San Francisco, California, United States

Salary

Estimated Salary Range (medium confidence)

198,000 - 286,000 USD per year

Source: AI estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

Observability (intermediate)
Metrics Systems (intermediate)
Prometheus (intermediate)
Thanos (intermediate)
Mimir (intermediate)
Cortex (intermediate)
Logging Pipelines (intermediate)
Fluent Bit (intermediate)
Vector (intermediate)
Loki (intermediate)
ELK (intermediate)
OpenSearch (intermediate)
Tracing Platforms (intermediate)
Jaeger (intermediate)
Tempo (intermediate)
OpenTelemetry (intermediate)
Kubernetes (intermediate)
Terraform (intermediate)
Go (intermediate)
Python (intermediate)
Distributed Systems (intermediate)
Performance Engineering (intermediate)
Debugging (intermediate)
Telemetry Pipelines (intermediate)
Security Best Practices (intermediate)
RBAC (intermediate)
TLS (intermediate)
Secret Management (intermediate)
Multi-tenant Access Controls (intermediate)
Service Meshes (intermediate)
Load Balancers (intermediate)
APIs (intermediate)

Required Qualifications

7+ years of experience in infrastructure or platform engineering
Focus on observability and monitoring systems
Deep expertise with metrics systems (Prometheus, Thanos, Mimir, Cortex), logging pipelines (Fluent Bit, Vector, Loki, ELK/OpenSearch), and tracing platforms (Jaeger, Tempo, OpenTelemetry)
Strong programming skills in Go or Python for automation, operators, and custom integrations
Experience running observability platforms on Kubernetes
Experience operating observability platforms at scale across multi-datacenter environments
Proven ability to design, optimize, and scale telemetry pipelines handling high-cardinality, high-throughput data
Solid understanding of distributed systems
Understanding of performance engineering
Experience debugging complex workloads
Strong collaboration skills
Ability to influence engineering teams to adopt observability best practices
Experience with AI/ML or GPU-heavy environments (bonus)
Knowledge of event-driven or streaming systems such as Kafka, NATS, or Pulsar (bonus)
Experience implementing cost optimization strategies for large-scale observability platforms (bonus)
Contributions to open source observability projects such as Prometheus, OpenTelemetry, Grafana, or Loki (bonus)

Responsibilities

Designing and operating scalable observability systems (metrics, logging, tracing) across multi-datacenter Kubernetes environments
Architecting end-to-end telemetry pipelines, including ingestion, storage, querying, and visualization
Extending monitoring and alerting with Prometheus, Alertmanager, Thanos/Cortex, Grafana, and OpenTelemetry
Building scalable log collection and processing pipelines with Fluent Bit, Vector, Loki, or ELK/OpenSearch stacks
Implementing distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) and integrating with service meshes, load balancers, and APIs
Defining and driving adoption of SLOs, SLIs, and error budgets across services and teams
Automating provisioning and scaling of observability infrastructure with Kubernetes, Terraform, and custom tooling (Go, Python)
Ensuring reliability and cost efficiency of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure)
Embedding security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls
Partnering with engineering teams to embed observability into applications, services, and infrastructure
Mentoring engineers
Shaping Crusoe’s observability strategy and technical roadmap

Benefits

Competitive salary and benefits package
Opportunity to work on cutting-edge technology in the AI and cloud infrastructure space
Be a part of a mission-driven company accelerating the abundance of energy and intelligence
Make a tangible impact on the company's success
Collaborate with a talented and passionate team
Professional development and growth opportunities
A culture of innovation and continuous learning
Sustainable and responsible approach to cloud infrastructure
Chance to contribute to open source observability projects
Exposure to high-volume workloads (AI/ML, HPC clusters, GPU infrastructure)
Opportunity to define and drive adoption of SLOs, SLIs, and error budgets
Shape the future of Crusoe's observability strategy
Remote work options
Health insurance
Paid time off
Stock options
