Senior Software Engineer - Cloud Availability Platform (Observability) Careers at Crusoe - San Francisco, California | Apply Now!

Crusoe

Full-time | Posted: Oct 1, 2025

Job Description

Senior Software Engineer - Cloud Availability Platform (Observability)

Role Overview

Crusoe is seeking a highly skilled and experienced Senior Software Engineer to join our Cloud Availability Platform Engineering team. In this role, you will design, develop, and operate Crusoe’s next-generation observability stack. Your work will be critical to the reliability and performance of Crusoe’s global infrastructure and cloud platform, and to delivering actionable insights that let engineers understand the internal state of distributed systems through metrics, logs, and traces. As a key member of the team, you'll contribute to the AI revolution with sustainable technology, driving meaningful innovation and setting the pace for responsible, transformative cloud infrastructure.

Day in the Life

A typical day for a Senior Software Engineer on the Cloud Availability Platform Engineering team might include:

  • Designing and implementing scalable observability solutions using technologies like Prometheus, Grafana, Loki, and OpenTelemetry.
  • Collaborating with other engineering teams to integrate observability into their applications and services.
  • Troubleshooting and resolving issues related to the observability platform.
  • Automating the provisioning and scaling of observability infrastructure.
  • Defining and driving adoption of SLOs, SLIs, and error budgets.
  • Mentoring junior engineers and sharing your expertise in observability.
  • Participating in code reviews and technical discussions.
  • Researching and evaluating new observability technologies.
  • Contributing to the development of Crusoe’s observability strategy.

Why San Francisco, California?

San Francisco is a global hub for technology and innovation, offering a vibrant and dynamic environment for software engineers. The city is home to a large number of tech companies, startups, and research institutions, providing ample opportunities for professional growth and networking. Additionally, San Francisco boasts a rich cultural scene, world-class restaurants, and stunning natural beauty, making it an attractive place to live and work.

Career Path

The career path for a Senior Software Engineer at Crusoe can lead to various opportunities, including:

  • Principal Engineer: Lead technical initiatives and provide guidance to other engineers.
  • Staff Engineer: Focus on solving complex technical challenges and driving innovation across the organization.
  • Engineering Manager: Lead and manage a team of engineers, overseeing their work and ensuring their success.
  • Architect: Design and implement the overall architecture of Crusoe’s cloud platform.

Salary & Benefits

Crusoe offers a competitive salary and benefits package, commensurate with experience and qualifications. The salary range for this position is estimated between $180,000 and $260,000 annually. Our benefits include:

  • Comprehensive health insurance (medical, dental, and vision)
  • Paid time off (vacation, sick leave, and holidays)
  • Stock options
  • 401(k) plan with company match
  • Professional development opportunities
  • Employee assistance program

Crusoe Culture

At Crusoe, we foster a culture of innovation, collaboration, and continuous learning. We are passionate about our mission to accelerate the abundance of energy and intelligence and are committed to building a sustainable and responsible cloud infrastructure. We value teamwork, open communication, and a growth mindset. We encourage our employees to take ownership of their work and to contribute to the overall success of the company.

How to Apply

If you are interested in joining our team, please submit your resume and cover letter through our online application portal. We look forward to hearing from you!

FAQ

  1. What is Crusoe's mission?

    Crusoe's mission is to accelerate the abundance of energy and intelligence.

  2. What technologies does Crusoe use for observability?

    Crusoe uses a variety of technologies for observability, including Prometheus, Grafana, Loki, and OpenTelemetry.

  3. What are the key responsibilities of this role?

    The key responsibilities of this role include designing and operating scalable observability systems, architecting end-to-end telemetry pipelines, and partnering with engineering teams to embed observability into applications and services.

  4. What qualifications are required for this role?

    The qualifications for this role include 7+ years of experience in infrastructure or platform engineering, deep expertise with metrics systems, logging pipelines, and tracing platforms, and strong programming skills in Go or Python.

  5. What is the salary range for this role?

    The salary range for this role is estimated between $180,000 and $260,000 annually.

  6. What are the benefits of working at Crusoe?

    The benefits of working at Crusoe include comprehensive health insurance, paid time off, stock options, and a 401(k) plan with company match.

  7. What is the company culture like at Crusoe?

    Crusoe fosters a culture of innovation, collaboration, and continuous learning.

  8. What are the career growth opportunities at Crusoe?

    The career path for a Senior Software Engineer at Crusoe can lead to various opportunities, including Principal Engineer, Staff Engineer, Engineering Manager, and Architect.

  9. Is this role remote eligible?

    This role is based in San Francisco, CA.

  10. Does Crusoe support open source?

    Yes, contributions to open source observability projects (Prometheus, OpenTelemetry, Grafana, Loki, etc.) are considered a bonus.

Locations

  • San Francisco, California, United States

Salary

Estimated Salary Range (medium confidence)

198,000 - 286,000 USD / yearly

Source: ai estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Observability (intermediate)
  • Metrics Systems (intermediate)
  • Prometheus (intermediate)
  • Thanos (intermediate)
  • Mimir (intermediate)
  • Cortex (intermediate)
  • Logging Pipelines (intermediate)
  • Fluent Bit (intermediate)
  • Vector (intermediate)
  • Loki (intermediate)
  • ELK (intermediate)
  • Opensearch (intermediate)
  • Tracing Platforms (intermediate)
  • Jaeger (intermediate)
  • Tempo (intermediate)
  • OpenTelemetry (intermediate)
  • Kubernetes (intermediate)
  • Terraform (intermediate)
  • Go (intermediate)
  • Python (intermediate)
  • Distributed Systems (intermediate)
  • Performance Engineering (intermediate)
  • Debugging (intermediate)
  • Telemetry Pipelines (intermediate)
  • Security Best Practices (intermediate)
  • RBAC (intermediate)
  • TLS (intermediate)
  • Secret Management (intermediate)
  • Multi-tenant Access Controls (intermediate)
  • Service Meshes (intermediate)
  • Load Balancers (intermediate)
  • APIs (intermediate)

Required Qualifications

  • 7+ years of experience in infrastructure or platform engineering
  • Focus on observability and monitoring systems
  • Deep expertise with metrics systems (Prometheus, Thanos, Mimir, Cortex), logging pipelines (Fluent Bit, Vector, Loki, ELK/Opensearch), and tracing platforms (Jaeger, Tempo, OpenTelemetry)
  • Strong programming skills in Go or Python for automation, operators, and custom integrations
  • Experience running observability platforms on Kubernetes
  • Experience operating observability platforms at scale across multi-datacenter environments
  • Proven ability to design, optimize, and scale telemetry pipelines handling high-cardinality, high-throughput data
  • Solid understanding of distributed systems
  • Understanding of performance engineering
  • Experience debugging complex workloads
  • Strong collaboration skills
  • Ability to influence engineering teams to adopt observability best practices
  • Experience with AI/ML or GPU-heavy environments (Bonus)
  • Knowledge of event-driven or streaming systems (Kafka, NATS, Pulsar) (Bonus)
  • Experience implementing cost optimization strategies for large-scale observability platforms (Bonus)
  • Contributions to open source observability projects (Prometheus, OpenTelemetry, Grafana, Loki, etc.) (Bonus)

Responsibilities

  • Designing and operating scalable observability systems (metrics, logging, tracing) across multi-datacenter Kubernetes environments
  • Architecting end-to-end telemetry pipelines, including ingestion, storage, querying, and visualization
  • Extending monitoring and alerting with Prometheus, Alertmanager, Thanos/Cortex, Grafana, and OpenTelemetry
  • Building scalable log collection and processing pipelines with Fluent Bit, Vector, Loki, or ELK/Opensearch stacks
  • Implementing distributed tracing platforms (Tempo, Jaeger, OpenTelemetry) and integrating with service meshes, load balancers, and APIs
  • Defining and driving adoption of SLOs, SLIs, and error budgets across services and teams
  • Automating provisioning and scaling of observability infrastructure with Kubernetes, Terraform, and custom tooling (Go, Python)
  • Ensuring reliability and cost efficiency of telemetry pipelines while supporting high-volume workloads (AI/ML, HPC clusters, GPU infrastructure)
  • Embedding security best practices into observability platforms, including RBAC, TLS, secret management, and multi-tenant access controls
  • Partnering with engineering teams to embed observability into applications, services, and infrastructure
  • Mentoring engineers
  • Shaping Crusoe’s observability strategy and technical roadmap

Benefits

  • Competitive salary and benefits package
  • Opportunity to work on cutting-edge technology in the AI and cloud infrastructure space
  • Be a part of a mission-driven company accelerating the abundance of energy and intelligence
  • Make a tangible impact on the company's success
  • Collaborate with a talented and passionate team
  • Professional development and growth opportunities
  • A culture of innovation and continuous learning
  • Sustainable and responsible approach to cloud infrastructure
  • Chance to contribute to open source observability projects
  • Exposure to high-volume workloads (AI/ML, HPC clusters, GPU infrastructure)
  • Opportunity to define and drive adoption of SLOs, SLIs, and error budgets
  • Shape the future of Crusoe's observability strategy
  • Remote work options
  • Health insurance
  • Paid time off
  • Stock options

Tags & Categories

Software Engineer, Cloud, Observability, Kubernetes, San Francisco, Senior Software Engineer, Cloud Availability, Platform Engineering, California, Prometheus, Grafana, Loki, OpenTelemetry, Metrics, Logging, Tracing, Telemetry, Cloud Infrastructure, AI, Artificial Intelligence, Distributed Systems, Performance Engineering, Go, Python, Alertmanager, Thanos, Jaeger, Green Tech, AI Infrastructure, Cloud, Engineering
