Engineering Manager, Site Reliability (SRE)

Sentinel Labs

Full-time | Posted: Nov 24, 2025

Job Description

Job ID: 7532873003

About Us

At SentinelOne, we’re redefining cybersecurity by pushing the limits of what’s possible—leveraging AI-powered, data-driven innovation to stay ahead of tomorrow’s threats.

From building industry-leading products to cultivating an exceptional company culture, our core values guide everything we do. We’re looking for passionate individuals who thrive in collaborative environments and are eager to drive impact. If you’re excited about solving complex challenges in bold, innovative ways, we’d love to connect with you.

What are we looking for?

Please note that under Federal & FedRAMP regulations, hiring for this role is limited to US citizens only.

FedRAMP Staff may be subject to customer or third-party background checks up to and including secret clearance if required by their role at SentinelOne.

We are seeking an experienced engineering and operations manager to lead a Site Reliability Engineering (SRE) team at SentinelOne. As the Manager of SRE, you will manage a team of SRE professionals responsible for the reliability and scalability of our products and production services, with a focus on the experience our customers have in production every day. You will work closely with other engineering teams to identify and address availability, performance, and capacity issues, and you will be a key partner for our externally facing teams, including Support, Customer Success, and Sales Engineering. This is a highly visible role within SentinelOne (S1) with frequent opportunities to communicate with executives, and a great opportunity to do good work with good people all around the world.

As a team we value:

Thinking from first principles, understanding second-order impacts
Curiosity to understand new systems, their operating principles and limitations
Strong operational ownership and a desire to reduce toil via automation
A drive to learn, especially from prior failures
Courage to take risks and make things happen
Empathy and humility to collaborate effectively with peers and across teams

What will you do?

Grow and lead a team of SRE professionals, including setting performance goals and measuring deliverables against key metrics, while evolving those metrics as S1 grows and needs develop
Invest in data-driven deep triage on recurring issues, collaborating with other engineering teams to identify and address issues related to reliability, performance, and capacity
Develop, improve, and implement processes for the full incident lifecycle, including incident management, post-incident analysis, and learning from incidents. Lead incident response efforts, including coordinating with other teams to investigate and resolve customer-impacting incidents
Design a support model for SRE covering service maturity and service ownership, including monitoring and alerting improvements and SLI/SLO design and implementation
Analyze production metrics and signals to identify areas for improvement and take proactive steps to mitigate issues
Develop and implement best practices and standards for Site Reliability Engineering, from day-to-day operations to hiring and planning
Communicate effectively with cross-functional teams to ensure alignment on objectives and priorities. Deliver outcomes, not just stories and tasks.

What skills and knowledge should you bring?

8+ years of related engineering experience, with at least 2 years in a management role
Demonstrated experience leading technical and operational teams at various stages of maturity
Excellent analytical and problem-solving skills
Familiarity with modern software development methodologies, tools, and techniques, including CI/CD
Experience working with cloud-native applications and large-scale distributed systems, including a working knowledge of technologies such as Kubernetes and Terraform/IaC, and cloud providers such as AWS or GCP
Experience with various monitoring and alerting techniques and tools, including frameworks and concepts such as SLOs, OTel and Golden Signals as well as tooling such as Prometheus and Grafana
Extensive experience with incident response and management at various layers of the stack across different business needs and applications, including both hands-on experience leading incidents/post-incident analysis and experience driving broader incident management initiatives
Ability to thrive in a fast-paced, dynamic environment

Why us?

You will be joining a cutting-edge company where you will tackle extraordinary challenges and work with the very best in the industry.

Medical, Vision, Dental, 401(k), Commuter, Health and Dependent FSA
Unlimited PTO
Industry-leading gender-neutral parental leave
Paid Company Holidays
Paid Sick Time
Employee stock purchase program
Disability and life insurance
Employee assistance program
Gym membership reimbursement
Cell phone reimbursement
Numerous company-sponsored events, including regular happy hours and team-building events

This U.S. role has a base pay range that will vary based on the location of the candidate. For some locations, a different pay range may apply. If so, this range will be provided to you during the recruiting process. You can also reach out to the recruiter with any questions.

Base Salary Range

$160,000 - $200,000 USD

SentinelOne is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

SentinelOne participates in the E-Verify Program for all U.S.-based roles.

Locations

United States (Remote)

Salary

Salary details available upon request

Estimated Salary Range (medium confidence)

220,000 - 400,000 USD / yearly

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

analytical and problem-solving skills (intermediate)
modern software development methodologies, tools, and techniques, including CI/CD (intermediate)
cloud-native applications and large-scale distributed systems (intermediate)
Kubernetes (intermediate)
Terraform/IaC (intermediate)
AWS or GCP (intermediate)
monitoring and alerting techniques and tools (intermediate)
SLOs (intermediate)
OTel (intermediate)
Golden Signals (intermediate)
Prometheus (intermediate)
Grafana (intermediate)
incident response and management (intermediate)

Tags & Categories

71000 SRE - Customer Ops & Support
