Site Reliability Engineer (SRE)

Cognizant

Full-time | Posted: Dec 7, 2025

Job Description

About the role

As a Site Reliability Engineer (SRE), you will make an impact by designing and implementing advanced observability solutions for edge computing environments. You will be a valued member of our Infrastructure & Operations team, collaborating with engineering and platform teams to ensure high availability, reliability, and performance across distributed systems.

In this role, you will:

Design and implement observability frameworks for edge environments, including monitoring, logging, tracing, and metrics collection.
Define and maintain SLIs, SLOs, and business KPIs to improve system reliability across edge and centralized infrastructure.
Build and optimize dashboards, visualizations, and alerting systems for real-time insights and rapid incident response.
Implement distributed tracing and log aggregation systems to troubleshoot complex issues in edge computing.
Collaborate with engineering teams to embed observability best practices into applications and infrastructure.
Drive proactive issue detection and resolution, reducing MTTD and MTTR across distributed systems.
Lead incident postmortems and implement observability-driven improvements to prevent recurrence.
Develop automation scripts and tools to enhance observability pipelines, addressing edge-specific challenges like bandwidth and connectivity.

What you need to have to be considered

3–5 years of experience in service reliability/operations for large-scale, high-performance applications in hybrid environments (on-prem and cloud).
Strong scripting and automation skills for building dashboards and managing application performance.
Proficiency in programming languages such as Go, Python, Java, or Rust.
Hands-on experience with databases (Oracle, SQL Server, Redis, Clickhouse, Postgres, MongoDB, or time-series DBs).
2+ years of experience transitioning platforms to cloud and containerization (GCP, AWS, Rancher, or similar).
Experience maintaining containerized applications in GKE/RKE/AKE environments.
Expertise in implementing cloud observability using OpenTelemetry (OTEL) for monitoring and distributed tracing.
Knowledge of networking protocols (TCP/IP, HTTP, DNS) and troubleshooting in high-pressure scenarios.

These will help you stand out

Experience managing application availability for 24x7 high-availability platforms.
Familiarity with monitoring tools like Splunk, AppDynamics, Grafana/Prometheus, and Dynatrace.
Hands-on experience with CI/CD tools, Rally, and Confluence.
Knowledge of in-memory caching solutions (Redis preferred).
Strong debugging skills across integrated technical platforms and API gateways.
Exposure to GCS, Cloud SQL, Spanner, Firestore, and enterprise-level infrastructure operations.
Experience with HashiCorp Vault, Vertex AI, Gen AI, and BigQuery.

Work model: On-site

This is an onsite position requiring presence at a Cognizant or client location in Arizona City, AZ and/or Scottsdale, AZ. We strive to provide flexibility wherever possible and support a healthy work-life balance through our wellbeing programs.

The working arrangements for this role are accurate as of the date of posting. This may change based on the project you’re engaged in, as well as business and client requirements. Rest assured, we will always be clear about role expectations.

Applicants may be required to attend interviews in person or by video conference. In addition, candidates may be required to present their current state or government issued ID during each interview.

Salary and Other Compensation:

The annual salary for this position is between $60,000 and $93,500, depending on the experience and other qualifications of the successful candidate.

This position is also eligible for Cognizant’s discretionary annual incentive program, based on performance and subject to the terms of Cognizant’s applicable plans.

Benefits: Cognizant offers the following benefits for this position, subject to applicable eligibility requirements:

• Medical/Dental/Vision/Life Insurance

• Paid holidays plus Paid Time Off

• 401(k) plan and contributions

• Long-term/Short-term Disability

• Paid Parental Leave

• Employee Stock Purchase Plan

Disclaimer:

The salary, other compensation, and benefits information is accurate as of the date of this posting. Cognizant reserves the right to modify this information at any time, subject to applicable law.

Work Authorization: **Candidate must be legally authorized to work in the United States without the need for employer sponsorship, now or at any time in the future**

The Cognizant community:
We are a high-caliber team who appreciate and support one another. Our people uphold an energetic, collaborative and inclusive workplace where everyone can thrive.

Cognizant is a global community with more than 300,000 associates around the world.
We don’t just dream of a better way – we make it happen.
We take care of our people, clients, company, communities and climate by doing what’s right.
We foster an innovative environment where you can build the career path that’s right for you.

About us:
Cognizant is one of the world's leading professional services companies, transforming clients' business, operating, and technology models for the digital era. Our unique industry-based, consultative approach helps clients envision, build, and run more innovative and efficient businesses. Headquartered in the U.S., Cognizant (a member of the NASDAQ-100 and one of Forbes World’s Best Employers 2025) is consistently listed among the most admired companies in the world. Learn how Cognizant helps clients lead with digital at www.cognizant.com

Cognizant is an equal opportunity employer. Your application and candidacy will not be considered based on race, color, sex, religion, creed, sexual orientation, gender identity, national origin, disability, genetic information, pregnancy, veteran status or any other characteristic protected by federal, state or local laws.

If you have a disability that requires reasonable accommodation to search for a job opening or submit an application, please email CareersNA2@cognizant.com with your request and contact information.

Locations

India

Salary

60,000 - 93,500 USD / yearly

Skills Required

Scripting and automation: intermediate
Programming languages (Go, Python, Java, Rust): intermediate
Database management (Oracle, SQL Server, Redis, Clickhouse, Postgres, MongoDB, time-series DBs): intermediate
Cloud and containerization (GCP, AWS, Rancher): intermediate
Networking protocols (TCP/IP, HTTP, DNS): intermediate
Observability implementation (OpenTelemetry): intermediate

Tags & Categories

Technology, IT Services, Consulting
