Principal Site Reliability Engineer

Amgen

full-time

Posted: November 12, 2025

Number of Vacancies: 1

Job Description

ABOUT AMGEN

What you will do

  • Ensure the reliability, scalability, and performance of Amgen’s infrastructure, platforms, and applications. Proactively identify and resolve performance bottlenecks and implement long-term fixes.
  • Continuously evaluate system design and usage to identify opportunities for cost optimization, ensuring infrastructure efficiency without compromising reliability.
  • Drive the adoption of automation and Infrastructure as Code (IaC) across the organization to streamline operations, minimize manual interventions, and enhance scalability.
  • Implement tools and frameworks (such as Terraform, Ansible, or Kubernetes) that increase efficiency and reduce infrastructure costs through optimized resource utilization.
  • Establish standardized operational processes, tools, and frameworks across Amgen’s technology stack to ensure consistency, maintainability, and best-in-class reliability practices.
  • Champion the use of industry standards to optimize performance and increase operational efficiency.
  • Implement and maintain comprehensive monitoring, alerting, and logging systems to detect issues early and ensure rapid incident response.
  • Lead the incident management process to minimize downtime, conduct root cause analysis, and implement preventive measures to avoid future occurrences.
  • Foster a culture of continuous improvement by leveraging data from incidents and performance monitoring.
  • Partner with software engineering and IT teams to integrate reliability, performance optimization, and cost-saving strategies throughout the development lifecycle.
  • Act as an SME for SRE principles and advocate for best practices on assigned projects.
  • Execute capacity planning processes to support future growth, performance, and cost management.
  • Maintain disaster recovery strategies to ensure system reliability and minimize downtime in the event of failures.

What we expect of you

  • Master’s degree and 8 to 10 years of experience in IT infrastructure, Site Reliability Engineering, or a related field, OR
  • Bachelor’s degree and 10 to 14 years of experience in IT infrastructure, Site Reliability Engineering, or a related field, OR
  • Diploma and 14 to 18 years of experience in IT infrastructure, Site Reliability Engineering, or a related field.
  • Bachelor’s degree in computer science and engineering preferred; other engineering fields are considered.

Must-Have Skills

  • Extensive experience with AWS cloud services
  • Proficiency in CI/CD (Jenkins/GitLab), observability, IaC, GitOps, etc.
  • Experience with containerization (Docker) and orchestration tools (Kubernetes) to optimize resource usage and improve scalability.
  • Ability to identify and specify SRE tasks
  • Strong hands-on experience with SRE tasks and with automating them using Python or another scripting language
  • Well versed in FinOps, Infra-Ops, and platform operations.
  • Ability to learn new technologies quickly. Strong problem-solving and analytical skills. Excellent communication and teamwork skills.
  • Leadership skills are mandatory: must lead a team of 4 to 5 and guide them through technical blockers.
  • Knowledge of cloud-native technologies and strategies for cost optimization in multi-cloud environments.
  • Familiarity with distributed systems, databases, and large-scale system architectures.
  • Ability to foster a collaborative and innovative work environment.
  • Strong problem-solving abilities and attention to detail.
  • High degree of initiative and self-motivation.

Good-to-Have Skills

  • Databricks knowledge or exposure is good to have (will need to upskill if hired)

Locations

  • Hyderabad, India

Salary

Salary not disclosed

Estimated Salary Range (high confidence)

80,000 - 120,000 USD per year

Source: xAI estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Extensive experience with AWS cloud services (intermediate)
  • Proficiency in CI/CD (Jenkins/GitLab), observability, IaC, GitOps, etc. (intermediate)
  • Experience with containerization (Docker) and orchestration tools (Kubernetes) to optimize resource usage and improve scalability. (intermediate)
  • Ability to identify and specify SRE tasks (intermediate)
  • Strong hands-on experience with SRE tasks and with automating them using Python or another scripting language (intermediate)
  • Well versed in FinOps, Infra-Ops, and platform operations. (intermediate)
  • Ability to learn new technologies quickly. Strong problem-solving and analytical skills. Excellent communication and teamwork skills. (intermediate)
  • Leadership skills are mandatory: must lead a team of 4 to 5 and guide them through technical blockers. (intermediate)
  • Knowledge of cloud-native technologies and strategies for cost optimization in multi-cloud environments. (intermediate)
  • Familiarity with distributed systems, databases, and large-scale system architectures. (intermediate)
  • Databricks knowledge or exposure is good to have (will need to upskill if hired) (intermediate)
  • Ability to foster a collaborative and innovative work environment. (intermediate)
  • Strong problem-solving abilities and attention to detail. (intermediate)
  • High degree of initiative and self-motivation. (intermediate)

Tags & Categories

Software Engineering, Cloud, Full Stack, Information Systems, Technology
