Production Engineer

IBM

Full-time

Posted: Dec 11, 2025

Job Description

📋 Job Overview

The Production Engineer at IBM will be responsible for scaling, automating, and optimizing multi-cloud platform infrastructure across AWS, GCP, and Azure. The role involves designing, implementing, and operating highly available systems using Kubernetes to support mission-critical applications, ensuring reliability, automation, and operational excellence.

📍 Location: BANGALORE, IN (Remote/Hybrid)

💼 Career Level: Professional

🎯 Key Responsibilities

Ensure platform reliability and performance: Monitor, troubleshoot, and optimize production systems running on Kubernetes (EKS, GKE, AKS)
Automate operations: Develop and maintain automation for infrastructure provisioning, scaling, and incident response
Incident response & on-call support: Participate in on-call rotations to quickly detect, mitigate, and resolve production incidents
Kubernetes upgrades & management: Own and drive Kubernetes version upgrades, node pool scaling, and security patches
Observability & monitoring: Implement and refine observability tools (Datadog, Prometheus, Splunk, Victoria Metrics, etc.) for proactive monitoring and alerting
Infrastructure as Code (IaC): Manage infrastructure using Terraform, Terragrunt, Helm, and Kubernetes manifests
CI/CD & release automation: Build, maintain, and improve CI/CD pipelines using GitHub Actions, ArgoCD, and related tooling to streamline application delivery and platform updates
Cross-functional collaboration: Work closely with developers, SREs, and other teams to improve platform stability
Performance tuning: Analyze and optimize cloud and containerized workloads for cost efficiency and high availability
Security & compliance: Apply platform security best practices, support incident response, and maintain compliance

✅ Required Qualifications

Strong expertise in Kubernetes (EKS, GKE, AKS) and container orchestration
Experience with AWS, GCP, or Azure, particularly in managing large-scale cloud infrastructure
Proficiency in Terraform, Helm, and Infrastructure as Code (IaC)
Strong understanding of Linux systems, networking, and security best practices
Experience with monitoring & logging tools (Datadog, Splunk, Prometheus, Grafana, Victoria Metrics, etc.)
Hands-on experience with automation & scripting (Python, Go, Bash)
Experience in incident management & debugging complex distributed systems
Familiarity with CI/CD pipelines and release automation

🛠️ Required Skills

Kubernetes
EKS
GKE
AKS
AWS
GCP
Azure
Terraform
Helm
Infrastructure as Code (IaC)
Linux
Networking
Security
Datadog
Splunk
Prometheus
Grafana
Victoria Metrics
Python
Go
Bash
Incident management
Debugging
CI/CD
GitHub Actions
ArgoCD
Terragrunt
Kubernetes manifests

🎁 Benefits & Perks

Opportunity to learn and develop your career
Encouragement to be courageous and experiment
Continuous trust and support in an inclusive environment
Growth-minded culture with openness to feedback and learning
Opportunity to collaborate and drive exceptional outcomes for customers

Locations

Bangalore, India (Remote)

Salary

Estimated Salary Range (medium confidence)

2,500,000 - 4,200,000 INR / yearly

Source: AI-estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

Kubernetes: intermediate
EKS: intermediate
GKE: intermediate
AKS: intermediate
AWS: intermediate
GCP: intermediate
Azure: intermediate
Terraform: intermediate
Helm: intermediate
Infrastructure as Code (IaC): intermediate
Linux: intermediate
Networking: intermediate
Security: intermediate
Datadog: intermediate
Splunk: intermediate
Prometheus: intermediate
Grafana: intermediate
Victoria Metrics: intermediate
Python: intermediate
Go: intermediate
Bash: intermediate
Incident management: intermediate
Debugging: intermediate
CI/CD: intermediate
GitHub Actions: intermediate
ArgoCD: intermediate
Terragrunt: intermediate
Kubernetes manifests: intermediate
