
LLM Ops Engineer

Apple

Software and Technology Jobs

Full-time · Posted: Jul 29, 2025

Job Description

We work on Apple-scale opportunities and challenges. We are engineers at heart: we like solving technical problems, and we believe a good engineer has the curiosity to dig into the inner workings of technology and is always experimenting, reading, and learning. If you are a software engineer with a passion for code who loves digging into the internals of any technology and is fascinated by distributed systems architecture, we want to hear from you.

We are seeking a highly skilled LLM Ops and ML Ops Engineer to lead the deployment, scaling, monitoring, and optimization of large language models (LLMs) across diverse environments. This role is critical to ensuring our machine learning systems are production-ready, high-performing, and resilient. The ideal candidate will have deep expertise in Python or Go programming, a comprehensive understanding of LLM internals, and hands-on experience with various inference engines and deployment strategies. They should be able to balance multiple competing priorities, deliver solutions in a timely manner, understand complex architectures, and be comfortable working with multiple teams.

Key Responsibilities:

  • Design and build scalable infrastructure for fine-tuning and deploying large language models.
  • Develop and optimize inference pipelines using popular frameworks and engines (e.g., TensorRT, vLLM, Triton Inference Server).
  • Implement observability solutions for model performance, latency, throughput, GPU/TPU utilization, and memory efficiency.
  • Own the end-to-end lifecycle of LLMs in production, from experimentation to continuous integration and continuous deployment (CI/CD).
  • Collaborate with research scientists, ML engineers, and backend teams to operationalize groundbreaking LLM architectures.
  • Automate and harden model deployment workflows using Python, Kubernetes, containers, and orchestration tools such as Argo Workflows and GitOps.
  • Design reproducible model packaging, versioning, and rollback strategies for large-scale serving.
  • Stay current with advances in LLM inference acceleration, quantization, distillation, and model compilation techniques (e.g., GGUF, AWQ, FP8).
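As an illustration of the observability work described above (tracking latency and throughput), here is a minimal, stdlib-only Python sketch that summarizes a window of request latencies. The function name and the nearest-rank percentile method are illustrative choices for this sketch, not part of any specific stack named in the posting:

```python
def summarize_latencies(latencies_ms, window_s):
    """Summarize per-request latencies observed over a time window.

    latencies_ms: list of per-request latencies in milliseconds.
    window_s: length of the observation window in seconds.
    Returns p50/p95 latency (nearest-rank percentile) and throughput.
    """
    ordered = sorted(latencies_ms)

    def pct(p):
        # Nearest-rank percentile: index of the p-th percentile sample.
        idx = max(0, round(p / 100 * len(ordered)) - 1)
        return ordered[idx]

    return {
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "throughput_rps": len(ordered) / window_s,
    }

# Example: 100 requests (20ms..119ms) observed over a 10-second window.
metrics = summarize_latencies([20 + i for i in range(100)], window_s=10.0)
print(metrics)
```

In production these numbers would typically be exported as Prometheus metrics and visualized in Grafana rather than printed.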

Locations

  • Hyderabad, Telangana, India

Skills Required

  • curiosity (intermediate)
  • experimenting (intermediate)
  • reading (intermediate)
  • constant learning (intermediate)
  • passion to code (intermediate)
  • dig deeper into technology (intermediate)
  • knowing internals (intermediate)
  • fascinated by distributed systems architecture (intermediate)
  • Python programming (intermediate)
  • Go programming (intermediate)
  • understanding of LLM internals (intermediate)
  • experience with inference engines (intermediate)
  • deployment strategies (intermediate)
  • balance multiple simultaneous competing priorities (intermediate)
  • deliver solutions in a timely manner (intermediate)
  • understand complex architectures (intermediate)
  • working with multiple teams (intermediate)
  • design scalable infrastructure (intermediate)
  • fine-tuning large language models (intermediate)
  • deploying large language models (intermediate)
  • develop inference pipelines (intermediate)
  • optimize inference pipelines (intermediate)
  • TensorRT (intermediate)
  • vLLM (intermediate)
  • Triton Inference Server (intermediate)
  • implement observability solutions (intermediate)
  • model performance monitoring (intermediate)
  • latency monitoring (intermediate)
  • throughput monitoring (intermediate)
  • GPU utilization (intermediate)
  • TPU utilization (intermediate)
  • memory efficiency (intermediate)
  • end-to-end lifecycle management (intermediate)
  • experimentation (intermediate)
  • continuous integration (intermediate)
  • continuous deployment (CI/CD) (intermediate)
  • collaborate with research scientists (intermediate)
  • collaborate with ML engineers (intermediate)
  • collaborate with backend teams (intermediate)
  • operationalize LLM architectures (intermediate)
  • automate model deployment workflows (intermediate)
  • harden model deployment workflows (intermediate)
  • Kubernetes (intermediate)
  • containers (intermediate)
  • orchestration tools (intermediate)
  • Argo Workflows (intermediate)
  • GitOps (intermediate)
  • reproducible model packaging (intermediate)
  • model versioning (intermediate)
  • rollback strategies (intermediate)
  • large-scale serving (intermediate)
  • LLM inference acceleration (intermediate)
  • quantization (intermediate)
  • distillation (intermediate)
  • model compilation techniques (intermediate)
  • GGUF (intermediate)
  • AWQ (intermediate)
  • FP8 (intermediate)

Required Qualifications

  • 5+ years of experience in LLM/ML Ops, DevOps, or infrastructure engineering with a focus on machine learning systems.
  • Advanced proficiency in Python or Go, with the ability to write clean, performant, and maintainable production code.
  • Deep understanding of transformer architectures, LLM tokenization, attention mechanisms, memory management, and batching strategies.
  • Proven experience deploying and optimizing LLMs using multiple inference engines.
  • Strong background in containerization and orchestration (Kubernetes, Helm).
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana), logging frameworks, and performance profiling.
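The batching strategies mentioned in the qualifications above can be sketched in miniature. This toy batcher (all names hypothetical) flushes only when a batch fills up; production inference servers such as Triton additionally flush on a timeout so that a partially full batch is never stranded:

```python
from dataclasses import dataclass, field

@dataclass
class Batcher:
    """Toy size-based dynamic batcher for inference requests.

    Requests accumulate in `pending`; once `max_batch_size` requests
    are queued, the batch is moved to `flushed` for execution.
    """
    max_batch_size: int
    pending: list = field(default_factory=list)
    flushed: list = field(default_factory=list)

    def submit(self, request):
        self.pending.append(request)
        if len(self.pending) >= self.max_batch_size:
            # Batch is full: hand it off and start a fresh one.
            self.flushed.append(self.pending)
            self.pending = []

b = Batcher(max_batch_size=4)
for i in range(10):
    b.submit(f"req-{i}")
print(len(b.flushed), len(b.pending))  # 2 full batches flushed, 2 requests pending
```

Continuous batching, as used by engines like vLLM, goes further by admitting new requests into a batch between generation steps; this sketch only shows the static, size-triggered path.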

Preferred Qualifications

  • Experience integrating LLMs into microservices or edge inference platforms.
  • Experience with Ray distributed inference.
  • Hands-on experience with quantization libraries.
  • Contributions to open-source ML infrastructure or LLM optimization tools.
  • Familiarity with cloud platforms (AWS, GCP) and infrastructure-as-code (Terraform).
  • Exposure to secure and compliant model deployment workflows.
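The quantization techniques referenced in this posting (e.g., AWQ, FP8, GGUF) share one core idea: represent weights with low-precision integers plus a scale factor. A toy symmetric per-tensor int8 sketch, with illustrative helper names, might look like this; real libraries add per-channel scales, activation-aware calibration, and packed storage formats:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ q * scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.01, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_err)
```

The largest-magnitude weight maps to ±127, and every other weight is rounded to the nearest representable step, which is where quantization error comes from.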



Tags & Categories

Hardware

