
LLM Ops Engineer

Apple

Software and Technology Jobs

Full-time · Posted: Jul 29, 2025

Job Description

We work on Apple-scale opportunities and challenges. We are engineers at heart: we like solving technical problems, and we believe a good engineer has the curiosity to dig into the inner workings of technology and is always experimenting, reading, and learning. If you are a software engineer with a passion for code who loves digging into the internals of any technology and is fascinated by distributed systems architecture, we want to hear from you.

We are seeking a highly skilled LLM Ops and ML Ops Engineer to lead the deployment, scaling, monitoring, and optimization of large language models (LLMs) across diverse environments. This role is critical to ensuring our machine learning systems are production-ready, high-performing, and resilient. The ideal candidate will have deep expertise in Python or Go programming, a comprehensive understanding of LLM internals, and hands-on experience with various inference engines and deployment strategies. They should be able to balance multiple competing priorities, deliver solutions in a timely manner, understand complex architectures, and be comfortable working with multiple teams.

Key Responsibilities:

  • Design and build scalable infrastructure for fine-tuning and deploying large language models.
  • Develop and optimize inference pipelines using popular frameworks and engines (e.g., TensorRT, vLLM, Triton Inference Server).
  • Implement observability solutions for model performance, latency, throughput, GPU/TPU utilization, and memory efficiency.
  • Own the end-to-end lifecycle of LLMs in production, from experimentation to continuous integration and continuous deployment (CI/CD).
  • Collaborate with research scientists, ML engineers, and backend teams to operationalize groundbreaking LLM architectures.
  • Automate and harden model deployment workflows using Python, Kubernetes, containers, and orchestration tools such as Argo Workflows and GitOps.
  • Design reproducible model packaging, versioning, and rollback strategies for large-scale serving.
  • Stay current with advances in LLM inference acceleration, quantization, distillation, and model compilation techniques (e.g., GGUF, AWQ, FP8).
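As an illustration of the observability work described above (tracking latency and throughput), here is a minimal, stdlib-only Python sketch that summarizes a window of request latencies. The function name and the nearest-rank percentile method are illustrative choices for this sketch, not part of any specific stack named in the posting:

```python
def summarize_latencies(latencies_ms, window_s):
    """Summarize per-request latencies observed over a time window.

    latencies_ms: list of per-request latencies in milliseconds.
    window_s: length of the observation window in seconds.
    Returns p50/p95 latency (nearest-rank percentile) and throughput.
    """
    ordered = sorted(latencies_ms)

    def pct(p):
        # Nearest-rank percentile: index of the p-th percentile sample.
        idx = max(0, round(p / 100 * len(ordered)) - 1)
        return ordered[idx]

    return {
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "throughput_rps": len(ordered) / window_s,
    }

# Example: 100 requests (20ms..119ms) observed over a 10-second window.
metrics = summarize_latencies([20 + i for i in range(100)], window_s=10.0)
print(metrics)
```

In production these numbers would typically be exported as Prometheus metrics and visualized in Grafana rather than printed.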

Locations

  • Hyderabad, Telangana, India

Skills Required

  • curiosity (intermediate)
  • experimenting (intermediate)
  • reading (intermediate)
  • constant learning (intermediate)
  • passion to code (intermediate)
  • dig deeper into technology (intermediate)
  • knowing internals (intermediate)
  • fascinated by distributed systems architecture (intermediate)
  • Python programming (intermediate)
  • Go programming (intermediate)
  • understanding of LLM internals (intermediate)
  • experience with inference engines (intermediate)
  • deployment strategies (intermediate)
  • balance multiple simultaneous competing priorities (intermediate)
  • deliver solutions in a timely manner (intermediate)
  • understand complex architectures (intermediate)
  • working with multiple teams (intermediate)
  • design scalable infrastructure (intermediate)
  • fine-tuning large language models (intermediate)
  • deploying large language models (intermediate)
  • develop inference pipelines (intermediate)
  • optimize inference pipelines (intermediate)
  • TensorRT (intermediate)
  • vLLM (intermediate)
  • Triton Inference Server (intermediate)
  • implement observability solutions (intermediate)
  • model performance monitoring (intermediate)
  • latency monitoring (intermediate)
  • throughput monitoring (intermediate)
  • GPU utilization (intermediate)
  • TPU utilization (intermediate)
  • memory efficiency (intermediate)
  • end-to-end lifecycle management (intermediate)
  • experimentation (intermediate)
  • continuous integration (intermediate)
  • continuous deployment (CI/CD) (intermediate)
  • collaborate with research scientists (intermediate)
  • collaborate with ML engineers (intermediate)
  • collaborate with backend teams (intermediate)
  • operationalize LLM architectures (intermediate)
  • automate model deployment workflows (intermediate)
  • harden model deployment workflows (intermediate)
  • Kubernetes (intermediate)
  • containers (intermediate)
  • orchestration tools (intermediate)
  • Argo Workflows (intermediate)
  • GitOps (intermediate)
  • reproducible model packaging (intermediate)
  • model versioning (intermediate)
  • rollback strategies (intermediate)
  • large-scale serving (intermediate)
  • LLM inference acceleration (intermediate)
  • quantization (intermediate)
  • distillation (intermediate)
  • model compilation techniques (intermediate)
  • GGUF (intermediate)
  • AWQ (intermediate)
  • FP8 (intermediate)

Required Qualifications

  • 5+ years of experience in LLM/ML Ops, DevOps, or infrastructure engineering with a focus on machine learning systems.
  • Advanced proficiency in Python or Go, with the ability to write clean, performant, and maintainable production code.
  • Deep understanding of transformer architectures, LLM tokenization, attention mechanisms, memory management, and batching strategies.
  • Proven experience deploying and optimizing LLMs using multiple inference engines.
  • Strong background in containerization and orchestration (Kubernetes, Helm).
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana), logging frameworks, and performance profiling.
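The batching strategies mentioned in the qualifications above can be sketched in miniature. This toy batcher (all names hypothetical) flushes only when a batch fills up; production inference servers such as Triton additionally flush on a timeout so that a partially full batch is never stranded:

```python
from dataclasses import dataclass, field

@dataclass
class Batcher:
    """Toy size-based dynamic batcher for inference requests.

    Requests accumulate in `pending`; once `max_batch_size` requests
    are queued, the batch is moved to `flushed` for execution.
    """
    max_batch_size: int
    pending: list = field(default_factory=list)
    flushed: list = field(default_factory=list)

    def submit(self, request):
        self.pending.append(request)
        if len(self.pending) >= self.max_batch_size:
            # Batch is full: hand it off and start a fresh one.
            self.flushed.append(self.pending)
            self.pending = []

b = Batcher(max_batch_size=4)
for i in range(10):
    b.submit(f"req-{i}")
print(len(b.flushed), len(b.pending))  # 2 full batches flushed, 2 requests pending
```

Continuous batching, as used by engines like vLLM, goes further by admitting new requests into a batch between generation steps; this sketch only shows the static, size-triggered path.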

Preferred Qualifications

  • Experience integrating LLMs into microservices or edge inference platforms.
  • Experience with Ray distributed inference.
  • Hands-on experience with quantization libraries.
  • Contributions to open-source ML infrastructure or LLM optimization tools.
  • Familiarity with cloud platforms (AWS, GCP) and infrastructure-as-code (Terraform).
  • Exposure to secure and compliant model deployment workflows.
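The quantization techniques referenced in this posting (e.g., AWQ, FP8, GGUF) share one core idea: represent weights with low-precision integers plus a scale factor. A toy symmetric per-tensor int8 sketch, with illustrative helper names, might look like this; real libraries add per-channel scales, activation-aware calibration, and packed storage formats:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ q * scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.01, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_err)
```

The largest-magnitude weight maps to ±127, and every other weight is rounded to the nearest representable step, which is where quantization error comes from.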



Tags & Categories

Hardware

