Sr. Software Development Engineer, ML Infrastructure Team

Amazon

full-time

Posted: October 15, 2025

Number of Vacancies: 1

Job Description

Want to help drive the success of Machine Learning technologies at AWS? Do you have the skills and motivation to build automation that supports the success of peer teams? We want to talk to you! We seek a Software Development Engineer for the Machine Learning (ML) Infrastructure team to build the tools that are used to guarantee top performance of AWS ML and High Performance Computing (HPC) technologies developed by our organization. Bring your exceptional knowledge of CI/CD automation, ML and HPC benchmarks, and applications to bear on the cutting-edge software we develop. Join us as we expand the AWS offerings for AI, including Trainium, Neuron, and the Elastic Fabric Adapter (EFA).

Key job responsibilities

Be an autonomous engineer on a team that builds and maintains the infrastructure that monitors and reports on functionality and performance of massive testing workloads run at scale. Use internal Amazon CI/CD tools, Linux, and public AWS products to automate the delivery of our software to customers, saving developer time. Write Python code that effortlessly spools up large clusters and runs benchmarks and applications for ML and HPC workloads. Use AWS Managed Grafana and Athena to digest the massive amount of performance data generated by these workloads and create dashboards for developers and stakeholders. Invent automatic mechanisms to alert developers to functional and performance regressions so they never reach customers. Manage the complexity of infrastructure that covers many instance types, software stacks, Linux operating systems, and cutting-edge releases, and make it easy to evolve.

A day in the life

You use TypeScript and the CDK to ensure all infrastructure setup is defined as code (IaC), reviewed, and committed to automated pipelines. You find innovative ways to schedule work using SLURM and Active Directory, supporting multiple teams of developers while keeping cluster costs down. You write crisp designs for your projects, communicating clearly to your peers what you will build.

About the team

We are part of Annapurna Labs, a subsidiary in AWS that builds software and hardware that make ML on EC2 work. Our organization is a dedicated group of innovators who have invented new networks, new silicon, and new software suites, and combined those to entice customers to move immense ML and HPC workloads to the cloud. The ML Infrastructure team is laser-focused on making AWS the best and most cost-effective place for customers to do AI at scale.

Locations

  • Seattle, WA, United States

Salary

Salary not disclosed

Estimated Salary Range (high confidence)

185,000 - 285,000 USD / year

Source: AI estimated

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • 4+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems (intermediate)
  • Experience with CI/CD pipeline build processes (intermediate)
  • Experience using Linux, demonstrating proficiency with associated tools or languages (intermediate)
  • 5+ years of non-internship professional software development experience (intermediate)
  • Experience coding in Python, TypeScript, and the CDK (intermediate)

Required Qualifications

  • 4+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems
  • Experience with CI/CD pipeline build processes
  • Experience using Linux, demonstrating proficiency with associated tools or languages
  • 5+ years of non-internship professional software development experience
  • Experience coding in Python, TypeScript, and the CDK

Preferred Qualifications

  • 3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Bachelor's degree or above in computer science or equivalent

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $151,300/year in our lowest geographic market up to $261,500/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.

Responsibilities

  • Be an autonomous engineer on a team that builds and maintains the infrastructure that monitors and reports on functionality and performance of massive testing workloads run at scale.
  • Use internal Amazon CI/CD tools, Linux, and public AWS products to automate the delivery of our software to customers, saving developer time.
  • Write Python code that effortlessly spools up large clusters and runs benchmarks and applications for ML and HPC workloads.
  • Use AWS Managed Grafana and Athena to digest the massive amount of performance data generated by these workloads and create dashboards for developers and stakeholders.
  • Invent automatic mechanisms to alert developers to functional and performance regressions so they never reach customers.
  • Manage the complexity of infrastructure that covers many instance types, software stacks, Linux operating systems, and cutting-edge releases, and make it easy to evolve.

Tags & Categories

  • aws.team-annapurna-labs
  • aws.team-utility-computing
  • Software Development