SDE I - Systems, Runtime, and ML Infrastructure (AWS Custom Silicon), Annapurna Labs


Amazon

full-time

Posted: April 14, 2025

Number of Vacancies: 1

Job Description

At AWS, we're pioneering the future of cloud computing and AI acceleration through innovative hardware-software co-design. Our teams within Annapurna Labs and AWS AI are creating the foundation for next-generation cloud infrastructure that powers thousands of customers worldwide, from cutting-edge startups to global enterprises.

We operate at an unprecedented scale, designing custom silicon chips, advanced networking solutions, and ML accelerators that were unimaginable just a few years ago. Our work spans from the lowest levels of hardware abstraction to high-performance distributed training systems, creating unique opportunities for early-career engineers to make a significant impact across multiple domains.

Key job responsibilities
- Develop and optimize software for custom hardware and ML infrastructure
- Collaborate with hardware teams to understand and leverage chip architecture
- Implement and improve networking, runtime, and system-level software
- Assist in building and maintaining tools for profiling, monitoring, and debugging ML workloads
- Contribute to the development of open-source ML frameworks and infrastructure projects
- Participate in code reviews and implement best practices for software development
- Learn and apply new technologies to solve complex engineering challenges

About the team
Candidates will be routed to specific teams based on their interests and our current needs during the application process:
- The Elastic Network Adapter (ENA) team revolutionizes EC2 core networking, enabling enhanced networking capabilities across AWS's most critical compute instances. Here, you'll work with networking protocols and high-performance drivers that power millions of cloud workloads.
- Our AWS Neuron SDK team develops the complete software stack for custom ML accelerators (Inferentia and Trainium), democratizing access to AI infrastructure. This team bridges the gap between popular ML frameworks and custom hardware.
- The Machine Learning Server Software team maintains and optimizes the world's most advanced ML servers, focusing on system-level software that ensures peak performance of AI workloads. While we don't work directly on ML algorithms, we build the critical infrastructure that makes ML possible at scale.
- The SoC Hardware Abstraction Layer (HAL) team works at the intersection of hardware and software, developing the crucial middleware that manages our custom silicon chips. This team ensures our innovative hardware designs translate into reliable, high-performance solutions.

Locations

  • Austin, TX, United States
  • Seattle, WA, United States
  • Cupertino, CA, United States

Salary

Salary not disclosed

Estimated Salary Range (high confidence)

$135,000 - $195,000 USD per year

Source: AI estimate

* This is an estimated range based on market data and may vary based on experience and qualifications.

Skills Required

  • Internship or project experience related to systems programming, networking, or ML (intermediate)

Required Qualifications

  • To qualify, applicants should have earned (or expect to earn) a Bachelor's or Master's degree between December 2022 and September 2025.
  • Strong programming skills in C/C++ or Python, with a solid understanding of data structures and algorithms
  • Understanding of computer architecture, operating systems, and Linux environments
  • Internship or project experience related to systems programming, networking, or ML

Preferred Qualifications

  • Familiarity with version control systems (e.g., Git) and software development methodologies
  • Knowledge of ML concepts or frameworks (e.g., PyTorch, TensorFlow)
  • Interest in open-source development or contributions to technical communities
  • Los Angeles County applicants: Job duties for this position include: work safely and cooperatively with other employees, supervisors, and staff; adhere to standards of excellence despite stressful conditions; communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and follow all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position. These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company's reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.
  • Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $99,500/year in our lowest geographic market up to $200,000/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.



Tags & Categories

aws.team-annapurna-labs, aws.team-utility-computing, amazon.artificial-intelligence, Software Development
