As a Principal Software Engineering Manager - Azure Storage, you will lead strategic initiatives to optimize fleet health and reduce offline capacity across hyperscale environments. This role is pivotal in driving intelligent solutions that improve reliability, minimize operational overhead, and enable scalable Artificial Intelligence (AI) and Machine Learning (ML) workloads for customers such as OpenAI, Temu, and others. You'll partner across engineering, product, and industry teams to modernize data infrastructure, accelerate digital transformation, and deliver measurable impact in sustainability and efficiency.
The Principal Software Engineering Manager - Azure Storage will lead a team of developers focused on scaling and optimizing one of the world's largest storage server fleets. Your mission is to reduce offline capacity and manual operational burden through intelligent automation and AI-driven solutions. This high-impact role offers visibility at the VP level, opportunities to shape fleet strategy, and the flexibility of working across time zones in a collaborative, remote-friendly environment.
Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Locations
Multiple Locations, United States (on-site or remote)
Redmond, Washington, United States (on-site or remote)
San Francisco, California, United States (on-site or remote)
Mountain View, California, United States (on-site or remote)
San Jose, California, United States (on-site or remote)
Salary
Salary not disclosed
Required Qualifications
Bachelor's Degree in Computer Science or related technical discipline AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Python, OR equivalent experience.
2+ years of experience applying a deep understanding of server hardware architecture and fleet-level hardware lifecycle management, including diagnostics, telemetry, and failure mitigation.
2+ years of people management experience.
3+ years of technical background in cloud infrastructure and storage systems, preferably within hyperscale environments.
3+ years of demonstrated ability to plan and execute complex projects, including setting priorities, managing timelines, and delivering results across cross-functional teams.
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Preferred Qualifications
Experience with AI/ML-driven automation for anomaly detection, predictive maintenance, or system optimization.
Strong communication and stakeholder management skills, with a proven ability to engage Vice President (VP)-level leadership and influence technical roadmaps.
Experience driving engineering excellence, including service quality, reliability, and operational readiness.
Demonstrated comfort working across time zones in a remote-friendly, globally distributed team environment.
Responsibilities
Lead and manage a high-performing engineering team focused on scaling and optimizing Azure Storage’s global fleet infrastructure.
Drive planning and execution of team deliverables, ensuring alignment with partner teams, business goals, technical strategy, and service-level objectives.
Develop and deliver scalable features that reduce offline capacity, improve fleet reliability, and minimize manual operational overhead and risk.
Leverage AI/ML to build intelligent automation for anomaly detection, predictive maintenance, and fleet health optimization.
Engage with senior leadership, including VP-level stakeholders, to influence roadmap priorities and communicate impact.
Guide the team in driving project plans, release plans, and work items across multiple groups, in coordination with appropriate stakeholders (e.g., project managers).
Guide the team and act as an expert for Designated Responsible Individual (DRI) duties, overseeing other engineers across product lines who work on call to monitor the system/product/service for degradation, downtime, or interruptions.