Microsoft
Member of Technical Staff, Hardware Health - MAI Superintelligence Team
Found: January 9, 2026
This role is based in London, United Kingdom, and focuses on maintaining the reliability and performance of AI training infrastructure.
Responsibilities:
- Design and implement advanced RoCE (RDMA over Converged Ethernet) transport and congestion control.
- Plan fabric architecture and network scaling strategies.
- Work on telemetry, observability, and automated troubleshooting.
- Collaborate with leading network designers and conduct performance benchmarking.
Qualifications:
- Bachelor's degree in Computer Science or a related field, with 6+ years of technical engineering experience.
- Proficiency in programming languages such as C, C++, C#, Java, JavaScript, or Python.
Work Environment:
Four days a week in the office, with less than 25% travel expected.