Nvidia

Site Reliability Engineer - Hardware Infrastructure

US, CA, Santa Clara

Found: January 16, 2026

This job posting is no longer active and may not be accepting applications.

This role is based in Santa Clara, CA.

Compensation:

$168,000 - $333,500/year

Responsibilities:

Develop and support guidelines for incident management, planned maintenance, and blameless postmortems.
Assist teams in responding to high-severity incidents and driving root cause analysis.
Define reliability and supportability metrics, Service Level Objectives, and error budgets.
Apply automation and generative AI and agentic solutions to enhance customer support.
Guide teams on establishing sustainable on-call and operational standards.

Requirements:

Degree in Computer Science or related field, or equivalent experience.
8+ years of experience in SRE, DevOps, or Production Engineering.
Strong understanding of SRE principles and experience with fault-tolerant systems.
Experience in Python, Go, Perl, or Ruby.
Hands-on experience with observability platforms such as Prometheus and Grafana.
