[Remote] Senior Systems Engineer, Storage - DGX Cloud

Remote, USA | Full-time | Posted 2026-06-16

Note: This is a remote position open to candidates in the USA. NVIDIA is a leading technology company known for its GPU cloud services. The Senior Systems Engineer will design, deploy, and operate solutions on Kubernetes for large-scale storage and data platforms, ensuring reliability and performance through automation and observability.

Responsibilities

Design, deploy, and operate solutions on Kubernetes for large-scale storage and data platforms, including the manifests, Helm charts, and operators that run them
Build tools, services, and automation that improve the lifecycle of storage and data systems – from provisioning and configuration through deployment, scaling, and day-2 operations
Develop and operate telemetry and observability for production systems – metrics, logging, tracing, dashboards, and alerting – so that system health, availability, and latency are measurable and actionable
Apply strong analytical troubleshooting skills to diagnose and resolve complex issues across distributed, containerized infrastructure
Work closely with peers and partner teams to improve the lifecycle of services, from inception and design through deployment, operation, and refinement
Scale systems sustainably through automation, infrastructure-as-code, and CI/CD, and evolve systems by pushing for changes that improve reliability and velocity
Support services before they go live through activities such as deployment automation, capacity planning, and launch and readiness reviews
Practice sustainable incident response and postmortems, and participate in an on-call rotation to support production systems

Skills

BS degree (or equivalent experience) in Computer Science or a related technical field involving coding
12+ years of practical experience
Hands-on experience with Kubernetes – deploying, configuring, and operating workloads and solutions on Kubernetes in production
Experience building tools and services for storage, data, or platform infrastructure, with solid software design fundamentals (algorithms, data structures, complexity analysis) on large-scale Linux-based systems
Experience building and operating telemetry and observability using tools such as Prometheus, InfluxDB, Grafana, and the Elastic stack
Strong analytical troubleshooting skills with a systematic, root-cause-driven approach to identifying and resolving complex problems
Proficiency in one or more of the following: Python, Go, or Java
Good knowledge of infrastructure configuration management and infrastructure-as-code tools such as Ansible, Chef, Puppet, ArgoCD, Git Pipelines, and Terraform
Customer-first mindset with a focus on customer satisfaction and a passion for ensuring customer success
Experience with Git, code review, pipelines, and CI/CD
Experience using or running large private and public cloud systems based on Kubernetes, OpenStack, and Docker
Interest in crafting, analyzing, and fixing large-scale distributed systems, with strong debugging skills and a systematic problem-solving approach
Experience designing storage- or data-focused tooling and automating its operation at scale
Ability to thrive in collaborative environments, work well with a variety of teams, and adapt to different working styles

Benefits

You will also be eligible for equity and [benefits](https://www.nvidia.com/en-us/benefits/).

Company Overview

NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. It was founded in 1993 and is headquartered in Santa Clara, California, USA, with a workforce of more than 10,000 employees. Its website is https://www.nvidia.com.

Company H1B Sponsorship

NVIDIA has a track record of offering H1B sponsorships: 448 in 2026, 1872 in 2025, 1354 in 2024, 976 in 2023, 835 in 2022, 601 in 2021, and 529 in 2020. Please note that this does not guarantee sponsorship for this specific role.

