Senior Systems Software Engineer
Source: remoteok
About the Role
Join the backend software team in NVIDIA's Cloud Native Engineering (NVCNE) group! As a Cloud Platform Software Engineer, you will contribute to a software platform that supports the lifecycle of Artificial Intelligence (AI) supercomputing infrastructure on Kubernetes. You will work alongside architects, designers, frontend engineers, and SREs to enable AI services across the cloud. You will own your code from development to production, supporting SRE teams and collaborating with internal product teams on sophisticated distributed systems problems at scale. You will be encouraged to foster NVIDIA’s approach to cloud native development, and to Kubernetes in particular.
Responsibilities
- Develop software systems to support large-scale deployments of cloud infrastructure
- Design, develop, and distribute APIs to support Infrastructure as Code (IaC) automation and deployment workflows
- Contribute to multiple source code projects that deliver the software services NVIDIA requires
- Collaborate with engineering managers, architects, designers, and frontend engineers to deliver high-quality software
- Automate the validation of software solutions with unit and integration tests
- Innovate with other engineers on proposed designs and product direction
- Openly share successes and failures in a no-blame environment
Requirements
- BS in Computer Science, Information Systems, Computer Engineering (or equivalent experience) and at least 12 years of overall experience
- 5-7 years of proven experience in large-scale software development
- Experience building and delivering services on Kubernetes
- Proficiency with cloud-native infrastructure (AWS, GCP, Azure, OCI)
- Experience collaborating with teams to write software that supports cloud services at scale
- Ability to troubleshoot issues across multiple layers: infrastructure, Kubernetes, application runtime
- Strong proficiency in Golang for building Kubernetes operators, controllers, and custom tooling
- Experience designing and managing Kubernetes Custom Resource Definitions (CRDs)
- Knowledge of managed Kubernetes services and scaling strategies across cloud and on-prem environments
- Experience developing auto-scaling infrastructure components, and with incident response and root cause analysis
Ways to Stand Out
- Experience with Kubernetes Cluster API, Terraform, CSP API, and other infrastructure tooling
- Background with using and contributing to open-source projects
- Solid experience with Kustomize or other Kubernetes packaging tools
- Capable of refactoring software to run in systems such as Kubernetes
- Ability to discuss and work with CSI, CNI, and CRI, and familiarity with the CNCF and the tooling across its ecosystem
Compensation
- Base salary range: $224,000 - $356,500 USD (determined by location, experience, and pay of employees in similar positions)
- Eligibility for equity and benefits
Additional Information
- Applications accepted at least until November 4, 2025.
- NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.
- Learn more about NVIDIA: https://www.nvidia.com/