Principal Software Engineer - Dynamo

Source: RemoteOK

NVIDIA Dynamo is an innovative, open-source platform focused on efficient, scalable inference for large language and reasoning models in distributed GPU environments.

About the Role

As a Principal Software Engineer on the Dynamo project, you will address some of the most sophisticated and high-impact challenges in distributed inference.

Responsibilities

You will be responsible for:

  • Collaborating on the design and development of the Dynamo Kubernetes stack.
  • Introducing new features to the Dynamo Python SDK and Dynamo Rust Runtime Core Library.
  • Designing, implementing, and optimizing distributed inference components in Rust and Python.
  • Contributing to the development of disaggregated serving for Dynamo-supported inference engines (vLLM, SGLang, TensorRT-LLM, llama.cpp, mistral.rs).
  • Improving intelligent routing and KV-cache management subsystems.
  • Contributing to open-source repositories, participating in code reviews, and assisting with issue triage on GitHub.
  • Working closely with the community to address issues, capture feedback, and evolve the framework’s APIs and architecture.
  • Writing clear documentation and contributing to user and developer guides.

Key Areas of Focus

  • Dynamo k8s Serving Platform: Build the Kubernetes deployment and workload management stack for Dynamo to facilitate inference deployments at scale. Identify bottlenecks and apply optimization techniques to fully use hardware capacity.
  • Scalability & Reliability: Develop robust, production-grade inference workload management systems that scale from a handful to thousands of GPUs, supporting a variety of LLM frameworks (e.g., TensorRT-LLM, vLLM, SGLang).
  • Disaggregated Serving: Architect and optimize the separation of prefill (context ingestion) and decode (token generation) phases across distinct GPU clusters to improve throughput and resource utilization. Contribute to embedding disaggregation for multi-modal models (vision-language, audio-language, and video-language models).
  • Dynamic GPU Scheduling: Develop and refine Planner algorithms for real-time allocation and rebalancing of GPU resources based on fluctuating workloads and system bottlenecks, ensuring peak performance at scale.
  • Intelligent Routing: Enhance the smart routing system to efficiently direct inference requests to GPU worker replicas with relevant KV cache data, minimizing re-computation and latency for sophisticated, multi-step reasoning tasks.
  • Distributed KV Cache Management: Innovate in the management and transfer of large KV caches across heterogeneous memory and storage hierarchies, using the NVIDIA Inference Xfer Library (NIXL) for low-latency, cost-effective data movement.

Requirements

  • BS/MS or higher in computer engineering, computer science, or a related engineering field (or equivalent experience).
  • 15+ years of proven experience in a related field.
  • Strong proficiency in systems programming (Rust and/or C++), with experience in Python for workflow and API development.
  • Experience using Go to develop Kubernetes controllers and operators.
  • Deep understanding of distributed systems, parallel computing, and GPU architectures.
  • Experience with cloud-native deployment and container orchestration (Kubernetes, Docker).
  • Experience with large-scale inference serving, LLMs, or similar high-performance AI workloads.
  • Background in memory management, data transfer optimization, and multi-node orchestration.
  • Familiarity with open-source development workflows (GitHub, continuous integration and continuous deployment).
  • Excellent problem-solving and communication skills.

Ways to Stand Out

  • Prior contributions to open-source AI inference frameworks (e.g., vLLM, TensorRT-LLM, SGLang).
  • Experience with GPU resource scheduling, cache management, or high-performance networking.
  • Understanding of LLM-specific inference challenges, such as context window scaling and multi-model agentic workflows.

Compensation

  • Base salary range: $272,000 USD - $431,250 USD
  • Eligibility for equity and benefits.

Additional Information

  • Applications will be accepted at least until January 13, 2026.
  • This is an existing vacancy.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer.
