Principal Software Engineer - Dynamo
Source: remoteok
NVIDIA Dynamo is an innovative, open-source platform focused on efficient, scalable inference for large language and reasoning models in distributed GPU environments.
About the Role
As a Principal Software Engineer on the Dynamo project, you will address some of the most sophisticated and high-impact challenges in distributed inference.
Responsibilities
You will be responsible for:
- Collaborating on the design and development of the Dynamo Kubernetes stack.
- Introducing new features to the Dynamo Python SDK and Dynamo Rust Runtime Core Library.
- Designing, implementing, and optimizing distributed inference components in Rust and Python.
- Contributing to the development of disaggregated serving for Dynamo-supported inference engines (vLLM, SGLang, TensorRT-LLM, llama.cpp, mistral.rs).
- Improving intelligent routing and KV-cache management subsystems.
- Contributing to open-source repositories, participating in code reviews, and assisting with issue triage on GitHub.
- Working closely with the community to address issues, capture feedback, and evolve the framework’s APIs and architecture.
- Writing clear documentation and contributing to user and developer guides.
Key Areas of Focus
- Dynamo k8s Serving Platform: Build the Kubernetes deployment and workload management stack for Dynamo to facilitate inference deployments at scale. Identify bottlenecks and apply optimization techniques to fully use hardware capacity.
- Scalability & Reliability: Develop robust, production-grade inference workload management systems that scale from a handful to thousands of GPUs, supporting a variety of LLM frameworks (e.g., TensorRT-LLM, vLLM, SGLang).
- Disaggregated Serving: Architect and optimize the separation of prefill (context ingestion) and decode (token generation) phases across distinct GPU clusters to improve throughput and resource utilization. Contribute to embedding disaggregation for multi-modal models (vision-language, audio-language, and video-language models).
- Dynamic GPU Scheduling: Develop and refine Planner algorithms for real-time allocation and rebalancing of GPU resources based on fluctuating workloads and system bottlenecks, ensuring peak performance at scale.
- Intelligent Routing: Enhance the smart routing system to efficiently direct inference requests to GPU worker replicas with relevant KV cache data, minimizing re-computation and latency for sophisticated, multi-step reasoning tasks.
- Distributed KV Cache Management: Innovate in the management and transfer of large KV caches across heterogeneous memory and storage hierarchies, using the NVIDIA Inference Xfer Library (NIXL) for low-latency, cost-effective data movement.
Requirements
- BS/MS or higher in computer engineering, computer science, or a related engineering field (or equivalent experience).
- 15+ years of proven experience in a related field.
- Strong proficiency in systems programming (Rust and/or C++), with experience in Python for workflow and API development.
- Experience using Go to develop Kubernetes controllers and operators.
- Deep understanding of distributed systems, parallel computing, and GPU architectures.
- Experience with cloud-native deployment and container orchestration (Kubernetes, Docker).
- Experience with large-scale inference serving, LLMs, or similar high-performance AI workloads.
- Background in memory management, data transfer optimization, and multi-node orchestration.
- Familiarity with open-source development workflows (GitHub, continuous integration and continuous deployment).
- Excellent problem-solving and communication skills.
Ways to Stand Out
- Prior contributions to open-source AI inference frameworks (e.g., vLLM, TensorRT-LLM, SGLang).
- Experience with GPU resource scheduling, cache management, or high-performance networking.
- Understanding of LLM-specific inference challenges, such as context window scaling and multi-model agentic workflows.
Compensation
- Base salary range: $272,000 USD - $431,250 USD
- Eligibility for equity and benefits.
Additional Information
- Applications accepted at least until January 13, 2026.
- This is an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer.