Job Title: GPU Engineer at Kog
Location: Paris, France (remote within a Europe-compatible timezone)
Onsite Requirement: One week per month onsite in Paris
Source: Hacker News
About the Role
We are hiring a GPU Engineer to work on the fastest LLM inference engine on standard datacenter GPUs.
We have rewritten the entire hot path, from the assembly running on the chip up to the Transformer architecture we designed around it. The full decode runs as a single persistent GPU kernel. At batch size 1, decoding is dominated by GEMV (matrix-vector) operations, which are bound by memory bandwidth rather than compute, so memory bandwidth utilization (MBU) is the critical metric.
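The memory-bound argument can be made concrete with a small sketch. It assumes FP16 weights (2 bytes per element) and that every element of the weight matrix, input vector, and output vector crosses memory exactly once. Under those assumptions, a GEMV does roughly 1 FLOP per byte moved. That is far below the FLOP-per-byte ratio at which datacenter GPUs become compute-bound, so speed is limited by how fast weights stream from HBM. The MBU helper follows the common definition: achieved bandwidth (bytes read per token × tokens/s) divided by peak bandwidth. The function names and the example inputs are hypothetical and do not describe Kog's engine.

```python
def gemv_arithmetic_intensity(m: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for y = W @ x, with W of shape (m, n).

    Assumes no cache reuse: W and x are read once, y is written once.
    Default element size is FP16 (2 bytes).
    """
    flops = 2 * m * n  # one multiply and one add per weight
    bytes_moved = bytes_per_elem * (m * n + n + m)  # read W and x, write y
    return flops / bytes_moved


def mbu(bytes_per_token: float, tokens_per_s: float, peak_bw_bytes_per_s: float) -> float:
    """Memory bandwidth utilization: achieved bandwidth / peak bandwidth.

    bytes_per_token is roughly the weight (plus KV cache) bytes read
    for one decode step at batch size 1.
    """
    return bytes_per_token * tokens_per_s / peak_bw_bytes_per_s


# Hypothetical 8192 x 8192 FP16 layer: about 1 FLOP per byte.
print(round(gemv_arithmetic_intensity(8192, 8192), 4))  # 0.9998

# Hypothetical inputs: 100 GB read per token, 30 tokens/s, 6 TB/s peak.
print(mbu(100e9, 30, 6e12))  # 0.5
```

Because the ratio stays near 1 regardless of matrix size, adding FLOPs does not help at batch size 1. Reaching a higher fraction of peak HBM bandwidth does.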
Performance Benchmarks:
- 8x AMD MI300X: 3,000 tokens/s per request
- 8x NVIDIA H200: 2,100 tokens/s per request
- Conditions: Batch size 1, FP16, no speculative decoding.
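At batch size 1, a per-request throughput figure can also be read as a per-token time budget. Each decode step, a full forward pass through the model, must complete in 1 / (tokens/s) seconds. The sketch below converts the two benchmark figures above into that budget. The helper name is hypothetical, and the conversion assumes a steady decode rate.

```python
def per_token_latency_ms(tokens_per_s: float) -> float:
    """Time budget for one decode step, in milliseconds."""
    return 1000.0 / tokens_per_s


for setup, tps in [("8x AMD MI300X", 3000), ("8x NVIDIA H200", 2100)]:
    print(f"{setup}: {per_token_latency_ms(tps):.3f} ms/token")
# 8x AMD MI300X: 0.333 ms/token
# 8x NVIDIA H200: 0.476 ms/token
```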
Responsibilities
You will own:
- Low-level kernel work in CUDA/PTX or HIP/CDNA ISA.
- The monokernel pipeline and its internal profiling infrastructure.
- Scaling to frontier MoE models that run in production.
- Building autonomous agents that optimize kernels and inference.
Requirements
- Proficiency in low-level GPU programming (CUDA/PTX or HIP/CDNA).
- Willingness to show code you have written (reviewing your code is part of the hiring process).
- If you are currently outside a Europe-compatible timezone, relocation to one is required.
How to Apply
- Apply here: Kog Job Portal
- Try our tech: Kog Playground
- Questions: Email nicolas.constant@kog.ai