GPU Engineer at Kog

Location: Paris, France (remote within a Europe-compatible timezone)
Onsite Requirement: One week per month onsite in Paris
Source: Hacker News


About the Role

We are hiring a GPU Engineer to work on the fastest LLM inference engine on standard datacenter GPUs.

We have rewritten the entire hot path, from the assembly on the chip up to the Transformer we designed around it, with the full decode running as a single persistent GPU kernel. At batch size 1, decode reduces to GEMV (matrix-vector products), which is memory-bandwidth bound, so memory bandwidth utilization (MBU) is the critical metric.
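To make the MBU reasoning concrete, here is a minimal sketch of how the metric is commonly computed: the bytes that must be read from GPU memory per generated token, times tokens per second, divided by the hardware's peak memory bandwidth. The function names and all numbers below are illustrative assumptions, not Kog's internal tooling or measured figures.

```python
def mbu(bytes_read_per_token: float,
        tokens_per_second: float,
        peak_bandwidth_bytes_per_s: float) -> float:
    """Memory bandwidth utilization: achieved bandwidth / peak bandwidth."""
    achieved_bandwidth = bytes_read_per_token * tokens_per_second
    return achieved_bandwidth / peak_bandwidth_bytes_per_s


def bandwidth_bound_tokens_per_second(bytes_read_per_token: float,
                                      peak_bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if memory traffic were the only cost."""
    return peak_bandwidth_bytes_per_s / bytes_read_per_token


# Hypothetical example (assumed values, not from the posting):
# 8e9 parameters in FP16 means 16e9 bytes of weights read per token,
# on hardware with an aggregate peak bandwidth of 3.2e12 bytes/s.
weights_bytes = 8e9 * 2
peak_bw = 3.2e12

ceiling = bandwidth_bound_tokens_per_second(weights_bytes, peak_bw)
utilization = mbu(weights_bytes, tokens_per_second=100.0,
                  peak_bandwidth_bytes_per_s=peak_bw)
print(f"bandwidth ceiling: {ceiling:.0f} tokens/s, MBU at 100 tokens/s: {utilization:.0%}")
```

The sketch counts only weight reads; a fuller estimate would also add KV-cache traffic to the bytes read per token. With several GPUs working on one request, the peak bandwidth would be the aggregate across devices.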

Performance Benchmarks:

  • 8x AMD MI300X: 3,000 tokens/s per request
  • 8x NVIDIA H200: 2,100 tokens/s per request
  • Conditions: Batch size 1, FP16, no speculative decoding.

Responsibilities

You will own:

  • Low-level kernel work in CUDA/PTX or HIP/CDNA ISA.
  • The monokernel pipeline and its internal profiling infrastructure.
  • Scaling to frontier MoE models that run in production.
  • Building autonomous agents that optimize kernels and inference.

Requirements

  • Proficiency in low-level GPU programming (CUDA/PTX or HIP/CDNA).
  • Code you can show us: reviewing your code is part of the hiring process.
  • If you are currently outside a Europe-compatible timezone, relocation to one is required.

How to Apply
