Job Title: GPU Engineer at Kog

Location: Paris, France (REMOTE within a Europe-compatible timezone)
Onsite Requirement: One week per month onsite in Paris
Source: Hacker News

About the Role

We are hiring a GPU Engineer to work on the fastest LLM inference engine on standard datacenter GPUs.

We have rewritten the entire hot path, from the assembly on the chip up to the Transformer we designed around it, with the full decode running as a single persistent GPU kernel. At batch size 1, decode reduces to matrix-vector multiplication (GEMV), which is memory-bandwidth bound, so memory bandwidth utilization (MBU) is the critical metric.
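To make the MBU reasoning concrete, here is a minimal Python sketch of how such a figure could be estimated. It is not Kog's methodology: the function names, the assumption that every weight is read exactly once per generated token, and the example numbers are all hypothetical, and KV-cache and activation traffic are ignored.

```python
def gemv_arithmetic_intensity(bytes_per_param: float) -> float:
    """FLOPs per byte of weights read in a GEMV.

    Each weight contributes one multiply and one add (2 FLOPs) and is
    read once, so FP16 (2 bytes per weight) gives 1 FLOP per byte,
    far below what a datacenter GPU needs to become compute bound.
    """
    return 2.0 / bytes_per_param


def decode_mbu(param_count: float, bytes_per_param: float,
               tokens_per_s: float, n_gpus: int,
               peak_bw_per_gpu: float) -> float:
    """Rough memory bandwidth utilization for batch-size-1 decode.

    Assumption (simplification): all weights are streamed from memory
    exactly once per token, split evenly across n_gpus.
    """
    bytes_per_token = param_count * bytes_per_param
    achieved_bw = bytes_per_token * tokens_per_s   # bytes per second
    peak_bw = n_gpus * peak_bw_per_gpu             # bytes per second
    return achieved_bw / peak_bw


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to keep the arithmetic easy to follow:
    # 1e9 FP16 parameters, 1,000 tokens/s, one GPU with 4e12 B/s peak bandwidth.
    print(gemv_arithmetic_intensity(2))               # FP16 weights
    print(decode_mbu(1e9, 2, 1000, 1, 4e12))
```

Under these assumptions, a higher tokens/s figure at fixed model size and hardware translates directly into a higher MBU, which is why the metric is a natural target for batch-size-1 decode.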

Performance Benchmarks:

8x AMD MI300X: 3,000 tokens/s per request
8x NVIDIA H200: 2,100 tokens/s per request
Conditions: Batch size 1, FP16, no speculative decoding.

Responsibilities

You will own:

Low-level kernel work in CUDA/PTX or HIP/CDNA ISA.
The monokernel pipeline and its internal profiling infrastructure.
Scaling to frontier MoE models that run in production.
Building autonomous agents that optimize kernels and inference.

Requirements

Proficiency in low-level GPU programming (CUDA/PTX or HIP/CDNA).
Ability to demonstrate your code (showing your code is part of the hiring process).
If you are currently outside a Europe-compatible timezone, relocation to one is required.

How to Apply

Apply here: Kog Job Portal
Try our tech: Kog Playground
Questions: Email nicolas.constant@kog.ai
