Top-K Sampling
The Shortlist Round
6 min read
A hiring manager receives 500 CVs but only interviews the top 10.
Before making a decision, they cut the field to the 10 best candidates. Randomness still plays a part (they might pick candidate 7 over candidate 3 based on subtle factors), but the truly unsuitable 490 are eliminated first. Top-K works the same way for AI word selection: only the K most probable next tokens are eligible, and the model then samples one of them.
In Plain English
Top-K sampling limits the AI's word choices to the K most likely next words at each step. It prevents the AI from picking very unlikely or bizarre words while still allowing some creative variation.
The Technical Picture
Top-K sampling filters the probability distribution to keep only the K highest-probability tokens before renormalisation and sampling. This eliminates the long tail of low-probability tokens (reducing incoherence) while preserving diversity among the most plausible continuations.
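The filter-renormalise-sample loop can be sketched in a few lines of NumPy. This is an illustrative implementation, not any particular library's API; the function name `top_k_sample` and the toy logits are assumptions for the example.

```python
import numpy as np

def top_k_sample(logits, k, rng=None):
    """Sample a token index from only the k highest-probability logits."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)
    # Find the indices of the k largest logits; mask everything else to -inf
    # so those tokens get zero probability after the softmax.
    top_idx = np.argpartition(logits, -k)[-k:]
    masked = np.full_like(logits, -np.inf)
    masked[top_idx] = logits[top_idx]
    # Softmax over the surviving logits renormalises the shortlist to sum to 1.
    probs = np.exp(masked - masked[top_idx].max())
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

# Token 0 is most likely; with k=2 only tokens 0 and 1 are ever eligible.
logits = [2.0, 1.0, 0.5, -3.0]
print(top_k_sample(logits, k=2))
```

Note that with `k=1` the mask leaves a single token with probability 1, which is exactly greedy decoding.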
Real-World Examples
- Most production LLM APIs allow setting top_k as a generation parameter
- K=40 is a common default — the top 40 words compete at each step
- K=1 is greedy decoding — always picks the single most likely word
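To make the effect of K concrete, here is a toy 5-token distribution truncated at K=2. The probability values are hypothetical, chosen only to show how the surviving tokens are renormalised.

```python
import numpy as np

# Hypothetical next-token probabilities over a 5-token vocabulary.
probs = np.array([0.45, 0.30, 0.15, 0.07, 0.03])
k = 2

# Keep the K most probable tokens, zero out the rest,
# and renormalise the shortlist so it sums to 1.
keep = np.argsort(probs)[-k:]
shortlist = np.zeros_like(probs)
shortlist[keep] = probs[keep]
shortlist /= shortlist.sum()

print(shortlist)  # tokens 0 and 1 become 0.6 and 0.4; all others are 0
```

The 0.45 and 0.30 tokens now split the full probability mass (0.6 and 0.4), while the long tail is eliminated entirely.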
Top-K creates a shortlist of likely words before picking one — reducing nonsense while keeping variety.