AI Fundamentals

Top-K Sampling

The Shortlist Round


The Analogy

A hiring manager receives 500 CVs but only interviews the top 10.

Before making a decision, they cut the field to the 10 best candidates. Randomness still exists — they might pick candidate 7 over candidate 3 based on subtle factors — but the truly unsuitable 490 are eliminated first. Top-K sampling works the same way for next-word selection: only the K most probable next tokens stay eligible, and the model then samples one of them.

In Plain English

Top-K sampling limits the AI's word choices to the K most likely next words at each step. It prevents the AI from picking very unlikely or bizarre words while still allowing some creative variation.


The Technical Picture

Top-K sampling filters the probability distribution to keep only the K highest-probability tokens before renormalisation and sampling. This eliminates the long tail of low-probability tokens (reducing incoherence) while preserving diversity among the most plausible continuations.
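That filter-renormalise-sample loop can be sketched in a few lines of NumPy. This is a minimal illustration of the technique, not any particular library's implementation, and the function name `top_k_sample` is made up for this example:

```python
import numpy as np

def top_k_sample(logits, k, rng=np.random.default_rng(0)):
    """Sample a token id after keeping only the k highest-logit tokens."""
    logits = np.asarray(logits, dtype=float)
    # Find the indices of the k largest logits.
    keep = np.argpartition(logits, -k)[-k:]
    # Mask everything else out with -inf so it gets zero probability.
    masked = np.full_like(logits, -np.inf)
    masked[keep] = logits[keep]
    # Softmax renormalises the remaining probability mass over the survivors.
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)
```

With K much smaller than the vocabulary size, the long tail of implausible tokens can never be drawn, while the surviving candidates still compete according to their relative probabilities.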

Real-World Examples

  • Most production LLM APIs allow setting top_k as a generation parameter
  • K=40 is a common default — the top 40 words compete at each step
  • K=1 is greedy decoding — always picks the single most likely word
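The K=1 greedy case is easy to verify in code. The small sketch below (an illustrative helper, not a real API) builds the filtered distribution: with K=1, all probability mass lands on the single most likely token, so sampling always returns the argmax.

```python
import numpy as np

def top_k_filter(logits, k):
    """Return the renormalised distribution over the k most likely tokens."""
    logits = np.asarray(logits, dtype=float)
    keep = np.argsort(logits)[-k:]          # indices of the k largest logits
    probs = np.zeros_like(logits)
    probs[keep] = np.exp(logits[keep] - logits[keep].max())
    return probs / probs.sum()

logits = [2.0, 0.5, 3.1, -1.0]
print(top_k_filter(logits, 1))  # all mass on index 2: greedy decoding
```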

Key Takeaway

Top-K creates a shortlist of likely words before picking one — reducing nonsense while keeping variety.
