Foundations of Top-$k$ Decoding For Language Models

Georgy Noarov; Soham Mallick; Tao Wang; Sunay Joshi; Yan Sun; Yangxinyu Xie; Mengxin Yu; Edgar Dobriban

arXiv:2505.19371 · cs.AI · February 24, 2026

TL;DR

This paper provides a theoretical foundation for top-$k$ decoding in language models, framing it as sparse probability recovery via Bregman divergence minimization, and introduces new decoding strategies beyond traditional top-$k$ methods.

Contribution

It develops a unified theoretical framework for top-$k$ decoding using Bregman divergence minimization, generalizes the method, and proposes novel decoding strategies.

Findings

01. Top-$k$ decoding is a special case of Bregman divergence-based recovery.

02. Optimal decoding strategies are greedy and can be efficiently found via binary search.

03. New decoding methods can non-linearly re-weight probabilities, offering alternative sampling behaviors.
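
As a hedged illustration of the first finding, the KL divergence is the Bregman divergence generated by $\phi(x) = x \log x$, and a short calculation shows why keeping the largest entries and renormalizing is optimal for it. The symbols $q$, $S$, and $Z_S$ are notation assumed here for the sketch and are not taken from the paper. Fix a support $S$ of size $k$ on which $p_i > 0$ and minimize over distributions $q$ supported on $S$:

$$
\min_{q:\ \mathrm{supp}(q) \subseteq S,\ \sum_{i \in S} q_i = 1}\ \sum_{i \in S} q_i \log \frac{q_i}{p_i}
\quad \Longrightarrow \quad
q_i^\star = \frac{p_i}{Z_S}\ \ (i \in S), \qquad Z_S = \sum_{j \in S} p_j ,
$$

$$
\mathrm{KL}(q^\star \,\|\, p) = \sum_{i \in S} \frac{p_i}{Z_S} \log \frac{1}{Z_S} = -\log Z_S .
$$

Because $-\log Z_S$ decreases as $Z_S$ grows, the best support of size $k$ is the one carrying the most probability mass, namely the $k$ largest entries of $p$, and the optimal $q^\star$ is exactly the renormalized truncation.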

Abstract

Top-$k$ decoding is a widely used method for sampling from LLMs: at each token, only the largest $k$ next-token probabilities are kept, and the next token is sampled after re-normalizing them to sum to unity. Top-$k$ and other sampling methods are motivated by the intuition that true next-token distributions are sparse, and the noisy LLM probabilities need to be truncated. However, to our knowledge, a precise theoretical motivation for the use of top-$k$ decoding is missing. In this work, we develop a theoretical framework that both explains and generalizes top-$k$ decoding. We view decoding at a fixed token as the recovery of a sparse probability distribution. We consider Bregman decoders obtained by minimizing a separable Bregman divergence (for both the primal and dual cases) with a sparsity-inducing $\ell_0$ regularization. Despite the combinatorial nature of…
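
The truncate-and-renormalize step described at the start of the abstract can be sketched in a few lines. This is a minimal illustration of standard top-$k$ sampling only, not the paper's generalized Bregman decoders; the function names `top_k_renormalize` and `sample_top_k` and the tie-breaking rule (lower index first) are assumptions made for this example.

```python
import random


def top_k_renormalize(probs, k):
    """Keep the k largest probabilities, zero the rest, renormalize to sum to 1.

    Hypothetical helper for illustration; ties are broken by lower index.
    """
    if not 1 <= k <= len(probs):
        raise ValueError("k must be between 1 and len(probs)")
    # Indices of the k largest entries, largest probability first.
    keep = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:k]
    total = sum(probs[i] for i in keep)
    if total <= 0:
        raise ValueError("kept probabilities must have positive mass")
    truncated = [0.0] * len(probs)
    for i in keep:
        truncated[i] = probs[i] / total
    return truncated


def sample_top_k(probs, k, rng=random):
    """Sample one token index from the truncated, renormalized distribution."""
    truncated = top_k_renormalize(probs, k)
    return rng.choices(range(len(probs)), weights=truncated, k=1)[0]


if __name__ == "__main__":
    next_token_probs = [0.5, 0.2, 0.2, 0.1]  # toy distribution, made up here
    print(top_k_renormalize(next_token_probs, 2))
    print(sample_top_k(next_token_probs, 2, rng=random.Random(0)))
```

With `k = 1` the sketch reduces to greedy decoding, and with `k = len(probs)` it leaves the distribution unchanged.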

Taxonomy

Topics: Natural Language Processing Techniques