Local and Global Decoding in Text Generation
Daniel Gareev, Thomas Hofmann, Ezhilmathi Krishnasamy, Tiago Pimentel

TL;DR
This paper compares local and global normalisation in text decoding algorithms and finds that locally-normalised methods often outperform their globally-normalised counterparts, even though local normalisation distorts the model's distribution.
Contribution
The paper introduces globally-normalised versions of existing decoding methods and an independent Metropolis-Hastings (MCMC) algorithm for sampling from them, then compares their performance empirically against traditional local normalisation.
Findings
Global decoding often underperforms local decoding in practice.
Distortion from local normalisation can be beneficial for performance.
Global normalisation avoids distorting the model's distribution, but it does not perform better as a result.
Abstract
Text generation, a key component in applications such as dialogue systems, relies on decoding algorithms that sample strings from a language model distribution. Traditional methods, such as top-k and top-π, apply local normalisation to the model's output distribution, which can distort it. In this paper, we investigate the effect of this distortion by introducing globally-normalised versions of these decoding methods. Additionally, we propose an independent Metropolis-Hastings algorithm to approximate sampling from globally-normalised distributions without explicitly computing them. Our empirical analysis compares the performance of local and global normalisation across two decoding algorithms (top-k and top-π) with various hyperparameters, using Pythia language models. Results show that, in most configurations, global decoding performs worse than the local decoding version…
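The difference between the two normalisations can be illustrated with a minimal sketch. It is not the authors' implementation: the toy model `TOY_MODEL`, the fixed sequence length, and all function names are hypothetical, and top-k is used as the only truncation rule. Local top-k renormalises the kept tokens at every step, whereas the globally-normalised target is the model's probability restricted to the same strings and renormalised once. Under these assumptions, the ratio of target to proposal probability for a string is proportional to the product of the kept masses along it, so an independent Metropolis-Hastings sampler can use local top-k as its proposal and needs only those masses.

```python
import math
import random

# Hypothetical toy "language model": next-token probabilities depend only
# on the previous token. The rows are illustrative, not taken from the paper.
TOY_MODEL = {
    0: [0.10, 0.40, 0.30, 0.20],
    1: [0.70, 0.05, 0.05, 0.20],
    2: [0.28, 0.22, 0.20, 0.30],
    3: [0.10, 0.60, 0.20, 0.10],
}
VOCAB = [0, 1, 2, 3]
SEQ_LEN = 3  # assumption: fixed-length strings, no end-of-sequence token
K = 2        # top-k truncation size


def next_token_probs(prefix):
    """Return p(token | prefix); an empty prefix uses row 0 as the start state."""
    last = prefix[-1] if prefix else 0
    return TOY_MODEL[last]


def top_k_step(prefix):
    """Return the kept tokens, their locally renormalised probabilities, and the kept mass."""
    probs = next_token_probs(prefix)
    kept = sorted(VOCAB, key=lambda tok: probs[tok], reverse=True)[:K]
    mass = sum(probs[tok] for tok in kept)
    return kept, [probs[tok] / mass for tok in kept], mass


def sample_local(rng):
    """Sample a string with locally-normalised top-k.

    Also returns the sum of log kept masses along the string, which is the
    log of the (unnormalised) target/proposal ratio used below.
    """
    seq, log_weight = [], 0.0
    for _ in range(SEQ_LEN):
        kept, local_probs, mass = top_k_step(seq)
        seq.append(rng.choices(kept, weights=local_probs)[0])
        log_weight += math.log(mass)
    return tuple(seq), log_weight


def sample_global_mh(rng, n_steps=200):
    """Approximate a sample from globally-normalised top-k via independent MH."""
    current, current_log_w = sample_local(rng)
    for _ in range(n_steps):
        proposal, proposal_log_w = sample_local(rng)
        # Accept with probability min(1, w(proposal) / w(current)).
        if rng.random() < math.exp(proposal_log_w - current_log_w):
            current, current_log_w = proposal, proposal_log_w
    return current
```

Calling `sample_global_mh(random.Random(0))` returns one string after 200 proposals. Strings whose prefixes keep more probability mass under truncation are accepted more often, which is how the chain shifts probability away from the locally-normalised distribution without ever computing the global normalising constant.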
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Pythia
