Truncation Sampling as Language Model Desmoothing
John Hewitt, Christopher D. Manning, Percy Liang

TL;DR
This paper introduces a new truncation sampling method called η-sampling that improves the quality and plausibility of long text samples from neural language models by better estimating the true distribution.
Contribution
The work reframes truncation as desmoothing and proposes η-sampling, which truncates words whose probability falls below an entropy-dependent threshold; the authors report that it outperforms previous truncation methods.
Findings
η-sampling produces more plausible long English texts
It better avoids repetitive outputs
It performs well across various test distributions
Abstract
Long samples of text from neural language models can be of poor quality. Truncation sampling algorithms, like top-p or top-k, address this by setting some words' probabilities to zero at each step. This work provides framing for the aim of truncation, and an improved algorithm for that aim. We propose thinking of a neural language model as a mixture of a true distribution and a smoothing distribution that avoids infinite perplexity. In this light, truncation algorithms aim to perform desmoothing, estimating a subset of the support of the true distribution. Finding a good subset is crucial: we show that top-p unnecessarily truncates high-probability words, for example causing it to truncate all words but Trump for a document that starts with Donald. We introduce η-sampling, which truncates words below an entropy-dependent probability threshold. Compared to previous…
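To make the contrast in the abstract concrete, here is a minimal sketch comparing top-p truncation with an entropy-dependent threshold on a toy next-word distribution. The vocabulary, the probabilities, and the function names are invented for illustration. The abstract only says the threshold depends on entropy; the specific form used below, η = min(ε, √ε · exp(−H)), is an assumption that should be checked against the paper, as is the choice ε = 0.002.

```python
import numpy as np

def top_p_keep(probs, p=0.95):
    """Boolean mask of words kept by top-p (nucleus) truncation."""
    order = np.argsort(-probs)                 # most to least likely
    cumulative = np.cumsum(probs[order])
    # Smallest prefix whose cumulative mass reaches p.
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    keep = np.zeros(probs.shape, dtype=bool)
    keep[order[:cutoff]] = True
    return keep

def eta_keep(probs, epsilon=0.002):
    """Boolean mask of words kept by an entropy-dependent threshold.

    ASSUMPTION: eta = min(epsilon, sqrt(epsilon) * exp(-entropy));
    the abstract does not state the formula.
    """
    nonzero = probs[probs > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))
    eta = min(epsilon, np.sqrt(epsilon) * np.exp(-entropy))
    keep = probs >= eta
    keep[np.argmax(probs)] = True              # never truncate every word
    return keep

# Hypothetical distribution over the word following "Donald".
vocab = ["Trump", "Duck", "Glover", "Sutherland", "zebra"]
probs = np.array([0.96, 0.02, 0.012, 0.0079, 0.0001])

top_p_words = [w for w, k in zip(vocab, top_p_keep(probs)) if k]
eta_words = [w for w, k in zip(vocab, eta_keep(probs)) if k]

# High-entropy case: 1000 equally likely words, each with probability 0.001.
uniform = np.full(1000, 0.001)
uniform_kept = int(eta_keep(uniform).sum())
```

On this toy input, top-p with p = 0.95 keeps only "Trump", because that single word already covers 95% of the mass, while the threshold rule keeps the four plausible continuations and drops only "zebra". In the uniform case the entropy term lowers the threshold below 0.001, so all 1000 words survive; a fixed cutoff of 0.002 would have removed every one of them.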
Code & Models
- 🤗 DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters (model, ♡ 174)
- 🤗 Dca3271144691983/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters (model)
- 🤗 lucky087/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters (model)
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Computational Physics and Python Applications
Methods: Test
