TL;DR
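Locally typical sampling is a decoding method for probabilistic language generators. At each step it samples from the words whose information content (negative log-probability) is close to the conditional entropy of the model's next-word distribution, with the aim of producing more coherent and less repetitive text.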
Contribution
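The paper treats natural language generation as a discrete stochastic process, analyzes it in information-theoretic terms, and introduces locally typical sampling, an efficient decoding strategy that keeps each word's information content close to the entropy of the model's distribution at that step.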
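The selection rule can be illustrated with a minimal sketch. This is not the authors' released code. It assumes the candidate set at each step is the smallest group of tokens, ranked by how close their information content is to the entropy, whose total probability reaches a threshold. The names locally_typical_filter, sample_token, and tau are hypothetical and chosen only for this illustration.

```python
import math
import random


def locally_typical_filter(probs, tau=0.95):
    """Restrict a next-token distribution to tokens near the entropy.

    Hypothetical helper: `probs` is a list of probabilities summing to 1,
    `tau` is an assumed probability-mass threshold in (0, 1].
    """
    # Entropy of the next-token distribution, in nats.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)

    # Distance between each token's information content (-log p) and the
    # entropy. Zero-probability tokens are skipped: they cannot be sampled.
    scored = sorted(
        (abs(-math.log(p) - entropy), i) for i, p in enumerate(probs) if p > 0
    )

    # Keep the closest tokens until their total mass reaches tau (assumption).
    kept, mass = [], 0.0
    for _, i in scored:
        kept.append(i)
        mass += probs[i]
        if mass >= tau:
            break

    # Renormalize over the kept tokens; everything else gets probability 0.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(probs, tau=0.95, rng=random):
    """Draw one token index from the filtered distribution."""
    filtered = locally_typical_filter(probs, tau)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.05]
    print(locally_typical_filter(probs, tau=0.2))
    print(sample_token(probs, tau=0.5))
```

With the toy distribution in the usage lines, the token with probability 0.3 lies closest to the entropy, so a small tau keeps only that token and drops the most probable one. This is the main behavioral difference from top-k and nucleus sampling, which always retain the highest-probability token.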
Findings
Reduces repetitive and dull outputs compared to nucleus and top-k sampling.
Maintains competitive quality in summarization and story generation tasks.
Improves coherence and diversity in generated text.
Abstract
Today's probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics, e.g., perplexity. This discrepancy has puzzled the language generation community for the last few years. In this work, we posit that the abstraction of natural language generation as a discrete stochastic process--which allows for an information-theoretic analysis--can provide new insights into the behavior of probabilistic language generators, e.g., why high-probability texts can be dull or repetitive. Humans use language as a means of communicating information, aiming to do so in a simultaneously efficient and error-minimizing manner; in fact, psycholinguistics research suggests humans choose each word in a string with this subconscious goal in mind. We formally define the set of strings that meet this…
Videos
Author Interview - Typical Decoding for Natural Language Generation· youtube
Typical Decoding for Natural Language Generation (Get more human-like outputs from language models!)· youtube
