Know When To Stop: A Study of Semantic Drift in Text Generation
Ava Spataru, Eric Hambro, Elena Voita, Nicola Cancedda

TL;DR
This paper investigates how large language models tend to generate accurate facts initially but then drift into inaccuracies, proposing methods like early stopping and reranking to improve factual correctness in long-form text generation.
Contribution
The study introduces a semantic drift score, analyzes the correct-then-incorrect pattern, and develops techniques to enhance factual accuracy through early stopping and reranking.
Findings
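The semantic drift score measures how cleanly correct facts are separated from incorrect ones in a generated text; Wikipedia-style biographies confirm the correct-then-incorrect pattern.
Early stopping improves factuality by a large margin, at the cost of generating less information.
Reranking with semantic similarity improves results further, both over the baseline and in combination with early stopping.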
Abstract
In this work, we explicitly show that modern LLMs tend to generate correct facts first, then "drift away" and generate incorrect facts later: this was occasionally observed but never properly measured. We develop a semantic drift score that measures the degree of separation between correct and incorrect facts in generated texts and confirm our hypothesis when generating Wikipedia-style biographies. This correct-then-incorrect generation pattern suggests that factual accuracy can be improved by knowing when to stop generation. Therefore, we explore the trade-off between information quantity and factual accuracy for several early stopping methods and manage to improve factuality by a large margin. We further show that reranking with semantic similarity can further improve these results, both compared to the baseline and when combined with early stopping. Finally, we try calling external…
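The abstract describes the semantic drift score only in words, so the sketch below is one plausible way to make the idea concrete, not necessarily the authors' exact definition. It assumes each generated fact has already been labeled correct or incorrect (the labeling step is out of scope here), and it scores a text by the best split point: the average of the fraction of correct facts before the split and the fraction of incorrect facts after it. The function name `semantic_drift_score` and the input format are hypothetical.

```python
from typing import Sequence


def semantic_drift_score(labels: Sequence[bool]) -> float:
    """Assumed formulation of a drift score over per-fact correctness labels.

    labels[i] is True if the i-th generated fact is correct.
    Returns a value in [0, 1]; 1.0 means every correct fact precedes
    every incorrect fact at some split point.
    """
    n = len(labels)
    if n < 2:
        # No split point exists, so no separation can be measured.
        return 0.0

    total_correct = sum(labels)
    correct_before = 0
    best = 0.0
    for split in range(1, n):
        # Facts [0, split) are "before"; facts [split, n) are "after".
        correct_before += labels[split - 1]
        correct_after = total_correct - correct_before
        incorrect_after = (n - split) - correct_after
        score = 0.5 * (correct_before / split + incorrect_after / (n - split))
        best = max(best, score)
    return best


if __name__ == "__main__":
    # Correct facts first, then errors: perfectly separated.
    print(semantic_drift_score([True, True, False, False]))
    # Errors first, then correct facts: weak separation in the drift direction.
    print(semantic_drift_score([False, False, True, True]))
```

Under this formulation, the split that maximizes the score is also a natural candidate for where an early-stopping method would ideally cut generation, which is the trade-off between information quantity and accuracy that the abstract refers to.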
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
Methods: Early Stopping
