Coherence boosting: When your pretrained language model is not paying enough attention
Nikolay Malkin, Zhen Wang, Nebojsa Jojic

TL;DR
This paper introduces coherence boosting, an inference method that enhances large language models' focus on long contexts, improving their performance in language generation and understanding without additional training.
Contribution
The paper proposes a novel inference procedure called coherence boosting that increases LMs' attention to distant words, improving long-range coherence in generated text.
Findings
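Coherence boosting improves the long-range coherence of generated ordinary text and dialog responses, as shown by distributional analyses.
Applied to state-of-the-art pretrained models, it yields performance gains on various zero-shot NLP tasks.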
No additional training is required for the method.
Abstract
Long-range semantic coherence remains a challenge in automatic language generation and understanding. We demonstrate that large language models have insufficiently learned the effect of distant words on next-token prediction. We present coherence boosting, an inference procedure that increases a LM's focus on a long context. We show the benefits of coherence boosting with pretrained models by distributional analyses of generated ordinary text and dialog responses. It is also found that coherence boosting with state-of-the-art models for various zero-shot NLP tasks yields performance gains with no additional training.
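The abstract names the procedure but does not give its formula. As a minimal sketch, assume boosting is a log-linear contrast between two next-token predictions from the same model: one conditioned on the full context and one on a truncated (short) context. Tokens favored only because of the distant context are then up-weighted. The function names, the strength parameter `alpha`, and the toy logits below are illustrative assumptions, not taken from this page.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def coherence_boost(full_logits, short_logits, alpha=0.5):
    """Contrast full-context and short-context next-token predictions.

    Assumed form (hypothetical, for illustration):
        boosted = (1 + alpha) * log p_full - alpha * log p_short
    followed by renormalization. alpha = 0 recovers the full-context model.
    """
    full_lp = log_softmax(full_logits)
    short_lp = log_softmax(short_logits)
    mixed = [(1 + alpha) * f - alpha * s for f, s in zip(full_lp, short_lp)]
    return log_softmax(mixed)


# Toy three-token vocabulary; the logits are made-up numbers.
vocab = ["the", "bank", "river"]
full_logits = [2.0, 1.0, 1.5]   # model sees the whole (long) context
short_logits = [2.0, 1.0, 0.0]  # model sees only the last few tokens

full_lp = log_softmax(full_logits)
boosted = coherence_boost(full_logits, short_logits, alpha=0.5)
best = vocab[max(range(len(vocab)), key=lambda i: boosted[i])]
```

In this toy case "river" is supported mainly by the long context, so the contrast raises its probability above that of the generic token "the". With a real model, the two logit vectors would come from two forward passes over the full and the truncated prompt, and `alpha` would be tuned per task.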
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
