Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation

Jin Xu; Xiaojiang Liu; Jianhao Yan; Deng Cai; Huayang Li; Jian Li

arXiv:2206.02369 · cs.CL · October 11, 2022 · 21 citations



TL;DR

This paper investigates why neural language models tend to produce repetitive sentences, revealing a self-reinforcement effect, and proposes DITTO, a training method that penalizes repetitive probabilities to improve generation quality.

Contribution

The paper uncovers the self-reinforcement mechanism behind sentence repetitions and introduces DITTO, a novel training approach that effectively mitigates repetitions without harming model performance.
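To make "penalizes repetitive probabilities" concrete, here is a minimal sketch of one plausible form such a training penalty could take. This page does not spell out the loss, so everything below is an assumption for illustration: the function name `repetition_penalty_loss`, the `decay` factor, and the specific form `-log(1 - |p_current - decay * p_previous|)`. The idea is that when a sentence is repeated, each token's probability in the current repetition is pushed toward a decayed copy of its probability in the previous repetition, instead of being allowed to grow.

```python
import math


def repetition_penalty_loss(p_current, p_previous, decay=0.5, eps=1e-8):
    """Hypothetical repetition penalty (illustrative only, not the paper's code).

    p_current:  probabilities the model assigns to each token of a sentence
                in its current repetition.
    p_previous: probabilities the model assigned to the same tokens in the
                previous repetition (treated as fixed targets).
    decay:      assumed factor in (0, 1]; the target is decay * p_previous.
    """
    if not p_current or len(p_current) != len(p_previous):
        raise ValueError("probability lists must be non-empty and equal length")

    total = 0.0
    for p_cur, p_prev in zip(p_current, p_previous):
        # Distance between the current probability and its decayed target.
        gap = abs(p_cur - decay * p_prev)
        # Loss is zero when the gap is zero and grows as the gap widens.
        # The clamp avoids log(0) when the gap reaches 1.
        total += -math.log(max(1.0 - gap, eps))
    return total / len(p_current)
```

Under these assumptions, a token whose probability rises across repetitions (the self-reinforcement pattern) incurs a larger loss than one whose probability falls toward the decayed target. For example, `repetition_penalty_loss([0.9], [0.8])` is larger than `repetition_penalty_loss([0.5], [0.8])`.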

Findings

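01

Language models prefer to repeat the previous sentence.

02

Sentence-level repetitions reinforce themselves: the more often a sentence is repeated in the context, the more likely the model is to repeat it again.

03

DITTO reduces repetitions and improves text quality.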

Abstract

While large-scale neural language models, such as GPT2 and BART, have achieved impressive results on various text generation tasks, they tend to get stuck in undesirable sentence-level loops with maximization-based decoding algorithms (e.g., greedy search). This phenomenon is counter-intuitive since there are few consecutive sentence-level repetitions in human corpora (e.g., 0.02% in Wikitext-103). To investigate the underlying reasons for generating consecutive sentence-level repetitions, we study the relationship between the probabilities of the repetitive tokens and their previous repetitions in the context. Through our quantitative experiments, we find that 1) language models have a preference to repeat the previous sentence; 2) the sentence-level repetitions have a self-reinforcement effect: the more times a sentence is repeated in the context, the higher the…
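The abstract's corpus statistic (few consecutive sentence-level repetitions in human text) can be illustrated with a small measurement sketch. The exact definition behind the 0.02% figure is not given on this page, so the version below is an assumption: it splits text on sentence-final punctuation and reports the fraction of adjacent sentence pairs in which a sentence exactly repeats the one before it. The helper names `split_sentences` and `consecutive_repetition_rate` are hypothetical.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def consecutive_repetition_rate(text):
    """Fraction of adjacent sentence pairs where the second repeats the first exactly.

    This is an assumed definition for illustration; the paper's own metric
    may tokenize or match sentences differently.
    """
    sentences = split_sentences(text)
    if len(sentences) < 2:
        return 0.0
    repeats = sum(
        1 for prev, cur in zip(sentences, sentences[1:]) if cur == prev
    )
    return repeats / (len(sentences) - 1)
```

Applied to looping output such as `"I like tea. I like tea. I like tea. It is warm."`, two of the three adjacent pairs are exact repeats, so the rate is 2/3. The same function run over human-written text would be expected to return a value near zero.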


Code & Models

Models

🤗 dinalt/walsh_instruct-1-7b · model · 6 downloads · 1 like


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Dropout · Dense Connections