Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text Generation via Concentrating Attention
Wenhao Li, Xiaoyuan Yi, Jinyi Hu, Maosong Sun, Xing Xie

TL;DR
The paper finds that sparser attention in Transformer models improves the diversity and novelty of generated text. It attributes dull, high-frequency output to representation degeneration caused by the attentive mixture of hidden states during training, and proposes an attention regularization loss that sharpens the attention distribution.
Contribution
The paper introduces an attention regularization loss that promotes sparser attention. The loss is supported by theoretical analysis, is transparent to model structure, and can be implemented within 20 lines of Python code.
Findings
Enhanced diversity and novelty in generated text.
Quality remains comparable to that of baseline models.
Method is simple to implement and theoretically justified.
Abstract
Recently, powerful Transformer architectures have proven superior in generating high-quality sentences. Nevertheless, these models tend to produce dull high-frequency phrases, severely hurting the diversity and novelty of generated text. In this work, we dig into the intrinsic mechanism of this problem and find that sparser attention values in Transformer could improve diversity. To understand this phenomenon, we first conduct both empirical and theoretical analysis and then attribute it to representation degeneration caused by the attentive mixture of the hidden states during training. We term this process the Trap of Mediocrity. To escape from such a trap, we introduce a novel attention regularization loss to control the sharpness of the attention distribution, which is transparent to model structures and can be easily implemented within 20 lines of Python code. We prove that this…
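The abstract excerpt does not give the form of the regularization loss, so the following is only a minimal sketch of one common way to control attention sharpness: penalizing the Shannon entropy of each attention row. The entropy penalty, the helper names (`softmax`, `attention_entropy`, `regularized_loss`), and the `strength` parameter are all assumptions made for illustration, not the authors' method.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn raw attention scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def attention_entropy(weights, eps=1e-12):
    """Shannon entropy of one attention row; lower means sharper attention."""
    return -sum(w * math.log(w + eps) for w in weights)

def regularized_loss(task_loss, attention_rows, strength=0.1):
    """Hypothetical objective: task loss plus a penalty on mean attention entropy.

    `strength` is an assumed hyperparameter; a larger value pushes
    the attention rows toward sparser, more concentrated distributions.
    """
    mean_entropy = sum(attention_entropy(r) for r in attention_rows) / len(attention_rows)
    return task_loss + strength * mean_entropy

# A uniform row (maximally mixed) versus a row concentrated on one position.
flat = softmax([1.0, 1.0, 1.0, 1.0])
sharp = softmax([4.0, 0.0, 0.0, 0.0])
print(attention_entropy(flat) > attention_entropy(sharp))
```

Under this assumed penalty, the uniform row contributes more to the loss than the concentrated row, so minimizing the combined objective favors sharper attention. In a real model the same computation would run on the attention tensors of each head inside an autodiff framework.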
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Multi-Head Attention · Softmax · Adam · Absolute Position Encodings · Byte Pair Encoding
