Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text Generation via Concentrating Attention

Wenhao Li; Xiaoyuan Yi; Jinyi Hu; Maosong Sun; Xing Xie

arXiv:2211.07164·cs.CL·November 15, 2022

TL;DR

This paper finds that sparser attention in Transformer models improves the diversity and novelty of generated text by counteracting the representation degeneration caused by the attentive mixture of hidden states, and it proposes an attention regularization loss that encourages sharper attention.

Contribution

It introduces a novel attention regularization loss that promotes sparser attention. The loss is supported by theoretical analysis, is easy to implement, and improves diversity and novelty in text generation.

Findings

01 Enhanced diversity and novelty in generated text.

02 Maintained quality comparable to baseline models.

03 The method is simple to implement and theoretically justified.

Abstract

Recently, powerful Transformer architectures have proven superior in generating high-quality sentences. Nevertheless, these models tend to produce dull high-frequency phrases, severely hurting the diversity and novelty of generated text. In this work, we dig into the intrinsic mechanism of this problem and find that sparser attention values in Transformers can improve diversity. To understand this phenomenon, we first conduct both empirical and theoretical analyses and then attribute it to representation degeneration caused by the attentive mixture of the hidden states during training. We term this process the Trap of Mediocrity. To escape from such a trap, we introduce a novel attention regularization loss to control the sharpness of the attention distribution, which is transparent to model structures and can be easily implemented within 20 lines of Python code. We prove that this…
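The abstract does not spell out the form of the regularization loss, so the following is only a minimal sketch of the general idea of penalizing diffuse attention. It assumes an entropy penalty as a stand-in, which may differ from the loss the authors actually use. The function names and the `reg_weight` hyperparameter are hypothetical, chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw attention scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; lower values mean sharper attention."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def attention_entropy_penalty(attention_rows):
    """Mean entropy over attention rows (one row per query position)."""
    return sum(entropy(row) for row in attention_rows) / len(attention_rows)


def regularized_loss(task_loss, attention_rows, reg_weight=0.1):
    """Task loss plus an assumed entropy penalty (not the paper's exact loss)."""
    return task_loss + reg_weight * attention_entropy_penalty(attention_rows)


# Two toy query positions, each attending over four keys.
diffuse = [softmax([0.1, 0.0, 0.2, 0.1]), softmax([0.0, 0.0, 0.0, 0.0])]
sharp = [softmax([5.0, 0.0, 0.2, 0.1]), softmax([0.0, 6.0, 0.0, 0.0])]

# Sharper attention has lower entropy, so it incurs a smaller penalty.
print(attention_entropy_penalty(diffuse) > attention_entropy_penalty(sharp))  # prints True
```

Under this assumption, minimizing the combined loss pushes each attention row toward a few dominant keys, which is one way to "control the sharpness of the attention distribution" without changing the model structure.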

Code & Models

Repositories

peterliwenhao/care (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Multi-Head Attention · Softmax · Adam · Absolute Position Encodings · Byte Pair Encoding