Generating Diverse Translation by Manipulating Multi-Head Attention

Zewei Sun; Shujian Huang; Hao-Ran Wei; Xin-yu Dai; Jiajun Chen

arXiv:1911.09333 · cs.CL · November 22, 2019 · 1 citation



TL;DR

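This paper reports that different heads of the encoder-decoder multi-head attention in the Transformer's final decoder layer align to different word translation candidates. It introduces a method that manipulates these heads to generate diverse translations, and uses those translations with back-translation for data augmentation.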

Contribution

It shows that attention heads in the final decoder layer align to different word translation candidates, and proposes manipulating them to generate diverse outputs for translation and conversation response generation.

Findings

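01. Diverse translations can be generated without a severe drop in translation quality.

02. Manipulating attention heads yields diverse translations that can be used for data augmentation.

03. Back-translation with these diverse translations can bring a significant improvement in translation performance.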
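The back-translation finding can be pictured with a minimal sketch. It assumes a hypothetical `reverse_translate(sentence, k)` function that returns up to `k` diverse source-side candidates for a target-language sentence; the name, the signature, and the toy stand-in below are illustrative and not taken from the paper.

```python
from typing import Callable, List, Tuple


def back_translate(
    target_sentences: List[str],
    reverse_translate: Callable[[str, int], List[str]],
    num_candidates: int,
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from monolingual target text.

    Each target sentence is paired with every distinct candidate returned
    by the (hypothetical) reverse translation function.
    """
    pairs: List[Tuple[str, str]] = []
    for tgt in target_sentences:
        seen = set()
        for src in reverse_translate(tgt, num_candidates):
            if src not in seen:  # keep only distinct candidates
                seen.add(src)
                pairs.append((src, tgt))
    return pairs


def toy_reverse_translate(sentence: str, k: int) -> List[str]:
    """Toy stand-in for a real target-to-source model (assumption)."""
    return [f"<variant {i % 2}> {sentence}" for i in range(k)]


pairs = back_translate(["guten Morgen", "danke"], toy_reverse_translate, 3)
for src, tgt in pairs:
    print(src, "->", tgt)
```

In a real pipeline the synthetic pairs would be mixed with genuine parallel data before training the forward model; how the paper weights or filters them is not stated in this summary.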

Abstract

The Transformer model has been widely used in machine translation and has obtained state-of-the-art results. In this paper, we report an interesting phenomenon in its encoder-decoder multi-head attention: different attention heads of the final decoder layer align to different word translation candidates. We empirically verify this discovery and propose a method to generate diverse translations by manipulating heads. Furthermore, we use these diverse translations with the back-translation technique for better data augmentation. Experimental results show that our method generates diverse translations without a severe drop in translation quality. Experiments also show that back-translation with these diverse translations can bring a significant improvement in performance on translation tasks. An auxiliary experiment on a conversation response generation task proves the effect of diversity…
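The phenomenon described in the abstract can be illustrated with a small NumPy sketch. It computes per-head encoder-decoder attention weights for one decoder step, reads off the source position each head attends to most, and then applies one possible manipulation: making every head reuse a chosen head's weights. The abstract does not specify the exact manipulation, so `force_head` is an assumption for illustration, and the learned query, key, and value projections of a real Transformer are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def head_attention(query, keys, num_heads):
    """Per-head attention weights of one decoder query over source keys.

    query: (d_model,), keys: (src_len, d_model).
    Returns an array of shape (num_heads, src_len).
    """
    d_head = query.shape[0] // num_heads
    q = query.reshape(num_heads, d_head)                      # (H, d_head)
    k = keys.reshape(keys.shape[0], num_heads, d_head)        # (S, H, d_head)
    scores = np.einsum("hd,shd->hs", q, k) / np.sqrt(d_head)  # (H, S)
    return softmax(scores, axis=-1)


def force_head(weights, head):
    """Hypothetical manipulation: all heads reuse the chosen head's weights."""
    return np.repeat(weights[head:head + 1], weights.shape[0], axis=0)


def context(weights, values, num_heads):
    """Concatenate the per-head weighted sums of the values."""
    d_head = values.shape[1] // num_heads
    v = values.reshape(values.shape[0], num_heads, d_head)    # (S, H, d_head)
    return np.einsum("hs,shd->hd", weights, v).reshape(-1)    # (d_model,)


rng = np.random.default_rng(0)
num_heads, d_model, src_len = 4, 16, 5
query = rng.normal(size=d_model)
keys = rng.normal(size=(src_len, d_model))
values = rng.normal(size=(src_len, d_model))

weights = head_attention(query, keys, num_heads)
aligned = weights.argmax(axis=-1)  # source position each head favours
forced = force_head(weights, head=2)
ctx_default = context(weights, values, num_heads)
ctx_forced = context(forced, values, num_heads)
print("most-attended source position per head:", aligned)
```

In a trained model, the forced context vector would feed the output layer and could shift the prediction toward the word candidate that the chosen head aligns to. With the random inputs used here, only the mechanics are shown.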


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications