Generating Diverse Translation by Manipulating Multi-Head Attention
Zewei Sun, Shujian Huang, Hao-Ran Wei, Xin-yu Dai, Jiajun Chen

TL;DR
This paper shows that different heads of the encoder-decoder multi-head attention in the final Transformer decoder layer align to different word translation candidates, and it introduces a head-manipulation method that generates diverse translations for better data augmentation and translation performance.
Contribution
It reveals the role of attention heads in translation diversity and proposes manipulating them to generate diverse outputs, evaluated on translation and on a conversation response generation task.
Findings
Diverse translations can be generated without a severe drop in translation quality.
Manipulating attention heads improves data augmentation.
Back-translation with diverse outputs significantly improves translation performance.
Abstract
The Transformer model has been widely used on machine translation tasks and has obtained state-of-the-art results. In this paper, we report an interesting phenomenon in its encoder-decoder multi-head attention: different attention heads of the final decoder layer align to different word translation candidates. We empirically verify this discovery and propose a method to generate diverse translations by manipulating heads. Furthermore, we make use of these diverse translations with the back-translation technique for better data augmentation. Experimental results show that our method generates diverse translations without a severe drop in translation quality. Experiments also show that back-translation with these diverse translations can bring significant improvements in performance on translation tasks. An auxiliary experiment on a conversation response generation task proves the effect of diversity…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
