Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling

Hongyu Gong; Yun Tang; Juan Pino; Xian Li

arXiv:2106.10840·cs.CL·June 22, 2021·5 cites

TL;DR

This paper introduces attention sharing strategies that automatically learn shared and specialized heads in multi-head attention, improving multilingual and multi-domain sequence modeling by reducing interference and enhancing performance.

Contribution

It proposes novel attention sharing strategies that adaptively learn shared and domain-specific attention heads for better generalization across languages and domains.

Findings

01. Achieves +2.0 BLEU in multilingual speech translation

02. Consistently improves sequence model performance across tasks

03. Mitigates negative transfer in multi-domain learning

Abstract

Multi-head attention has each of the attention heads collect salient information from different parts of an input sequence, making it a powerful mechanism for sequence modeling. Multilingual and multi-domain learning are common scenarios for sequence modeling, where the key challenge is to maximize positive transfer and mitigate negative transfer across languages and domains. In this paper, we find that non-selective attention sharing is sub-optimal for achieving good generalization across all languages and domains. We further propose attention sharing strategies to facilitate parameter sharing and specialization in multilingual and multi-domain sequence modeling. Our approach automatically learns shared and specialized attention heads for different languages and domains to mitigate their interference. Evaluated in various tasks including speech recognition, text-to-text and…
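To make the idea of shared and specialized heads concrete, here is a minimal NumPy sketch. It is not the authors' method: the abstract does not say how heads are selected, so this example assumes each language holds a score vector over a common pool of heads and keeps its top-k heads. The names `select_heads`, `masked_multihead_attention`, and `head_logits`, along with all the numbers, are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def select_heads(logits, k):
    """Return a 0/1 mask (num_languages, num_heads) keeping the k
    highest-scoring heads per language. Top-k is an assumed rule."""
    top = np.argsort(-logits, axis=1)[:, :k]
    mask = np.zeros_like(logits)
    np.put_along_axis(mask, top, 1.0, axis=1)
    return mask

def masked_multihead_attention(x, wq, wk, wv, head_mask):
    """Self-attention over x (seq, d_model) with per-head projections
    of shape (heads, d_model, d_head); unselected heads are zeroed."""
    q = np.einsum("sd,hde->hse", x, wq)
    k = np.einsum("sd,hde->hse", x, wk)
    v = np.einsum("sd,hde->hse", x, wv)
    scores = np.einsum("hse,hte->hst", q, k) / np.sqrt(q.shape[-1])
    out = np.einsum("hst,hte->hse", softmax(scores), v)
    out = out * head_mask[:, None, None]
    # Concatenate heads along the feature axis: (seq, heads * d_head).
    return out.transpose(1, 0, 2).reshape(x.shape[0], -1)

# Hypothetical per-language scores over a pool of 4 heads.
# In a real system these would be learned; here they are fixed by hand.
head_logits = np.array([
    [2.0,  1.5, -1.0, 0.1],   # language 0
    [1.8, -0.5,  0.3, 2.2],   # language 1
])
mask = select_heads(head_logits, k=2)

# Heads picked by every language are shared; heads picked by one are specialized.
shared = np.flatnonzero(mask.sum(axis=0) == mask.shape[0])
specialized = np.flatnonzero(mask.sum(axis=0) == 1)

rng = np.random.default_rng(0)
seq_len, d_model, d_head, num_heads = 5, 8, 2, 4
x = rng.normal(size=(seq_len, d_model))
wq, wk, wv = (rng.normal(size=(num_heads, d_model, d_head)) for _ in range(3))

out_lang0 = masked_multihead_attention(x, wq, wk, wv, mask[0])
out_lang1 = masked_multihead_attention(x, wq, wk, wv, mask[1])
print("shared heads:", shared, "specialized heads:", specialized)
```

With these hand-picked scores, both languages keep head 0, while heads 1 and 3 are each used by a single language. All languages share one pool of projection weights, so specialization here comes only from which heads each language reads.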

Videos

Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling (SlidesLive)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning