Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads?

Zae Myung Kim; Laurent Besacier; Vassilina Nikoulina; Didier Schwab

arXiv:2105.14940·cs.CL·June 1, 2021

Open Access

TL;DR

This study investigates whether individual attention heads in multilingual NMT models are specific to particular language pairs. It finds that the most important heads are largely shared across language pairs and that some less important heads can be removed without hurting translation quality.

Contribution

It introduces a systematic analysis of attention heads in multilingual NMT, showing that language-specific heads are not prominent and that many heads can be pruned with minimal impact.

Findings

01

The set of most important attention heads is very similar across language pairs.

02

Approximately one-third of less important heads can be removed without significant quality loss.

03

Attention head importance correlates weakly with language pair specificity.
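
Finding 02 can be illustrated with a minimal sketch of the masking step alone. It assumes a per-head importance score is already available; this page does not describe how the paper computes that ranking. The function name `prune_mask`, the `(layer, head)` keys, and the toy scores are all hypothetical and are not taken from the paper.

```python
def prune_mask(importance, prune_fraction):
    """Return {head: keep?}, masking the lowest-scoring fraction of heads.

    `importance` maps a (layer, head) key to a score where higher means
    more important. How the scores are obtained is outside this sketch.
    """
    ranked = sorted(importance, key=importance.get)  # least important first
    n_prune = round(len(ranked) * prune_fraction)
    pruned = set(ranked[:n_prune])
    return {head: head not in pruned for head in importance}


# Toy, made-up scores for a 2-layer, 3-head model (illustration only).
scores = {
    (0, 0): 0.9, (0, 1): 0.1, (0, 2): 0.5,
    (1, 0): 0.7, (1, 1): 0.2, (1, 2): 0.6,
}
mask = prune_mask(scores, 1 / 3)
kept = sorted(head for head, keep in mask.items() if keep)
```

With these toy scores, the two lowest-ranked heads are masked and the other four are kept. In an actual experiment, the masked model would then be re-evaluated on each language pair to check the effect on translation quality.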

Abstract

Recent studies on the analysis of multilingual representations focus on identifying whether there is an emergence of language-independent representations, or whether a multilingual model partitions its weights among different languages. While most such work has been conducted in a "black-box" manner, this paper aims to analyze individual components of a multilingual neural machine translation (NMT) model. In particular, we look at the encoder self-attention and encoder-decoder attention heads (in a many-to-one NMT model) that are more specific to the translation of a certain language pair than others by (1) employing metrics that quantify some aspects of the attention weights such as "variance" or "confidence", and (2) systematically ranking the importance of attention heads with respect to translation quality. Experimental results show that, surprisingly, the set of most important…
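
To make the "confidence" and "variance" metrics mentioned in the abstract more concrete, here is a minimal sketch of one plausible way to compute such quantities for a single head. The exact definitions used in the paper are not given on this page, so both formulas are assumptions: confidence is taken as the mean of the largest attention weight per query position, and variance as the mean spread of attended key positions. The function names are hypothetical.

```python
def head_confidence(attn):
    """Mean over query positions of the largest attention weight in each row.

    `attn` is a list of rows; each row is a distribution over key positions.
    (Assumed definition, not confirmed by the source.)
    """
    return sum(max(row) for row in attn) / len(attn)


def head_variance(attn):
    """Mean over query positions of the variance of the attended key index,
    treating each row as a probability distribution over positions.
    (Assumed definition, not confirmed by the source.)
    """
    total = 0.0
    for row in attn:
        mean_pos = sum(j * w for j, w in enumerate(row))
        total += sum(w * (j - mean_pos) ** 2 for j, w in enumerate(row))
    return total / len(attn)


# A sharply focused head versus a uniformly diffuse head (toy inputs).
focused = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
diffuse = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]
```

Under these assumed definitions, the focused head has high confidence and zero variance, while the diffuse head has low confidence and a larger variance. Comparing such per-head statistics across language pairs is one way to look for heads that behave differently for a given pair.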

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Explainable Artificial Intelligence (XAI)