Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation

Elena Voita; Rico Sennrich; Ivan Titov

arXiv:2010.10907 · cs.CL · June 28, 2021

TL;DR

This paper introduces a method based on Layerwise Relevance Propagation to explicitly measure and analyze the relative influence of source and target context in neural machine translation, providing insights into model behavior.

Contribution

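The paper extends Layerwise Relevance Propagation (LRP) to the Transformer and uses it to measure the relative contributions of source and target context to each generation decision, which earlier NMT interpretability work did not evaluate explicitly.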

Findings

01

Models trained with more data rely more on source information.

02

Contribution distributions become sharper with more data.

03

Training is non-monotonic: it passes through several distinct stages in which the balance of source and target influence shifts.
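To make "sharper" in the second finding concrete, the sketch below treats a set of token contributions as a distribution and scores it with entropy, where lower entropy means a more peaked distribution. Entropy as the sharpness measure is an assumption made here for illustration, and the contribution values are invented, not results from the paper.

```python
import math


def entropy(contributions):
    """Shannon entropy (in nats) of contributions after normalizing them to sum to 1."""
    total = sum(contributions)
    probs = [c / total for c in contributions]
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical contributions of four source tokens to one target token.
flat_contributions = [0.25, 0.25, 0.25, 0.25]   # influence spread evenly
sharp_contributions = [0.85, 0.05, 0.05, 0.05]  # influence concentrated on one token

flat_entropy = entropy(flat_contributions)
sharp_entropy = entropy(sharp_contributions)

# The peaked distribution has lower entropy than the uniform one.
print(f"flat:  {flat_entropy:.3f}")
print(f"sharp: {sharp_entropy:.3f}")
```

Under this reading, a model whose contributions become sharper with more data would show falling entropy as the training set grows.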

Abstract

In Neural Machine Translation (and, more generally, conditional language modeling), the generation of a target token is influenced by two types of context: the source and the prefix of the target sequence. While many attempts to understand the internal workings of NMT models have been made, none of them explicitly evaluates relative source and target contributions to a generation decision. We argue that this relative contribution can be evaluated by adopting a variant of Layerwise Relevance Propagation (LRP). Its underlying 'conservation principle' makes relevance propagation unique: differently from other methods, it evaluates not an abstract quantity reflecting token importance, but the proportion of each token's influence. We extend LRP to the Transformer and conduct an analysis of NMT models which explicitly evaluates the source and target relative contributions to the generation…
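The conservation principle mentioned in the abstract can be illustrated on a single linear layer. The sketch below is a toy example, not the paper's implementation: it assumes a basic proportional redistribution rule (each input receives relevance in proportion to its share of the pre-activation), positive inputs and weights, and a hypothetical split of the inputs into three "source" and two "target prefix" positions. The function name `lrp_linear` and all numbers are made up for illustration.

```python
def lrp_linear(x, w, relevance_out, eps=1e-9):
    """Redistribute output relevance to the inputs of y[j] = sum_i x[i] * w[i][j].

    Each input i receives the fraction x[i] * w[i][j] / z[j] of output j's
    relevance, so the total relevance is (up to eps) preserved.
    """
    n_in = len(x)
    relevance_in = [0.0] * n_in
    for j, r_j in enumerate(relevance_out):
        z_j = sum(x[i] * w[i][j] for i in range(n_in))
        z_j += eps if z_j >= 0 else -eps  # small stabilizer to avoid division by zero
        for i in range(n_in):
            relevance_in[i] += x[i] * w[i][j] / z_j * r_j
    return relevance_in


# Toy inputs: the first 3 positions play the role of source tokens,
# the last 2 the role of target-prefix tokens (an assumption of this sketch).
x = [0.5, 1.0, 2.0, 1.5, 0.5]
n_source = 3
w = [
    [0.2, 0.1],
    [0.4, 0.3],
    [0.1, 0.5],
    [0.3, 0.2],
    [0.6, 0.1],
]
relevance_out = [0.6, 0.4]  # relevance at the layer output, summing to 1

relevance_in = lrp_linear(x, w, relevance_out)
source_share = sum(relevance_in[:n_source])
target_share = sum(relevance_in[n_source:])

print(f"source share: {source_share:.3f}")
print(f"target share: {target_share:.3f}")
print(f"total:        {source_share + target_share:.3f}")
```

Because the total relevance is preserved, the source and target shares can be read as proportions of one whole rather than as unrelated importance scores, which is the property the abstract relies on.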

Code & Models

Repositories

lena-voita/the-story-of-heads
tf · Official

Videos

039 - Lena Voita - NLP · YouTube

Taxonomy

Methods

Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Softmax · Adam · Layer Normalization · Dense Connections · Multi-Head Attention · Label Smoothing