Interrogating the Explanatory Power of Attention in Neural Machine Translation
Pooya Moradi, Nishant Kambhatla, and Anoop Sarkar

TL;DR
This paper evaluates whether attention in neural machine translation (NMT) reliably explains the model's decisions. It finds that attention alone cannot: counterfactual attention models, which modify crucial aspects of the trained attention, often still produce the same tokens, especially function words.
Contribution
The study introduces counterfactual attention models to test the explanatory power of attention in NMT, revealing its limitations.
Findings
Compared to a state-of-the-art attention model, counterfactual attention models still produce 68% of function words and 21% of content words on the German-English dataset.
Attention models alone are insufficient to reliably explain NMT decisions.
Experiments demonstrate the limited explanatory power of attention in NMT.
Abstract
Attention models have become a crucial component in neural machine translation (NMT). They are often implicitly or explicitly used to justify the model's decision in generating a specific token, but it has not yet been rigorously established to what extent attention is a reliable source of information in NMT. To evaluate the explanatory power of attention for NMT, we examine the possibility of yielding the same prediction but with counterfactual attention models that modify crucial aspects of the trained attention model. Using these counterfactual attention mechanisms, we assess the extent to which they still preserve the generation of function and content words in the translation process. Compared to a state-of-the-art attention model, our counterfactual attention models produce 68% of function words and 21% of content words in our German-English dataset. Our experiments demonstrate that…
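The counterfactual test described in the abstract can be illustrated with a minimal toy sketch. It is not the authors' implementation: the function names (`uniform_attention`, `permuted_attention`, `prediction_preserved`), the two counterfactual schemes, the linear output layer, and all numbers are assumptions chosen for illustration. The idea is to replace the trained attention weights with altered ones, recompute the context vector, and check whether the predicted token stays the same.

```python
import random


def context_vector(weights, encoder_states):
    """Attention-weighted sum of encoder hidden states."""
    dim = len(encoder_states[0])
    return [sum(w * h[d] for w, h in zip(weights, encoder_states))
            for d in range(dim)]


def predict(context, output_matrix):
    """Toy output layer (an assumption): logits = W @ context, return the argmax token id."""
    logits = [sum(w * c for w, c in zip(row, context)) for row in output_matrix]
    return max(range(len(logits)), key=logits.__getitem__)


def uniform_attention(weights):
    """Counterfactual: spread attention equally over all source positions."""
    n = len(weights)
    return [1.0 / n] * n


def permuted_attention(weights, rng):
    """Counterfactual: randomly reassign the same weights to source positions."""
    return rng.sample(weights, len(weights))


def prediction_preserved(weights, counterfactual, encoder_states, output_matrix):
    """True if the counterfactual attention yields the same token as the original."""
    original = predict(context_vector(weights, encoder_states), output_matrix)
    altered = predict(context_vector(counterfactual, encoder_states), output_matrix)
    return original == altered


# Hypothetical toy data: three source positions, two-dimensional hidden states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
weights = [0.8, 0.1, 0.1]

# Output layer whose prediction depends on where attention is placed.
attention_sensitive = [[1.0, 0.0], [0.0, 1.0]]
# Output layer where token 0 wins for any attention distribution.
attention_insensitive = [[1.0, 1.0], [0.0, 1.0]]

for name, matrix in [("sensitive", attention_sensitive),
                     ("insensitive", attention_insensitive)]:
    kept = prediction_preserved(weights, uniform_attention(weights),
                                encoder_states, matrix)
    print(name, "prediction preserved under uniform attention:", kept)
```

In this toy setup, a token whose logit barely depends on the context vector survives the counterfactual, while a token tied to a specific source position does not. This loosely mirrors the function-word versus content-word contrast reported above, though the toy numbers carry no empirical weight.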
