Staying True to Your Word: (How) Can Attention Become Explanation?
Martin Tutek, Jan Šnajder

TL;DR
This paper investigates the interpretability of attention mechanisms in NLP, especially in recurrent networks, proposing a new objective to improve their faithfulness as explanations.
Contribution
It identifies limitations of attention as explanation in recurrent models and introduces a word-level objective to enhance interpretability.
Findings
Attention can be made more faithful with the proposed objective.
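Attention-based explanations of recurrent models become more credible when the proposed objective is applied.
The paper offers an explanation of why attention fails as explanation in recurrent sequence classifiers, along with empirical support for its use once the objective is added.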
Abstract
The attention mechanism has quickly become ubiquitous in NLP. In addition to improving the performance of models, attention has been widely used as a glimpse into the inner workings of NLP models. The latter aspect has in recent years become a common topic of discussion, most notably in the work of Jain and Wallace (2019) and Wiegreffe and Pinter (2019). With the shortcomings of using attention weights as a tool of transparency revealed, the attention mechanism has been stuck in limbo, without concrete proof of when and whether it can be used as an explanation. In this paper, we provide an explanation as to why attention has seen rightful critique when used with recurrent networks in sequence classification tasks. We propose a remedy to these issues in the form of a word-level objective, and our findings lend credibility to attention as a source of faithful interpretations of recurrent models.
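To make the setting concrete, here is a minimal sketch of attention pooling over per-token hidden states, paired with one possible word-level term. The abstract does not specify the form of the objective, so `tying_penalty` is a hypothetical illustration (it penalizes hidden states that drift away from their own input embeddings) and not the authors' formulation. The dot-product scoring, the toy vectors, and all function names are likewise assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Dot-product attention: one weight per token position, plus the weighted sum."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, pooled


def tying_penalty(hidden_states, embeddings):
    """Hypothetical word-level term (an assumption, not taken from the paper):
    mean squared distance between each hidden state and its own input embedding."""
    total = 0.0
    for h, e in zip(hidden_states, embeddings):
        total += sum((h_d - e_d) ** 2 for h_d, e_d in zip(h, e))
    return total / len(hidden_states)


# Toy inputs: three tokens, two dimensions. In a real model the hidden states
# would come from a recurrent encoder run over the embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden_states = [[0.9, 0.1], [0.2, 0.8], [1.0, 1.0]]
query = [1.0, 1.0]

weights, pooled = attention_pool(hidden_states, query)
penalty = tying_penalty(hidden_states, embeddings)
```

In a sketch like this, the penalty would be added to the classification loss with some weighting coefficient. The intent is that an attention weight on position i says something about word i only if the hidden state at position i still reflects that word.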
