Attention Interpretability Across NLP Tasks
Shikhar Vashishth, Shyam Upadhyay, Gaurav Singh Tomar, Manaal Faruqui

TL;DR
This paper investigates the interpretability of attention mechanisms in neural NLP models, providing a systematic explanation of when attention weights are meaningful and validating these insights through diverse experiments and manual evaluation.
Contribution
It offers a comprehensive framework to understand attention interpretability, reconciling conflicting viewpoints and systematically analyzing attention across multiple NLP tasks.
Findings
Attention can be interpretable in certain contexts
Interpretability depends on specific conditions and tasks
Manual evaluation supports the interpretability claims
Abstract
The attention layer offers insight into the reasoning behind the predictions of neural network models, which are usually criticized for being opaque. Recently, seemingly contradictory viewpoints have emerged about the interpretability of attention weights (Jain & Wallace, 2019; Vig & Belinkov, 2019). Amid such confusion arises the need to understand the attention mechanism more systematically. In this work, we attempt to fill this gap by giving a comprehensive explanation that justifies both kinds of observations (i.e., when attention is interpretable and when it is not). Through a series of experiments on diverse NLP tasks, we validate our observations and reinforce our claim of interpretability of attention through manual evaluation.
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Topic Modeling
Methods: Interpretability
