Is Attention Interpretation? A Quantitative Assessment On Sets
Jonathan Haab, Nicolas Deutschmann, María Rodríguez Martínez

TL;DR
This paper systematically evaluates the interpretability of attention mechanisms in set-based machine learning, revealing that attention often correlates with importance but can also mislead, and proposes ensembling to improve explanation reliability.
Contribution
It introduces a quantitative framework for assessing attention interpretability in set models and demonstrates how ensembling can mitigate misleading explanations.
Findings
Attention often reflects relative importance of instances.
High classification performance does not guarantee correct attention explanations.
Ensembling reduces the risk of misleading attention-based interpretations.
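The ensembling finding can be illustrated with a minimal sketch. It assumes that ensembling means averaging the attention distributions that several independently trained models assign to the same set; the summary does not specify the paper's exact procedure, so this is an illustrative choice. The function name `ensemble_attention` and the attention values are hypothetical.

```python
import numpy as np

def ensemble_attention(attention_per_model):
    """Average per-model attention distributions over one set.

    attention_per_model: array-like of shape (n_models, n_instances),
    where each row is one model's attention over the same instances.
    Plain averaging is an assumption, not the paper's confirmed method.
    """
    stacked = np.asarray(attention_per_model, dtype=float)
    if stacked.ndim != 2:
        raise ValueError("expected shape (n_models, n_instances)")
    mean = stacked.mean(axis=0)
    # Renormalize so the result is again a distribution over instances.
    return mean / mean.sum()

# Hypothetical attention from three models over a set of four instances.
attn = [
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.85, 0.05],  # one model attends to a different instance
    [0.60, 0.20, 0.10, 0.10],
]
ensembled = ensemble_attention(attn)
print(ensembled)
```

In this toy case a single model would point to the third instance, while the averaged distribution places the most weight on the first, which two of the three models agree on.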
Abstract
The debate around the interpretability of attention mechanisms is centered on whether attention scores can be used as a proxy for the relative amounts of signal carried by sub-components of data. We propose to study the interpretability of attention in the context of set machine learning, where each data point is composed of an unordered collection of instances with a global label. For classical multiple-instance-learning problems and simple extensions, there is a well-defined "importance" ground truth that can be leveraged to cast interpretation as a binary classification problem, which we can quantitatively evaluate. By building synthetic datasets over several data modalities, we perform a systematic assessment of attention-based interpretations. We find that attention distributions are indeed often reflective of the relative importance of individual instances, but that silent…
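Casting interpretation as binary classification can be sketched as follows: treat each instance's attention weight as a score, treat the ground-truth importance as a binary label, and measure how well the score separates the two groups. The sketch below uses the area under the ROC curve as the metric, which is an assumption; the abstract does not name the metric used. The function name `attention_auc` and the example values are hypothetical.

```python
import numpy as np

def attention_auc(attention, is_key_instance):
    """ROC AUC of attention weights as a detector of important instances.

    Computed as the probability that a randomly chosen important instance
    receives more attention than a randomly chosen unimportant one,
    counting ties as one half.
    """
    scores = np.asarray(attention, dtype=float)
    labels = np.asarray(is_key_instance, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one important and one unimportant instance")
    # Compare every important instance against every unimportant one.
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Hypothetical set of four instances; the first and last are important.
attention = [0.50, 0.05, 0.30, 0.15]
is_key = [1, 0, 0, 1]
auc = attention_auc(attention, is_key)
print(auc)
```

A value near 1 would mean attention ranks important instances above unimportant ones, a value near 0.5 would mean attention is uninformative, and a value well below 0.5 would correspond to a misleading explanation.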
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: ALIGN
