Integrating attention into explanation frameworks for language and vision transformers
Marte Eggen, Jacob Lysnæs-Larsen, Inga Strümke

TL;DR
This paper explores how attention weights in transformers can be integrated into explanation methods for language and vision models, enhancing interpretability by providing more meaningful local and global explanations.
Contribution
It introduces two novel explanation techniques that incorporate attention weights into existing XAI frameworks for improved interpretability of transformers.
Findings
Attention weights can be effectively integrated into explanation methods.
The proposed methods outperform some existing explainability techniques.
Attention-based explanations provide meaningful insights into model behavior.
Abstract
The attention mechanism lies at the core of the transformer architecture, providing an interpretable model-internal signal that has motivated a growing interest in attention-based model explanations. Although attention weights do not directly determine model outputs, they reflect patterns of token influence that can inform and complement established explainability techniques. This work studies the potential of utilising the information encoded in attention weights to provide meaningful model explanations by integrating them into explainable AI (XAI) frameworks that target fundamentally different aspects of model behaviour. To this end, we develop two novel explanation methods applicable to both natural language processing and computer vision tasks. The first integrates attention weights into the Shapley value decomposition by redefining the characteristic function in terms of pairwise…
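The abstract is cut off before it defines the attention-based characteristic function, so that definition is not reproduced here. As a rough illustration of the general idea only, the sketch below computes exact Shapley values for a small set of tokens using a hypothetical characteristic function that sums the pairwise attention weights inside a coalition. The function names, the toy attention matrix, and the choice of characteristic function are all assumptions made for illustration, not the authors' method.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, v):
    """Exact Shapley values for n players, given a characteristic function v(frozenset) -> float."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for coalitions of this size that exclude player i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi


def pairwise_attention_value(attention):
    """Hypothetical characteristic function: total attention exchanged within a coalition.

    This is an assumed stand-in, not the definition used in the paper.
    """
    def v(coalition):
        return sum(attention[i][j] for i in coalition for j in coalition)
    return v


# Toy row-normalised attention matrix over three tokens (made-up numbers).
attention = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]

phi = shapley_values(3, pairwise_attention_value(attention))
total = sum(sum(row) for row in attention)
```

With this particular choice, each token's attribution reduces to its self-attention plus half of the attention it sends and receives, and the attributions sum to the value of the full coalition (the efficiency property). Exact enumeration scales exponentially with the number of tokens, so a sketch like this is only practical for very short inputs.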
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Advanced Neural Network Applications
