Interpretability-Aware Vision Transformer
Yao Qiang, Chengyin Li, Prashant Khanduri, Dongxiao Zhu

TL;DR
This paper introduces IA-ViT, a novel training method for Vision Transformers that inherently improves interpretability by aligning attention maps with model predictions, outperforming post hoc explanation methods.
Contribution
We propose a new training procedure for Vision Transformers that enhances interpretability without relying on post hoc explanation methods, using a joint training framework with an interpretability-aware objective.
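To make the idea of a joint, interpretability-aware objective concrete, here is a minimal sketch in plain Python. It is not the authors' loss: this summary does not spell out the terms, so the sketch assumes a common pattern in which a prediction head and an explanation head are each trained with cross-entropy and a KL term pulls their output distributions together. The names `joint_loss`, `pred_logits`, `interp_logits`, `alpha`, and `beta` are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # Negative log-likelihood of the true class.
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def joint_loss(pred_logits, interp_logits, label, alpha=1.0, beta=1.0):
    # ASSUMPTION: the three terms and their weights are illustrative only.
    task = cross_entropy(pred_logits, label)           # prediction head
    interp_task = cross_entropy(interp_logits, label)  # explanation head
    align = kl_divergence(softmax(pred_logits), softmax(interp_logits))
    return task + alpha * interp_task + beta * align

if __name__ == "__main__":
    agree = joint_loss([2.0, 0.0], [2.0, 0.0], label=0)
    disagree = joint_loss([2.0, 0.0], [0.0, 2.0], label=0)
    print(agree < disagree)  # heads that agree incur a lower loss
```

Under these assumptions, the alignment term vanishes when both heads output the same distribution, so the objective rewards an explanation head that mimics the prediction head rather than one that is merely accurate on its own.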
Findings
IA-ViT achieves better interpretability in image classification tasks.
The model provides faithful explanations through its attention mechanism.
Experimental results show improved interpretability and comparable performance.
Abstract
Vision Transformers (ViTs) have become prominent models for solving various vision tasks. However, the interpretability of ViTs has not kept pace with their promising performance. While there has been a surge of interest in developing post hoc solutions to explain ViTs' outputs, these methods do not generalize to different downstream tasks and various transformer architectures. Furthermore, if ViTs are not properly trained with the given data and do not prioritize the region of interest, the post hoc methods would be less effective. Instead of developing another post hoc approach, we introduce a novel training procedure that inherently enhances model interpretability. Our interpretability-aware ViT (IA-ViT) draws inspiration from a fresh insight: both the class patch and image patches consistently generate predicted distributions and attention maps. IA-ViT is composed…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
