Where is the Model Looking At?--Concentrate and Explain the Network Attention
Wenjia Xu, Jiuniu Wang, Yang Wang, Guangluan Xu, Wei Dai, Yirong Wu

TL;DR
This paper introduces an explainable multi-task framework that enhances model interpretability and attention focus on discriminative image regions, improving trust and performance across various models and datasets.
Contribution
The paper proposes the Explainable Attribute-based Multi-task (EAT) framework, which integrates attribute prediction into multi-task learning to produce interpretable, multi-modal explanations and to improve model attention and accuracy.
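To make the multi-task idea concrete, here is a minimal sketch of a joint objective that adds an attribute-prediction term to the usual classification loss. It is an illustration under stated assumptions, not the authors' implementation: the summary does not give the paper's actual loss, so the function names, the binary cross-entropy form for attributes, and the `attr_weight` parameter are all hypothetical.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy for one example, given class logits and the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def binary_cross_entropy(logits, targets):
    """Mean binary cross-entropy over independent attribute logits."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Numerically stable form of -[t*log(s(z)) + (1-t)*log(1-s(z))]
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

def multitask_loss(class_logits, label, attr_logits, attr_targets, attr_weight=1.0):
    """Classification loss plus a weighted attribute loss.

    attr_weight is an assumed hyperparameter, not a value from the paper.
    """
    cls = softmax_cross_entropy(class_logits, label)
    attr = binary_cross_entropy(attr_logits, attr_targets)
    return cls + attr_weight * attr

# Example: 3 classes, 4 binary attributes (e.g. "has red wing")
loss = multitask_loss([2.0, 0.5, -1.0], 0, [1.2, -0.3, 0.8, -2.0], [1, 0, 1, 0])
print(f"joint loss: {loss:.4f}")
```

In a sketch like this, both heads would share one backbone, so gradients from the attribute term also shape the features that the classifier uses. That is the general mechanism by which an auxiliary attribute task can influence where a network attends.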
Findings
The EAT framework provides effective visual and textual explanations.
Guiding the attention improves recognition performance.
The framework generalizes across models and datasets.
Abstract
Image classification models have achieved satisfactory performance on many datasets, sometimes even surpassing humans. However, where the model directs its attention remains unclear because of the lack of interpretability. This paper investigates the fidelity and interpretability of model attention. We propose an Explainable Attribute-based Multi-task (EAT) framework to concentrate the model attention on the discriminative image area and make the attention interpretable. We introduce attribute prediction into the multi-task learning network, helping the network concentrate its attention on the foreground objects. We generate attribute-based textual explanations for the network and ground the attributes on the image to show visual explanations. The multi-modal explanation can not only improve user trust but also help to find weaknesses of the network and the dataset. Our framework can be generalized to any basic model. We…
Taxonomy
Methods, Interpretability
