A Survey on Interpretability in Visual Recognition
Qiyang Wan, Chengzhi Gao, Ruiping Wang, Xilin Chen

TL;DR
This survey reviews the development and evaluation of interpretability methods in visual recognition, emphasizing multimodal approaches and practical applications to guide future research in explainable AI.
Contribution
It introduces a multi-dimensional taxonomy for visual recognition interpretability, summarizes evaluation metrics, and explores emerging trends in multimodal large language models.
Findings
Comprehensive taxonomy for interpretability in visual recognition
Evaluation metrics and benchmarks for interpretability methods
Insights into multimodal large language models and applications
Abstract
Visual recognition models have achieved unprecedented success in various tasks. While researchers aim to understand the underlying mechanisms of these models, the growing demand for deployment in safety-critical areas like autonomous driving and medical diagnostics has accelerated the development of eXplainable AI (XAI). Distinct from generic XAI, visual recognition XAI is positioned at the intersection of vision and language, which represent the two most fundamental human modalities and form the cornerstones of multimodal intelligence. This paper provides a systematic survey of XAI in visual recognition by establishing a multi-dimensional taxonomy from a human-centered perspective based on intent, object, presentation, and methodology. Beyond categorization, we summarize critical evaluation desiderata and metrics, conducting an extensive qualitative assessment across different…
Taxonomy
Methods: Sparse Evolutionary Training
