A Survey on Interpretability in Visual Recognition

Qiyang Wan; Chengzhi Gao; Ruiping Wang; Xilin Chen

arXiv:2507.11099 · cs.CV · March 12, 2026

TL;DR

This survey reviews the development and evaluation of interpretability methods in visual recognition, emphasizing multimodal approaches and practical applications to guide future research in explainable AI.

Contribution

The survey introduces a multi-dimensional taxonomy for visual recognition interpretability, summarizes evaluation metrics, and explores emerging trends in multimodal large language models.

Findings

01. Comprehensive taxonomy for interpretability in visual recognition

02. Evaluation metrics and benchmarks for interpretability methods

03. Insights into multimodal large language models and applications

Abstract

Visual recognition models have achieved unprecedented success in various tasks. While researchers aim to understand the underlying mechanisms of these models, the growing demand for deployment in safety-critical areas like autonomous driving and medical diagnostics has accelerated the development of eXplainable AI (XAI). Distinct from generic XAI, visual recognition XAI is positioned at the intersection of vision and language, which represent the two most fundamental human modalities and form the cornerstones of multimodal intelligence. This paper provides a systematic survey of XAI in visual recognition by establishing a multi-dimensional taxonomy from a human-centered perspective based on intent, object, presentation, and methodology. Beyond categorization, we summarize critical evaluation desiderata and metrics, conducting an extensive qualitative assessment across different…

Taxonomy

Methods: Sparse Evolutionary Training