Cross-Model Consensus of Explanations and Beyond for Image Classification Models: An Empirical Study
Xuhong Li, Haoyi Xiong, Siyu Huang, Shilei Ji, Dejing Dou

TL;DR
This empirical study investigates common features used by various image classification models through cross-model explanation consensus, revealing correlations with model performance and interpretability.
Contribution
The paper introduces a novel cross-model consensus method for explanations, highlighting shared features among models and their relation to performance and interpretability.
Findings
Consensus explanations align with semantic segmentation ground truth.
Higher consensus scores correlate with better model performance.
Consensus scores are indicative of model interpretability.
Abstract
Existing interpretation algorithms have found that, even when deep models make the same correct predictions on the same image, they may rely on different sets of input features for classification. However, among these sets of features, some common features might be used by the majority of models. In this paper, we ask which common features are used by various models for classification, and whether models with better performance favor those common features. For this purpose, our work uses an interpretation algorithm to attribute the importance of features (e.g., pixels or superpixels) as explanations, and proposes the cross-model consensus of explanations to capture the common features. Specifically, we first prepare a set of deep models as a committee, then deduce the explanation for every model, and obtain the consensus of explanations across the entire committee…
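The committee procedure described in the abstract (one explanation per model, then a consensus across the committee) can be illustrated with a minimal sketch. The abstract is cut off before it states how explanations are aggregated or how a model is scored against the consensus, so everything below is an assumption made for illustration: min-max normalization, a plain mean as the consensus, and cosine similarity as the per-model consensus score. The function names `normalize`, `consensus`, and `consensus_scores` are hypothetical and are not taken from the paper.

```python
import numpy as np


def normalize(attr):
    """Min-max scale one attribution map to [0, 1] (assumed preprocessing)."""
    attr = np.asarray(attr, dtype=float)
    span = attr.max() - attr.min()
    if span == 0:
        # A constant map carries no ranking information.
        return np.zeros_like(attr)
    return (attr - attr.min()) / span


def consensus(explanations):
    """Average the normalized maps of all committee members (assumed aggregation)."""
    normed = np.stack([normalize(e) for e in explanations])
    return normed.mean(axis=0)


def consensus_scores(explanations):
    """Cosine similarity between each model's map and the consensus (assumed score)."""
    c = consensus(explanations).ravel()
    scores = []
    for e in explanations:
        v = normalize(e).ravel()
        denom = np.linalg.norm(v) * np.linalg.norm(c)
        scores.append(float(v @ c / denom) if denom > 0 else 0.0)
    return scores


# Toy committee of three models explaining one 2x2 "image":
# two models highlight the top-left pixel, one highlights the bottom-right.
committee = [
    np.array([[1.0, 0.0], [0.0, 0.0]]),
    np.array([[1.0, 0.0], [0.0, 0.0]]),
    np.array([[0.0, 0.0], [0.0, 1.0]]),
]
scores = consensus_scores(committee)
```

In this toy committee, the two models that agree with the majority receive a higher score than the outlier, which is the kind of quantity the study relates to model performance and interpretability.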
