Enhancing Pre-trained Representation Classifiability can Boost its Interpretability

Shufan Shen; Zhaobo Qi; Junshu Sun; Qingming Huang; Qi Tian; Shuhui Wang

arXiv:2510.24105·cs.CV·October 29, 2025

3 Reviews

TL;DR

This paper introduces the Inherent Interpretability Score (IIS), a metric that quantifies the interpretability of pre-trained visual representations, and uses it to reveal a positive correlation between interpretability and classifiability.

Contribution

It proposes IIS to quantify representation interpretability and demonstrates that enhancing classifiability also improves interpretability in pre-trained vision models.

Findings

1. Higher classifiability correlates with increased interpretability.
2. Fine-tuning with interpretability maximization boosts classifiability.
3. Interpretability-based predictions cause less accuracy degradation.

Abstract

The visual representation of a pre-trained model prioritizes classifiability on downstream tasks, while the widespread application of pre-trained visual models has created new requirements for representation interpretability. However, it remains unclear whether pre-trained representations can achieve high interpretability and classifiability simultaneously. To answer this question, we quantify representation interpretability by leveraging its correlation with the ratio of interpretable semantics within the representations. Given the pre-trained representations, only the interpretable semantics can be captured by interpretations, whereas the uninterpretable part leads to information loss. Based on this fact, we propose the Inherent Interpretability Score (IIS) that evaluates the information loss, measures the ratio of interpretable semantics, and quantifies the…
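The abstract is cut off before it defines IIS, so the following is only an illustrative sketch of the information-loss idea it describes, not the authors' formula. It assumes that concept-based predictions are evaluated at several sparsity levels, that each accuracy is divided by the accuracy of the original representation to give a retention rate, and that the score is the normalized area under that retention curve. The function name `accuracy_retention_score` and all of its parameters are hypothetical.

```python
from typing import Sequence


def accuracy_retention_score(
    sparsity: Sequence[float],
    concept_acc: Sequence[float],
    original_acc: float,
) -> float:
    """Normalized area under an accuracy-retention curve.

    Assumption (not from the source): retention at each sparsity level is
    concept_acc / original_acc, and the score is the trapezoidal area under
    that curve divided by the width of the sparsity range.
    """
    if len(sparsity) != len(concept_acc) or len(sparsity) < 2:
        raise ValueError("need at least two matching (sparsity, accuracy) pairs")
    if original_acc <= 0:
        raise ValueError("original_acc must be positive")

    # Sort by sparsity so the trapezoids are integrated left to right.
    points = sorted(zip(sparsity, concept_acc))
    retention = [(s, a / original_acc) for s, a in points]

    span = retention[-1][0] - retention[0][0]
    if span <= 0:
        raise ValueError("sparsity values must cover a non-empty range")

    # Trapezoidal rule over consecutive points.
    area = 0.0
    for (s0, r0), (s1, r1) in zip(retention, retention[1:]):
        area += 0.5 * (r0 + r1) * (s1 - s0)
    return area / span


# Hypothetical numbers, chosen only to exercise the function.
score = accuracy_retention_score(
    sparsity=[0.0, 0.5, 1.0],
    concept_acc=[0.8, 0.6, 0.4],
    original_acc=0.8,
)
print(round(score, 4))
```

Under these assumptions, a score near 1 would mean that interpretation-based predictions keep most of the original accuracy across sparsity levels, and a lower score would indicate more information lost to the uninterpretable part of the representation.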

Peer Reviews

Decision·ICLR 2025 Spotlight

Reviewer 01 · Rating 8 · Confidence 4

Strengths

1. The authors propose a novel metric to quantitatively evaluate the interpretability of pre-trained representations. The proposed IIS can be used not only in research on post-hoc explanation approaches but also to improve model classification performance and interpretability.
2. The paper is well-written and well-organized. It is easy to follow the authors' ideas and understand their approaches. The authors clearly show their motivation and ideas in Figure 1 and Figure 2. Th

Weaknesses

1. Besides extensive experimental results, the authors should have provided clear theoretical analyses of the proposed method.
2. The proposed method IIS relies on pre-defined or generated concepts, which are usually hard to get and not applicable in real-world scenarios.

Reviewer 02 · Rating 8 · Confidence 5

Strengths

* This paper uses a concept ablation method to evaluate the interpretability of the representations, an approach that has also been adopted for evaluating post-hoc interpretation methods [1].
* The authors use both visual and textual concepts for IIS.
* The paper is well-written and easy to follow.
* The authors provide code for a reproducibility check.

Weaknesses

1. **Completeness of Evaluation:** Huang et al. [2] have discussed trustworthiness in concept bottleneck models (CBMs), noting that several concepts can be activated in regions irrelevant to the input image, leading to a loss of model interpretability. However, the authors have overlooked this issue.
2. **Method:** In the IIS evaluation, in what order are concepts removed? How would the evaluation results differ if removing began with the most important concepts for the category

Reviewer 03 · Rating 8 · Confidence 4

Strengths

1. This paper quantifies the interpretability of representations and discovers an interesting and significant positive correlation between the interpretability and classifiability of representations. The results have been demonstrated on multiple datasets and architectures.
2. The observation is used in designing IIS maximization loss, which results in improved classifiability.
3. The paper is well-written, making it easy to follow the main ideas and claims.

Weaknesses

1. The proposed IIS score is similar to the “average accuracy obtained at different number of effective concepts” proposed in [1]. Particularly, [1] trains the final linear classifier with elastic-net regularization in GLM-SAGA and tunes regularization strength for controlling sparsity (number of effective concepts). The average accuracy is obtained by averaging accuracy at different sparsity levels. The similarities between the metric proposed in [1] and IIS should be discussed.
2. Limited data
