PRIME: Prioritizing Interpretability in Failure Mode Extraction
Keivan Rezaei, Mehrdad Saberi, Mazda Moayeri, Soheil Feizi

TL;DR
This paper introduces PRIME, a method that enhances interpretability in failure mode extraction for image classifiers by using human-understandable tags and minimal descriptions, leading to more accurate failure explanations.
Contribution
PRIME proposes a novel approach that prioritizes interpretability by leveraging human-understandable concepts and minimal tag sets to better identify and describe failure modes.
Findings
Successfully identifies failure modes across datasets
Generates high-quality, interpretable failure descriptions
Outperforms existing clustering-based methods
Abstract
In this work, we study the challenge of providing human-understandable descriptions for failure modes in trained image classification models. Existing works address this problem by first identifying clusters (or directions) of incorrectly classified samples in a latent space and then aiming to provide human-understandable text descriptions for them. We observe that in some cases, describing text does not match well with identified failure modes, partially owing to the fact that shared interpretable attributes of failure modes may not be captured using clustering in the feature space. To improve on these shortcomings, we propose a novel approach that prioritizes interpretability in this problem: we start by obtaining human-understandable concepts (tags) of images in the dataset and then analyze the model's behavior based on the presence or absence of combinations of these tags. Our…
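The tag-based analysis described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `find_failure_modes`, the parameters `max_tags`, `min_support`, and `min_drop`, and the toy data are all hypothetical, and the acceptance rule (accuracy drop relative to the whole group, skipping any tag set that contains an already reported smaller set) is an assumed stand-in for the paper's actual criteria, which the truncated abstract does not spell out. In practice the tags would come from a tagging model; here they are hard-coded, and all samples are assumed to belong to a single class.

```python
from itertools import combinations

def accuracy(samples):
    """Fraction of correctly classified samples (1.0 for an empty list)."""
    return sum(s["correct"] for s in samples) / len(samples) if samples else 1.0

def find_failure_modes(samples, max_tags=2, min_support=2, min_drop=0.35):
    """Return (base accuracy, {tag set: accuracy}) for tag sets with low accuracy.

    Assumed rule (hypothetical, for illustration only): a tag set is reported
    when at least `min_support` images carry all of its tags and accuracy on
    those images is at least `min_drop` below the accuracy on all samples.
    """
    base = accuracy(samples)
    all_tags = sorted({t for s in samples for t in s["tags"]})
    found = {}
    # Enumerate smaller tag sets first so that minimal descriptions win.
    for size in range(1, max_tags + 1):
        for combo in combinations(all_tags, size):
            combo_set = frozenset(combo)
            # Skip supersets of an already reported tag set.
            if any(prev < combo_set for prev in found):
                continue
            subset = [s for s in samples if combo_set <= s["tags"]]
            if len(subset) < min_support:
                continue
            acc = accuracy(subset)
            if base - acc >= min_drop:
                found[combo_set] = acc
    return base, found

# Toy data for one class: each image has a set of tags and a correctness flag.
samples = [
    {"tags": {"snow", "white"}, "correct": False},
    {"tags": {"snow", "white"}, "correct": False},
    {"tags": {"snow", "white"}, "correct": False},
    {"tags": {"snow"}, "correct": True},
    {"tags": {"snow"}, "correct": True},
    {"tags": {"white"}, "correct": True},
    {"tags": {"white"}, "correct": True},
    {"tags": {"grass"}, "correct": True},
    {"tags": {"grass"}, "correct": True},
    {"tags": {"grass", "white"}, "correct": True},
]

base, modes = find_failure_modes(samples)
for tags, acc in modes.items():
    print(sorted(tags), f"accuracy={acc:.2f} (base={base:.2f})")
```

On this toy data neither "snow" nor "white" alone lowers accuracy enough to be reported, but their combination does, which is the kind of tag-combination failure mode the abstract refers to. The enumeration is exponential in `max_tags`, so a real pipeline would need to bound the tag vocabulary and the combination size.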
Peer Reviews
Decision: ICLR 2024 poster
1. Experiments done on multiple datasets to support the approach. 2. Visual examples of the method working look promising. The proposed method may be a good practical method to generate descriptions of failure modes in general-purpose entity recognition models.
1. The method relies heavily on an auxiliary model (like Recognize Anything Model). Although this makes an important tool for describing failure modes, everything that the proposed method can do is bottlenecked by this auxiliary model's capability. For example, the authors present this method as a tool to describe "failure modes in trained image classification models". However, what happens when the image classification model is trained on a domain-specific dataset like chest X-Rays? How can we
1. The authors contextualize related work on error discovery well in the Introduction, and clearly motivate gaps in the existing literature (i.e., that generating text descriptions of groups where an image classifier underperforms is difficult). This problem is timely and significant. 2. The authors' instructive figures (e.g., Figures 1, 3, and 5) clearly illustrate the benefits of their proposed approach. 3. The authors' proposed PRIME method is straightforward, and explained in a way that is c
My primary critique of this work is that I believe the authors should directly address potential limitations of their proposed method in their paper. In the present draft, many important limitations are excluded completely. I am willing to adjust my score if my below concerns are addressed. * **Weakness #1: Understanding limitations of relying on a pre-trained tagging model (Step 1)**. PRIME relies on a pre-trained tagging model (in this case, RAM) to provide a set of tags for each image. How
1. The paper addresses the important problem of understanding failure modes in deep networks. Given the black-box nature of such models, this is crucial for building trust and deploying such models safely. 2. It recognizes and addresses a drawback in prior clustering-based approaches, i.e. that similarity in representation space need not imply similarity in semantic space. This corroborates findings reported in other contexts in prior work (e.g. [1]). It proposes a simple alternative, i.e. tagging all im
1. The method seems highly reliant on being able to find all relevant tags first. Failure modes caused by concepts not in the tag set $T_c$ would remain undetected. Given that low-frequency tags are filtered out, this could miss potentially important but less frequently occurring failure modes, such as spurious correlations. Broadly, there seems to be a tradeoff involved in choosing the size of the tag set: the larger the set, the more failure modes can be caught, but also more computationall
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
