Representing visual classification as a linear combination of words
Shobhit Agarwal, Yevgeniy R. Semenov, William Lotter

TL;DR
This paper introduces a method that uses vision-language models to represent visual classification tasks as a linear combination of words, providing interpretable, language-based explanations that align with domain knowledge.
Contribution
The authors propose a novel approach leveraging pre-trained joint image-text embeddings to generate language-based descriptors for visual classification, aiding interpretability and auditing.
Findings
Descriptors align with clinical knowledge despite limited domain training
Method reveals potential shortcut connections in datasets
Language explanations enable non-experts to perform medical tasks
Abstract
Explainability is a longstanding challenge in deep learning, especially in high-stakes domains like healthcare. Common explainability methods highlight image regions that drive an AI model's decision. Humans, however, heavily rely on language to convey explanations of not only "where" but "what". Additionally, most explainability approaches focus on explaining individual AI predictions, rather than describing the features used by an AI model in general. The latter would be especially useful for model and dataset auditing, and potentially even knowledge generation as AI is increasingly being used in novel tasks. Here, we present an explainability strategy that uses a vision-language model to identify language-based descriptors of a visual classification task. By leveraging a pre-trained joint embedding space between images and text, our approach estimates a new classification task as a…
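The core idea, approximating a classifier by a weighted sum of word embeddings in a shared image-text space, can be illustrated with a minimal sketch. The abstract is truncated here, so this is not the authors' implementation: it assumes a linear classifier weight vector already expressed in the joint embedding space, uses random vectors in place of real vision-language embeddings, and solves for word weights with ridge regression as a stand-in fitting procedure. The vocabulary and the names `fit_word_weights` and `top_descriptors` are hypothetical.

```python
import numpy as np


def fit_word_weights(classifier_w, word_embs, ridge=1e-3):
    """Find c minimizing ||word_embs.T @ c - classifier_w||^2 + ridge * ||c||^2.

    classifier_w: (dim,) weight vector of a linear classifier in the joint space.
    word_embs:    (n_words, dim) text embeddings of candidate descriptor words.
    Ridge regression is an assumption of this sketch, not the paper's stated solver.
    """
    gram = word_embs @ word_embs.T + ridge * np.eye(word_embs.shape[0])
    return np.linalg.solve(gram, word_embs @ classifier_w)


def top_descriptors(coeffs, vocab, k=3):
    """Return the k words with the largest absolute weight, with their signed weights."""
    order = np.argsort(-np.abs(coeffs))[:k]
    return [(vocab[i], float(coeffs[i])) for i in order]


# Synthetic stand-ins: a real pipeline would embed these words with a
# vision-language model's text encoder (hypothetical vocabulary).
rng = np.random.default_rng(0)
vocab = ["asymmetric", "dark", "round", "scaly", "smooth", "irregular border"]
dim = 32
word_embs = rng.normal(size=(len(vocab), dim))
word_embs /= np.linalg.norm(word_embs, axis=1, keepdims=True)

# Build a classifier direction that is, by construction, a known mix of words.
true_c = np.array([1.5, 0.0, -1.0, 0.0, 0.0, 0.8])
classifier_w = word_embs.T @ true_c

coeffs = fit_word_weights(classifier_w, word_embs)
ranked = top_descriptors(coeffs, vocab)
for word, weight in ranked:
    print(f"{word}: {weight:+.2f}")
```

Positive weights mark words that push toward the positive class and negative weights mark words that push away from it, which is what makes the resulting descriptor list readable for auditing. A sparsity-inducing penalty could replace the ridge term if a shorter descriptor list is preferred.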
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Biomedical Text Mining and Ontologies
Methods: ALIGN · Focus
