Representing visual classification as a linear combination of words

Shobhit Agarwal; Yevgeniy R. Semenov; William Lotter

arXiv:2311.10933 · cs.AI · November 21, 2023 · 1 citation

TL;DR

This paper introduces a method that uses vision-language models to represent visual classification tasks as a linear combination of words, providing interpretable, language-based explanations that align with domain knowledge.

Contribution

The authors propose a novel approach leveraging pre-trained joint image-text embeddings to generate language-based descriptors for visual classification, aiding interpretability and auditing.

Findings

01. Descriptors align with clinical knowledge despite limited domain training

02. Method reveals potential shortcut connections in datasets

03. Language explanations enable non-experts to perform medical tasks

Abstract

Explainability is a longstanding challenge in deep learning, especially in high-stakes domains like healthcare. Common explainability methods highlight image regions that drive an AI model's decision. Humans, however, heavily rely on language to convey explanations of not only "where" but "what". Additionally, most explainability approaches focus on explaining individual AI predictions, rather than describing the features used by an AI model in general. The latter would be especially useful for model and dataset auditing, and potentially even knowledge generation as AI is increasingly being used in novel tasks. Here, we present an explainability strategy that uses a vision-language model to identify language-based descriptors of a visual classification task. By leveraging a pre-trained joint embedding space between images and text, our approach estimates a new classification task as a…
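The core idea, estimating a classification task as a linear combination of words in a joint image-text embedding space, can be illustrated with a toy sketch. This is not the authors' implementation (see the repository listed under Code & Models for that). The function names `fit_linear_classifier` and `words_for_classifier`, the five-word vocabulary, and the ridge-regularized least-squares fit are all assumptions made for illustration. Random vectors stand in for the embeddings a real vision-language model would produce.

```python
import numpy as np

def fit_linear_classifier(X, y, lr=0.5, steps=500):
    """Hypothetical helper: logistic regression by gradient descent.
    Returns a weight vector that lives in the shared embedding space."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)     # gradient step on the log loss
    return w

def words_for_classifier(w, word_embs, ridge=1e-3):
    """Hypothetical helper: find coefficients c such that
    word_embs.T @ c approximates w (ridge-regularized least squares)."""
    gram = word_embs @ word_embs.T + ridge * np.eye(word_embs.shape[0])
    return np.linalg.solve(gram, word_embs @ w)

rng = np.random.default_rng(0)
d = 32  # embedding dimension (assumed)

# Stand-in word embeddings; a real pipeline would use the text encoder
# of a vision-language model. The vocabulary is illustrative only.
vocab = ["dark", "asymmetric", "round", "smooth", "ruler"]
E = rng.normal(size=(len(vocab), d))
E /= np.linalg.norm(E, axis=1, keepdims=True)

# Synthetic "image embeddings": the positive class is shifted along the
# directions of "dark" and "asymmetric", the negative class against them.
direction = E[0] + E[1]
y = rng.integers(0, 2, size=400).astype(float)
X = 0.5 * rng.normal(size=(400, d)) + np.outer(2 * y - 1, direction)

w = fit_linear_classifier(X, y)          # classifier in embedding space
coeffs = words_for_classifier(w, E)      # one coefficient per word
ranked = sorted(zip(vocab, coeffs), key=lambda t: -t[1])

for word, c in ranked:
    print(f"{word:>12s}  {c:+.3f}")
```

In this toy setup the two words that define the synthetic class shift receive the largest coefficients, which is the kind of language-based description of a task that the abstract refers to. With a real model, `X` would come from the image encoder and `E` from the text encoder of the same pre-trained model.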

Code & Models

Repositories

lotterlab/task_word_explainability
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Biomedical Text Mining and Ontologies

Methods: ALIGN · Focus