Seeing in Words: Learning to Classify through Language Bottlenecks

Khalid Saifullah; Yuxin Wen; Jonas Geiping; Micah Goldblum; Tom Goldstein

arXiv:2307.00028 · cs.CV · July 4, 2023 · 1 citation

Open Access

TL;DR

This paper trains a vision model whose feature representations are text, aiming to make its predictions interpretable while still classifying ImageNet images effectively.

Contribution

It trains a neural network to classify images through a language bottleneck, so that the features behind each prediction are human-readable text rather than uninterpretable vectors.

Findings

01. The model effectively classifies ImageNet images using text features

02. Training such models presents unique challenges

03. The approach enhances the interpretability of neural network predictions

Abstract

Neural networks for computer vision extract uninterpretable features despite achieving high accuracy on benchmarks. In contrast, humans can explain their predictions using succinct and intuitive descriptions. To incorporate explainability into neural networks, we train a vision model whose feature representations are text. We show that such a model can effectively classify ImageNet images, and we discuss the challenges we encountered when training it.
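
To make the idea of a text-valued feature representation concrete, here is a minimal toy sketch in Python. It is not the authors' model or training procedure: the `describe` function, the `TextOnlyClassifier` class, and the attribute-dictionary "images" are all hypothetical stand-ins chosen for illustration. The only point it shows is the bottleneck structure, where the classifier sees nothing but the text description, so that description doubles as an explanation of the prediction.

```python
from collections import Counter, defaultdict


def describe(image):
    """Hypothetical stand-in for the vision stage: turn an 'image' into text.

    An image here is a toy dict of boolean attributes (an assumption made
    for illustration); a real system would learn this mapping from pixels.
    """
    return " ".join(sorted(name for name, present in image.items() if present))


class TextOnlyClassifier:
    """Toy classifier that reads only the text description (the bottleneck)."""

    def __init__(self):
        # Per-class counts of the words seen in training descriptions.
        self.word_counts = defaultdict(Counter)

    def fit(self, descriptions, labels):
        for text, label in zip(descriptions, labels):
            self.word_counts[label].update(text.split())
        return self

    def predict(self, text):
        words = text.split()
        # Score each class by how often it has seen the description's words.
        scores = {
            label: sum(counts[w] for w in words)
            for label, counts in self.word_counts.items()
        }
        return max(scores, key=scores.get)


train_images = [
    {"fur": True, "whiskers": True, "wings": False},
    {"feathers": True, "wings": True, "beak": True},
]
train_labels = ["cat", "bird"]

clf = TextOnlyClassifier().fit(
    [describe(image) for image in train_images], train_labels
)

query = {"wings": True, "beak": True, "fur": False}
explanation = describe(query)          # human-readable intermediate feature
prediction = clf.predict(explanation)  # decision made from the text alone
print(explanation, "->", prediction)   # prints "beak wings -> bird"
```

Because the classifier never touches the raw input, any prediction can be traced back to the words in `explanation`. That is the interpretability property the abstract describes, here in a deliberately simplified form.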

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling