Explaining CLIP Zero-shot Predictions Through Concepts

Onat Ozdemir; Anders Christensen; Stephan Alaniz; Zeynep Akata; Emre Akbas

arXiv:2603.28211 · cs.CV · March 31, 2026


TL;DR

EZPC explains CLIP's zero-shot image recognition by projecting its embeddings into a human-understandable concept space, preserving zero-shot accuracy while adding interpretability and requiring no extra concept supervision.

Contribution

Introduces EZPC, a method that links CLIP's predictions to interpretable concepts through a learned projection, without requiring additional concept labels.
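One way to picture how a projection into concept space can explain a zero-shot prediction is the minimal NumPy sketch below. It is not EZPC's implementation: the bilinear scoring, the function names `l2_normalize` and `explain_prediction`, the toy concept names, and the choice of using concept text embeddings directly as the projection rows are all illustrative assumptions, and random vectors stand in for real CLIP embeddings. Because each class score is computed as a dot product of concept activations, it splits exactly into one additive term per concept, which is what makes a concept-level explanation readable.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit L2 norm along `axis`."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def explain_prediction(image_emb, class_embs, proj, concept_names, top_k=3):
    """Concept-level explanation of one zero-shot prediction (hypothetical sketch).

    image_emb:  (d,)   CLIP image embedding
    class_embs: (K, d) CLIP text embeddings of the class prompts
    proj:       (M, d) projection into an M-dimensional concept space
    """
    a = proj @ image_emb                 # (M,) concept activations of the image
    b = class_embs @ proj.T              # (K, M) concept activations of each class
    contributions = a[None, :] * b       # (K, M) per-concept terms of each logit
    scores = contributions.sum(axis=1)   # (K,) concept-space zero-shot logits
    pred = int(np.argmax(scores))
    # Indices of the largest per-concept terms for the predicted class, descending.
    order = np.argsort(contributions[pred])[::-1][:top_k]
    top = [(concept_names[m], float(contributions[pred, m])) for m in order]
    return pred, scores, contributions, top

# Toy usage: random stand-ins for CLIP embeddings (d=16, K=3 classes, M=5 concepts).
rng = np.random.default_rng(0)
concept_names = ["striped", "has wings", "furry", "long beak", "metallic"]  # hypothetical
concept_text_embs = l2_normalize(rng.normal(size=(5, 16)))
image_emb = l2_normalize(rng.normal(size=16))
class_embs = l2_normalize(rng.normal(size=(3, 16)))

# Assumption: the concept text embeddings serve directly as the projection rows.
pred, scores, contributions, top = explain_prediction(
    image_emb, class_embs, concept_text_embs, concept_names)
print(pred, top)
```

In this sketch the explanation for the predicted class is simply its largest per-concept terms; a learned projection would take the place of the raw concept text embeddings.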

Findings

1. Maintains CLIP's zero-shot accuracy on benchmark datasets.
2. Provides meaningful concept-level explanations for predictions.
3. Grounds open-vocabulary predictions in explicit semantic concepts.

Abstract

Large-scale vision-language models such as CLIP have achieved remarkable success in zero-shot image recognition, yet their predictions remain largely opaque to human understanding. In contrast, Concept Bottleneck Models provide interpretable intermediate representations by reasoning through human-defined concepts, but they rely on concept supervision and cannot generalize to unseen classes. We introduce EZPC, which bridges these two paradigms by explaining CLIP's zero-shot predictions through human-understandable concepts. Our method projects CLIP's joint image-text embeddings into a concept space learned from language descriptions, enabling faithful and transparent explanations without additional supervision. The model learns this projection via a combination of alignment and reconstruction objectives, ensuring that concept activations preserve CLIP's semantic structure…
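The abstract says the projection is trained with a combination of alignment and reconstruction objectives. The NumPy sketch below shows one way such an objective could be written, assuming (as illustrative choices, not the paper's stated formulation) a KL-divergence alignment term between CLIP's original zero-shot class distribution and the one computed in concept space, a squared-error reconstruction term through a hypothetical linear `decoder`, a temperature `tau`, and a weight `lam`. The names `log_softmax`, `concept_objective`, and `unit` are hypothetical. It only evaluates the loss; optimizing `proj` and `decoder` would require an autodiff framework.

```python
import numpy as np

def log_softmax(logits, tau=1.0):
    """Row-wise log-softmax with temperature, shifted for numerical stability."""
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def concept_objective(image_embs, class_embs, proj, decoder, tau=0.01, lam=1.0):
    """Hypothetical alignment + reconstruction loss for a concept projection.

    image_embs: (N, d) CLIP image embeddings
    class_embs: (K, d) CLIP text embeddings of the class prompts
    proj:       (M, d) maps embeddings into an M-dimensional concept space
    decoder:    (d, M) maps concept activations back to CLIP space
    """
    a = image_embs @ proj.T                   # (N, M) image concept activations
    b = class_embs @ proj.T                   # (K, M) class concept activations

    # Alignment: concept-space logits should reproduce CLIP's zero-shot distribution.
    log_p = log_softmax(image_embs @ class_embs.T, tau)   # original CLIP logits
    log_q = log_softmax(a @ b.T, tau)                     # concept-space logits
    align = np.mean(np.sum(np.exp(log_p) * (log_p - log_q), axis=1))  # KL(p || q)

    # Reconstruction: concept activations should retain enough to rebuild both embeddings.
    recon = (np.mean(np.sum((a @ decoder.T - image_embs) ** 2, axis=1))
             + np.mean(np.sum((b @ decoder.T - class_embs) ** 2, axis=1)))
    return align + lam * recon, align, recon

def unit(x):
    """Normalize rows to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d, K, N, M = 16, 4, 8, 6
image_embs = unit(rng.normal(size=(N, d)))   # random stand-ins for CLIP embeddings
class_embs = unit(rng.normal(size=(K, d)))

# A random, untrained projection/decoder pair gives a positive loss ...
total, align, recon = concept_objective(
    image_embs, class_embs, unit(rng.normal(size=(M, d))), rng.normal(size=(d, M)))

# ... while a lossless identity map (M = d) drives both terms to zero.
total_id, align_id, recon_id = concept_objective(
    image_embs, class_embs, np.eye(d), np.eye(d))
```

The identity case is a sanity check: when the concept space discards no information, the concept-space logits equal CLIP's logits and both terms vanish.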


Code & Models

Repositories

oonat/ezpc (GitHub)