Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class

Mazda Moayeri; Michael Rabbat; Mark Ibrahim; Diane Bouchacourt

arXiv:2404.16717·cs.CV·April 26, 2024

TL;DR

This paper introduces a novel zero-shot classification method that models intra-class diversity using inferred attributes, improving accuracy and interpretability without retraining, especially for diverse and atypical object instances.

Contribution

The paper proposes encoding the diversity within a class using inferred attributes, moving beyond the single-vector class representation of standard zero-shot classification to improve both performance and transparency.

Findings

01

Outperforms standard zero-shot classifiers across multiple datasets.

02

Scales efficiently to many attributes, improving accuracy for atypical instances.

03

Provides interpretable explanations for each inference.

Abstract

Vision-language models enable open-world classification of objects without the need for any retraining. While this zero-shot paradigm marks a significant advance, even today's best models exhibit skewed performance when objects are dissimilar from their typical depiction. Real-world objects such as pears appear in a variety of forms -- from diced to whole, on a table or in a bowl -- yet standard VLM classifiers map all instances of a class to a single vector based on the class label. We argue that to represent this rich diversity within a class, zero-shot classification should move beyond a single vector. We propose a method to encode and account for diversity within a class using inferred attributes, still in the zero-shot setting without retraining. We find our method consistently outperforms standard zero-shot classification over a large suite of datasets encompassing…
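
To make the contrast in the abstract concrete, here is a minimal sketch of one-vector-per-class scoring next to a several-vectors-per-class variant. It is an illustration under stated assumptions, not the authors' implementation: the 3-dimensional vectors are hand-made stand-ins for VLM embeddings, the attribute names ("diced pear", "sliced apple") are hypothetical, and the scoring rule (mean of the top-k attribute similarities) is an assumed choice, since this excerpt does not say how the paper combines per-attribute scores.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_single_vector(image_vec, class_vecs):
    """Standard zero-shot rule: one vector per class, pick the most similar."""
    return max(class_vecs, key=lambda name: cosine(image_vec, class_vecs[name]))


def classify_multi_vector(image_vec, class_attr_vecs, k=1):
    """Assumed multi-vector rule: score each class by the mean of its
    top-k attribute similarities, and return the winning class together
    with the attributes that produced its score (the 'explanation')."""
    scores, evidence = {}, {}
    for name, attr_vecs in class_attr_vecs.items():
        sims = sorted(
            ((cosine(image_vec, vec), attr) for attr, vec in attr_vecs.items()),
            reverse=True,
        )
        top = sims[:k]
        scores[name] = sum(s for s, _ in top) / len(top)
        evidence[name] = [attr for _, attr in top]
    best = max(scores, key=scores.get)
    return best, evidence[best]


# Toy embeddings (hypothetical values, not produced by any real model).
single = {"pear": [1.0, 0.0, 0.0], "apple": [0.0, 1.0, 0.0]}
multi = {
    "pear": {"whole pear": [1.0, 0.0, 0.0], "diced pear": [0.1, 0.3, 0.9]},
    "apple": {"whole apple": [0.0, 1.0, 0.0], "sliced apple": [0.0, 0.8, 0.6]},
}

# An atypical instance: an image that looks like diced pear.
image = [0.2, 0.4, 0.9]

single_prediction = classify_single_vector(image, single)
multi_prediction, explanation = classify_multi_vector(image, multi, k=1)
print(single_prediction)              # apple
print(multi_prediction, explanation)  # pear ['diced pear']
```

In this toy setup the single "pear" vector only resembles a whole pear, so the diced-pear image lands closer to "apple". Giving each class several attribute vectors lets the "diced pear" vector match, and the matched attribute doubles as a readable explanation of the prediction.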