Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class
Mazda Moayeri, Michael Rabbat, Mark Ibrahim, Diane Bouchacourt

TL;DR
This paper introduces a novel zero-shot classification method that models intra-class diversity using inferred attributes, improving accuracy and interpretability without retraining, especially for diverse and atypical object instances.
Contribution
It proposes encoding intra-class diversity with inferred attributes, moving zero-shot classification beyond single-vector class representations to improve both performance and transparency.
Findings
Outperforms standard zero-shot classifiers across multiple datasets.
Scales efficiently to many attributes, improving accuracy for atypical instances.
Provides interpretable explanations for each inference.
Abstract
Vision-language models enable open-world classification of objects without the need for any retraining. While this zero-shot paradigm marks a significant advance, even today's best models exhibit skewed performance when objects are dissimilar from their typical depiction. Real-world objects such as pears appear in a variety of forms (from diced to whole, on a table or in a bowl), yet standard VLM classifiers map all instances of a class to a single vector based on the class label. We argue that to represent this rich diversity within a class, zero-shot classification should move beyond a single vector. We propose a method to encode and account for diversity within a class using inferred attributes, still in the zero-shot setting without retraining. We find our method consistently outperforms standard zero-shot classification over a large suite of datasets encompassing…
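To make the contrast between one vector per class and several attribute vectors per class concrete, here is a toy sketch in Python. It is not the authors' implementation. The 3-dimensional vectors are made-up stand-ins for VLM image and text embeddings, the attribute names are hypothetical, and scoring each class by the mean of its top-k attribute similarities is an assumed aggregation rule chosen only for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, the score CLIP-style zero-shot classifiers compare."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def single_vector_predict(image: Vector, label_embs: Dict[str, Vector]) -> str:
    """Standard zero-shot: one text embedding per class label; pick the closest."""
    return max(label_embs, key=lambda cls: cosine(image, label_embs[cls]))


def multi_vector_predict(
    image: Vector, attr_embs: Dict[str, Dict[str, Vector]], k: int = 2
) -> Tuple[str, str]:
    """Several attribute embeddings per class (assumed aggregation rule).

    Each class is scored by the mean of its top-k attribute similarities.
    The best-matching attribute of the winning class is returned as a
    simple explanation of the prediction.
    """
    best_cls, best_attr, best_score = "", "", float("-inf")
    for cls, attrs in attr_embs.items():
        # Rank this class's attributes by similarity to the image, highest first.
        sims = sorted(((cosine(image, emb), name) for name, emb in attrs.items()), reverse=True)
        top = sims[:k]
        score = sum(s for s, _ in top) / len(top)
        if score > best_score:
            best_cls, best_attr, best_score = cls, top[0][1], score
    return best_cls, best_attr


# Hypothetical toy "embeddings"; real ones would come from a VLM's encoders.
labels = {"pear": [1.0, 0.0, 0.0], "apple": [0.0, 1.0, 0.0]}
attributes = {
    "pear": {"whole pear": [1.0, 0.0, 0.0], "diced pear": [0.3, 0.0, 1.0]},
    "apple": {"whole apple": [0.0, 1.0, 0.0], "sliced apple": [0.0, 0.9, 0.5]},
}
diced_pear_image = [0.3, 0.35, 0.9]  # an atypical pear instance

print(single_vector_predict(diced_pear_image, labels))     # apple
print(multi_vector_predict(diced_pear_image, attributes))  # ('pear', 'diced pear')
```

In this toy case the label-only classifier mistakes the atypical diced pear for an apple, while the attribute-based scorer picks "pear" and reports "diced pear" as the attribute behind the decision, which loosely mirrors the per-inference explanations listed under Findings.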
