What's in a Name? Beyond Class Indices for Image Recognition
Kai Han, Xiaohu Huang, Yandong Li, Sagar Vaze, Jie Li, and Xuhui Jia

TL;DR
This paper introduces a novel approach to image recognition that uses vision-language models and non-parametric clustering to assign meaningful class names from large, unconstrained vocabularies, significantly improving unsupervised performance.
Contribution
It proposes a new method combining clustering and textual retrieval with vision-language models to recognize images with large, unconstrained class vocabularies, surpassing baseline results.
Findings
Approximately 50% improvement over the baseline on ImageNet in the unsupervised setting
Using external textual data improves clustering accuracy
The method works in both unsupervised and semi-supervised settings
Abstract
Existing machine learning models demonstrate excellent performance in image object recognition after training on a large-scale dataset under full supervision. However, these models only learn to map an image to a predefined class index, without revealing the actual semantic meaning of the object in the image. In contrast, vision-language models like CLIP are able to assign semantic class names to unseen objects in a 'zero-shot' manner, though they are once again provided a pre-defined set of candidate names at test-time. In this paper, we reconsider the recognition problem and task a vision-language model with assigning class names to images given only a large (essentially unconstrained) vocabulary of categories as prior information. We leverage non-parametric methods to establish meaningful relationships between images, allowing the model to automatically narrow down the pool of…
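The idea in the abstract (group images first, then use the groups to pick names from a large vocabulary) can be illustrated with a minimal sketch. This is not the authors' algorithm: a simple cosine k-means with farthest-first initialization stands in for the paper's clustering step, a per-cluster majority vote stands in for its name-narrowing step, and the embeddings are synthetic stand-ins for features a vision-language model such as CLIP would produce. The function names `l2_normalize`, `cosine_kmeans`, and `name_clusters` are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def cosine_kmeans(x, k, iters=20):
    """Cluster unit-norm rows of x into k groups by cosine similarity.

    Assumption: k-means is a stand-in here, not the paper's clustering method.
    """
    # Farthest-first initialization: start from the first point, then
    # repeatedly add the point least similar to the centers chosen so far.
    centers = [x[0]]
    for _ in range(1, k):
        sim = np.max(x @ np.stack(centers).T, axis=1)
        centers.append(x[np.argmin(sim)])
    centers = np.stack(centers)

    for _ in range(iters):
        assign = np.argmax(x @ centers.T, axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
        centers = l2_normalize(centers)
    return np.argmax(x @ centers.T, axis=1)


def name_clusters(image_emb, text_emb, vocab, k):
    """Assign one vocabulary name to each image cluster by majority vote."""
    image_emb = l2_normalize(np.asarray(image_emb, dtype=float))
    text_emb = l2_normalize(np.asarray(text_emb, dtype=float))
    assign = cosine_kmeans(image_emb, k)

    names = {}
    for j in range(k):
        members = image_emb[assign == j]
        if len(members) == 0:
            continue
        # Each image votes for its most similar vocabulary entry.
        votes = np.argmax(members @ text_emb.T, axis=1)
        counts = np.bincount(votes, minlength=len(vocab))
        names[j] = vocab[int(counts.argmax())]
    return assign, names


# Synthetic demo: three vocabulary entries, two image groups.
demo_rng = np.random.default_rng(0)
demo_vocab = ["cat", "dog", "car"]
demo_text = np.eye(3, 8)  # one direction per vocabulary entry
demo_images = np.vstack([
    demo_text[0] + 0.05 * demo_rng.normal(size=(10, 8)),  # images near "cat"
    demo_text[1] + 0.05 * demo_rng.normal(size=(10, 8)),  # images near "dog"
])
demo_assign, demo_names = name_clusters(demo_images, demo_text, demo_vocab, k=2)
print(demo_names)
```

In this toy setup the vote within each cluster filters out vocabulary entries that no group of images supports, which is the intuition behind narrowing a large candidate pool; the paper's actual procedure may differ in every detail.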
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI
Methods: Test · Contrastive Language-Image Pre-training
