What's in a Name? Beyond Class Indices for Image Recognition

Kai Han, Xiaohu Huang, Yandong Li, Sagar Vaze, Jie Li, and Xuhui Jia

arXiv:2304.02364 · cs.CV · July 30, 2024 · 1 citation

TL;DR

This paper tasks a vision-language model with assigning class names to images from a large, essentially unconstrained vocabulary, using non-parametric clustering to narrow the pool of candidate names. It reports an improvement of roughly 50% over the baseline on ImageNet in the unsupervised setting.

Contribution

It proposes a new method combining clustering and textual retrieval with vision-language models to recognize images with large, unconstrained class vocabularies, surpassing baseline results.
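
To make the idea of combining clustering with textual retrieval concrete, here is a minimal sketch and not the authors' algorithm. It assumes image and vocabulary embeddings have already been computed by a CLIP-like model and are passed in as plain arrays. The function names (`l2_normalize`, `kmeans`, `name_images`), the plain k-means with farthest-point initialization, and the majority-vote rule are all illustrative assumptions; the paper's own clustering and name-selection steps may differ.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def kmeans(x, k, iters=10):
    """Plain cosine k-means on unit-length rows (a stand-in, assumed here)."""
    # Farthest-point initialization: start from the first row, then repeatedly
    # add the row that is least similar to the centers chosen so far.
    centers = [x[0]]
    for _ in range(k - 1):
        sims = np.max(x @ np.array(centers).T, axis=1)
        centers.append(x[np.argmin(sims)])
    centers = np.array(centers)

    for _ in range(iters):
        labels = np.argmax(x @ centers.T, axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
        centers = l2_normalize(centers)
    return labels


def name_images(image_emb, text_emb, k):
    """Return (predicted vocabulary index per image, narrowed candidate indices)."""
    image_emb = l2_normalize(image_emb)
    text_emb = l2_normalize(text_emb)

    # Step 1: group images without using any labels.
    labels = kmeans(image_emb, k)

    # Step 2: each image retrieves its nearest name from the full vocabulary.
    nearest = np.argmax(image_emb @ text_emb.T, axis=1)

    # Step 3: each cluster keeps the name that most of its members retrieved.
    voted = []
    for j in range(k):
        votes = nearest[labels == j]
        if len(votes):
            voted.append(np.bincount(votes, minlength=len(text_emb)).argmax())
    candidates = np.unique(voted)

    # Step 4: re-classify every image against the narrowed candidate names only.
    best = np.argmax(image_emb @ text_emb[candidates].T, axis=1)
    return candidates[best], candidates
```

With real data, `image_emb` and `text_emb` would come from the image and text encoders of the same vision-language model, and `k` would be the assumed number of categories. The point of the sketch is the narrowing step: the final prediction is made over at most `k` candidate names rather than the whole vocabulary.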

Findings

01. Approximately 50% improvement over the baseline on ImageNet in the unsupervised setting

02. External textual data improves clustering accuracy

03. The method works in both unsupervised and semi-supervised settings

Abstract

Existing machine learning models demonstrate excellent performance in image object recognition after training on a large-scale dataset under full supervision. However, these models only learn to map an image to a predefined class index, without revealing the actual semantic meaning of the object in the image. In contrast, vision-language models like CLIP are able to assign semantic class names to unseen objects in a 'zero-shot' manner, though they are once again provided a pre-defined set of candidate names at test-time. In this paper, we reconsider the recognition problem and task a vision-language model with assigning class names to images given only a large (essentially unconstrained) vocabulary of categories as prior information. We leverage non-parametric methods to establish meaningful relationships between images, allowing the model to automatically narrow down the pool of…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI

Methods: Test · Contrastive Language-Image Pre-training