Vocabulary-free Image Classification

Alessandro Conti; Enrico Fini; Massimiliano Mancini; Paolo Rota; Yiming Wang; Elisa Ricci

arXiv:2306.00917 · cs.CV · January 15, 2024 · 5 cites

TL;DR

This paper introduces Vocabulary-free Image Classification (VIC), a new task that classifies images into an open-ended semantic space without predefined categories, using external databases and vision-language models.

Contribution

The paper formalizes the VIC task, proposes CaSED (Category Search from External Databases), a training-free method that combines an external database with a pre-trained vision-language model, and shows that it outperforms existing vision-language frameworks.
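To make the idea of a training-free, database-backed classifier more concrete, the following is a minimal sketch and not the official implementation. It assumes a retrieve-then-score pipeline: find the captions closest to the image in a shared embedding space, pull candidate category names from them, and keep the candidate that best matches the image. The 3-dimensional embeddings, the caption list, the stopword set, and all function names are hypothetical stand-ins for what a real vision-language model and caption database would provide.

```python
import numpy as np

def normalize(x):
    # L2-normalize along the last axis so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieve_captions(image_emb, caption_embs, captions, k=2):
    # Rank database captions by cosine similarity to the image; keep the top k.
    sims = normalize(caption_embs) @ normalize(image_emb)
    top = np.argsort(-sims)[:k]
    return [captions[i] for i in top]

def extract_candidates(captions, stopwords):
    # Naive candidate extraction: unique non-stopword tokens, in order of appearance.
    seen = []
    for caption in captions:
        for word in caption.lower().split():
            if word not in stopwords and word not in seen:
                seen.append(word)
    return seen

def score_candidates(image_emb, candidates, word_embs):
    # Pick the candidate whose (toy) text embedding is closest to the image.
    embs = normalize(np.stack([word_embs[w] for w in candidates]))
    scores = embs @ normalize(image_emb)
    return candidates[int(np.argmax(scores))]

# Toy data (hypothetical): in practice these would come from a
# vision-language model and a large caption database.
image_emb = np.array([1.0, 0.1, 0.0])
captions = ["a photo of a tabby cat", "a cat on a sofa", "a red sports car"]
caption_embs = np.array([[0.9, 0.2, 0.0], [0.8, 0.3, 0.1], [0.0, 0.1, 1.0]])
word_embs = {
    "photo": np.array([0.3, 0.3, 0.3]),
    "tabby": np.array([0.9, 0.3, 0.0]),
    "cat": np.array([1.0, 0.1, 0.0]),
    "sofa": np.array([0.2, 1.0, 0.0]),
}
stopwords = {"a", "of", "on"}

retrieved = retrieve_captions(image_emb, caption_embs, captions, k=2)
candidates = extract_candidates(retrieved, stopwords)
predicted = score_candidates(image_emb, candidates, word_embs)
print(predicted)
```

No step here involves training: the only learned components are the frozen embeddings, which is what makes the approach training-free. Splitting on whitespace is a deliberately crude substitute for real candidate extraction and filtering.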

Findings

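01. CaSED outperforms other vision-language frameworks on the evaluated benchmarks.

02. An external vision-language database is an effective way to represent the semantic space and capture content relevant to an image.

03. CaSED is efficient, using far fewer parameters than the vision-language frameworks it is compared against.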

Abstract

Recent advances in large vision-language models have revolutionized the image classification paradigm. Despite showing impressive zero-shot capabilities, a pre-defined set of categories, a.k.a. the vocabulary, is assumed at test time for composing the textual prompts. However, such an assumption can be impractical when the semantic context is unknown and evolving. We thus formalize a novel task, termed Vocabulary-free Image Classification (VIC), where we aim to assign to an input image a class that resides in an unconstrained language-induced semantic space, without the prerequisite of a known vocabulary. VIC is a challenging task as the semantic space is extremely large, containing millions of concepts, with hard-to-discriminate fine-grained categories. In this work, we first empirically verify that representing this semantic space by means of an external vision-language database is…

Code & Models

Repositories

altndrr/vic
PyTorch · Official

Models

🤗 altndrr/cased
model · 66 dl · ♡ 1

Videos

Vocabulary-free Image Classification · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques