Vocabulary-free Image Classification and Semantic Segmentation

Alessandro Conti; Enrico Fini; Massimiliano Mancini; Paolo Rota; Yiming Wang; Elisa Ricci

arXiv:2404.10864 · cs.CV · April 18, 2024 · 1 citation

TL;DR

This paper introduces Vocabulary-free Image Classification (VIC) and Semantic Segmentation, enabling image understanding without predefined categories by leveraging external databases and pre-trained vision-language models, thus addressing the limitations of fixed vocabularies.

Contribution

The paper proposes CaSED, a training-free method for vocabulary-free classification and segmentation using external databases and vision-language models, advancing flexible image understanding.

Findings

01. CaSED outperforms complex models on classification benchmarks.
02. CaSED effectively generates coarse segmentation masks.
03. The approach requires fewer parameters than existing models.

Abstract

Large vision-language models revolutionized image classification and semantic segmentation paradigms. However, they typically assume a pre-defined set of categories, or vocabulary, at test time for composing textual prompts. This assumption is impractical in scenarios with unknown or evolving semantic context. Here, we address this issue and introduce the Vocabulary-free Image Classification (VIC) task, which aims to assign a class from an unconstrained language-induced semantic space to an input image without needing a known vocabulary. VIC is challenging due to the vastness of the semantic space, which contains millions of concepts, including fine-grained categories. To address VIC, we propose Category Search from External Databases (CaSED), a training-free method that leverages a pre-trained vision-language model and an external database. CaSED first extracts the set of candidate…
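The abstract is cut off before it describes the full procedure, so the following is only a toy sketch of the general pattern it names: retrieve text from an external database, derive candidate category names from it, and score them against the image in a shared vision-language embedding space. The 2-D embeddings, the word-splitting candidate extraction, the cosine scoring, and every function name here are illustrative assumptions, not the authors' implementation.

```python
import math

# Hypothetical stopword list used to filter caption words (an assumption).
STOPWORDS = {"a", "an", "the", "of", "on", "in", "with", "and"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve_captions(image_emb, database, k):
    """Return the k captions whose embeddings are closest to the image."""
    ranked = sorted(database, key=lambda item: cosine(image_emb, item[1]), reverse=True)
    return [caption for caption, _ in ranked[:k]]


def extract_candidates(captions):
    """Naive candidate extraction: unique non-stopword tokens (an assumption)."""
    words = set()
    for caption in captions:
        for word in caption.lower().split():
            if word not in STOPWORDS:
                words.add(word)
    return sorted(words)


def classify(image_emb, database, embed_text, k=2):
    """Pick the candidate whose text embedding best matches the image."""
    captions = retrieve_captions(image_emb, database, k)
    candidates = extract_candidates(captions)
    scores = {c: cosine(image_emb, embed_text(c)) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-ins for a vision-language model's embeddings (made-up numbers).
TOY_TEXT_EMB = {
    "dog": [1.0, 0.1],
    "grass": [0.6, 0.6],
    "running": [0.8, 0.4],
    "cat": [0.1, 1.0],
    "sofa": [0.3, 0.9],
}
TOY_DATABASE = [
    ("a dog running on grass", [0.9, 0.3]),
    ("a cat on the sofa", [0.1, 0.9]),
    ("the dog", [1.0, 0.0]),
]
toy_image_emb = [1.0, 0.2]

best, scores = classify(toy_image_emb, TOY_DATABASE, TOY_TEXT_EMB.__getitem__, k=2)
print(best, sorted(scores))
```

No vocabulary is passed to `classify`: the candidate set comes entirely from the retrieved captions, which is the property the task definition asks for. In a real setting the toy lookup table would be replaced by a pre-trained vision-language model's encoders and the list scan by an index over a large caption database.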

Code & Models

Repositories

altndrr/vicss
PyTorch · Official

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Sparse Evolutionary Training