Open-vocabulary vs. Closed-set: Best Practice for Few-shot Object Detection Considering Text Describability

Yusuke Hosoya; Masanori Suganuma; Takayuki Okatani

arXiv:2410.15315·cs.CV·October 22, 2024


TL;DR

This paper evaluates the effectiveness of open-vocabulary versus closed-set object detection in few-shot scenarios, introducing a measure of text-describability to guide dataset categorization and method selection.

Contribution

It proposes a novel way to quantify the text-describability of a dataset and empirically compares open-vocabulary (OVD) and closed-set (COD) detection methods across dataset categories with differing text-describability.
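The page does not spell out how text-describability is computed, because the abstract is cut off. As a hedged illustration only, assuming the score is derived from zero-shot image classification with a CLIP-style image-text model (the page's Methods tag lists Contrastive Language-Image Pre-training), the sketch below scores a dataset as the mean per-class zero-shot accuracy of object crops, using precomputed embeddings. The function names, the per-class averaging, and the toy 2-D embeddings are hypothetical and are not taken from the authors' rsCPSyEu/ovd_cod code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    """L2-normalize a vector so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def zero_shot_predict(image_emb: Sequence[float],
                      class_text_embs: Dict[str, Sequence[float]]) -> str:
    """Return the class whose text embedding is most similar to the image."""
    if not class_text_embs:
        raise ValueError("no class text embeddings given")
    img = _normalize(image_emb)
    best_cls, best_sim = "", -math.inf
    for cls, txt in class_text_embs.items():
        sim = sum(a * b for a, b in zip(img, _normalize(txt)))
        if sim > best_sim:
            best_cls, best_sim = cls, sim
    return best_cls


def text_describability(
    samples: Sequence[Tuple[Sequence[float], str]],
    class_text_embs: Dict[str, Sequence[float]],
) -> Tuple[float, Dict[str, float]]:
    """Hypothetical score: mean per-class zero-shot accuracy over object crops.

    Each sample is (image_embedding, true_class); in practice the embedding
    would come from a cropped ground-truth box encoded by a CLIP-style image
    encoder, and each class text embedding from its class name (assumption).
    """
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for emb, label in samples:
        total[label] = total.get(label, 0) + 1
        if zero_shot_predict(emb, class_text_embs) == label:
            correct[label] = correct.get(label, 0) + 1
    per_class = {c: correct.get(c, 0) / n for c, n in total.items()}
    return sum(per_class.values()) / len(per_class), per_class


# Toy example with made-up 2-D embeddings (illustration only).
text_embs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
crops = [([0.9, 0.1], "cat"), ([0.2, 0.8], "dog"), ([0.7, 0.3], "dog")]
score, per_class = text_describability(crops, text_embs)
print(score, per_class)  # 0.75 {'cat': 1.0, 'dog': 0.5}
```

Averaging per class rather than per crop keeps frequent classes from dominating the score; a class whose crops are often misclassified from its name alone (here "dog", at 0.5) would count as less text-describable under this assumed definition.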

Findings

01. OVD and COD perform similarly on classes with low text-describability.

02. Using OVD to increase the volume of training data can be counterproductive for classes with low text-describability.

03. The proposed measure helps practitioners choose an appropriate detection method.

Abstract

Open-vocabulary object detection (OVD), detecting specific classes of objects using only their linguistic descriptions (e.g., class names) without any image samples, has garnered significant attention. However, in real-world applications, the target class concepts are often hard to describe in text, and the only way to specify target objects is to provide image examples of them, yet it is often challenging to obtain a sufficient number of samples. Thus, there is a high demand from practitioners for few-shot object detection (FSOD). A natural question arises: Can the benefits of OVD extend to FSOD for object classes that are difficult to describe in text? Compared to traditional methods that learn only predefined classes (referred to in this paper as closed-set object detection, COD), can the extra cost of OVD be justified? To answer these questions, we propose a method to quantify the…


Code & Models

Repositories

rsCPSyEu/ovd_cod (PyTorch, official)


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Video Analysis and Summarization

Methods: Contrastive Language-Image Pre-training