Retrieval-Augmented Open-Vocabulary Object Detection

Jooyeon Kim; Eulrang Cho; Sehyung Kim; Hyunwoo J. Kim

arXiv:2404.05687 · cs.CV · April 9, 2024 · 1 citation

TL;DR

This paper introduces RALF, a retrieval-augmented approach for open-vocabulary object detection that enhances generalization by incorporating related negative classes and verbalized concepts, leading to improved detection of novel objects.

Contribution

The paper proposes Retrieval-Augmented Losses and visual Features (RALF), a novel method that retrieves negative classes and uses verbalized concepts to improve open-vocabulary object detection.

Findings

01. Achieves up to a 3.4 AP improvement on COCO novel categories.
02. Improves mask AP by 3.6 on the LVIS dataset.
03. Demonstrates the effectiveness of retrieval-augmented features in open-vocabulary detection.

Abstract

Open-vocabulary object detection (OVD) has been studied with Vision-Language Models (VLMs) to detect novel objects beyond the pre-trained categories. Previous approaches improve the generalization ability to expand the knowledge of the detector, using 'positive' pseudo-labels with additional 'class' names, e.g., sock, iPod, and alligator. To extend the previous methods in two aspects, we propose Retrieval-Augmented Losses and visual Features (RALF). Our method retrieves related 'negative' classes and augments loss functions. Also, visual features are augmented with 'verbalized concepts' of classes, e.g., worn on the feet, handheld music player, and sharp teeth. Specifically, RALF consists of two modules: Retrieval Augmented Losses (RAL) and Retrieval-Augmented visual Features (RAF). RAL constitutes two losses reflecting the semantic similarity with negative vocabularies. In addition,…
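The idea of retrieving related 'negative' classes and folding them into a loss can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the retrieval rule or the loss form, so the cosine-similarity retrieval, the hinge-style margin loss, the function names (`retrieve_negatives`, `negative_margin_loss`), the `margin` value, and the toy 3-dimensional embeddings below are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieve_negatives(target, vocab, k):
    """Return the k vocabulary entries most similar to `target`, excluding it.

    Assumption: "related" negatives are the nearest neighbours in text-embedding space.
    """
    scored = [(cosine(vocab[target], emb), name)
              for name, emb in vocab.items() if name != target]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


def negative_margin_loss(region, target, negatives, vocab, margin=0.2):
    """Hinge penalty whenever a negative scores within `margin` of the target class.

    Assumption: a generic margin loss stands in for the paper's two RAL losses.
    """
    pos = cosine(region, vocab[target])
    return sum(max(0.0, margin + cosine(region, vocab[n]) - pos) for n in negatives)


# Toy "text embeddings" (hypothetical values); a real system would use a VLM text encoder.
vocab = {
    "sock": [1.0, 0.1, 0.0],
    "shoe": [0.9, 0.3, 0.0],
    "alligator": [0.0, 0.2, 1.0],
    "iPod": [0.1, 1.0, 0.1],
}
region_embedding = [0.95, 0.2, 0.0]  # hypothetical region feature labelled "sock"

negatives = retrieve_negatives("sock", vocab, k=2)
loss = negative_margin_loss(region_embedding, "sock", negatives, vocab)
print(negatives, round(loss, 4))
```

In this toy setup the semantically close class ("shoe") contributes to the penalty, while a distant class contributes nothing once it falls outside the margin, which is the intuition behind weighting negatives by semantic similarity.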

Code & Models

Repositories

mlvlab/RALF
Official

Taxonomy

Topics: Natural Language Processing Techniques