DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection

Shilin Xu; Xiangtai Li; Size Wu; Wenwei Zhang; Yunhai Tong; Chen Change Loy

arXiv:2310.01393·cs.CV·April 2, 2024·6 cites

Open Access 1 Repo

TL;DR

DST-Det introduces a simple, efficient self-training approach leveraging pre-trained vision-language models to improve open-vocabulary object detection, enhancing recall and accuracy for novel classes without extra annotations or re-training.

Contribution

The paper proposes a self-training strategy that takes a subset of the proposals that would otherwise be treated as background and relabels them as novel classes during training, improving open-vocabulary detection without additional data or re-training.

Findings

01 Significant performance improvements on the LVIS, V3Det, and COCO datasets.

02 Achieves 1.7% better AP on LVIS than F-VLM.

03 Reaches 46.7 novel-class AP on COCO without extra data.

Abstract

Open-vocabulary object detection (OVOD) aims to detect objects beyond the set of classes observed during training. This work introduces a straightforward and efficient strategy that uses pre-trained vision-language models (VLMs), such as CLIP, to identify potential novel classes through zero-shot classification. Previous methods use a class-agnostic region proposal network to detect object proposals and treat the proposals that do not match the ground truth as background. Unlike these methods, our method selects a subset of the proposals that would otherwise be considered background and treats them as novel classes during training. We refer to this approach as the self-training strategy, which enhances recall and accuracy for novel classes without requiring extra annotations, datasets, or re-training. Compared to previous pseudo methods, our approach does not…
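
The relabeling step described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function name `relabel_background`, the IoU and score thresholds, the temperature, and the toy 2-D embeddings are all assumptions made for illustration. In a real pipeline the region and text embeddings would come from a VLM such as CLIP, and the candidate vocabulary would typically also contain base or background entries.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def relabel_background(proposals, text_embeds,
                       iou_thresh=0.5, score_thresh=0.8, temperature=0.01):
    """Assign each proposal one of: 'matched', 'background', or a class name.

    proposals:   list of dicts with 'embed' (region embedding) and
                 'max_iou_with_gt' (best IoU against any ground-truth box).
    text_embeds: dict mapping candidate class name -> text embedding.
    All thresholds are hypothetical values chosen for this sketch.
    """
    names = list(text_embeds)
    labels = []
    for p in proposals:
        # Proposals that overlap a ground-truth box keep their normal label.
        if p["max_iou_with_gt"] >= iou_thresh:
            labels.append("matched")
            continue
        # Zero-shot classification: compare the region against class names.
        logits = [cosine(p["embed"], text_embeds[n]) / temperature for n in names]
        probs = softmax(logits)
        best = max(range(len(names)), key=probs.__getitem__)
        # Only confident predictions are promoted from background to a class.
        labels.append(names[best] if probs[best] >= score_thresh else "background")
    return labels

# Toy usage with made-up 2-D embeddings.
text_embeds = {"zebra": [1.0, 0.0], "kite": [0.0, 1.0]}
proposals = [
    {"embed": [1.0, 0.0], "max_iou_with_gt": 0.7},  # overlaps a GT box
    {"embed": [0.9, 0.1], "max_iou_with_gt": 0.1},  # unmatched, zebra-like
    {"embed": [1.0, 1.0], "max_iou_with_gt": 0.0},  # unmatched, ambiguous
]
labels = relabel_background(proposals, text_embeds)
print(labels)
```

In this sketch the confident unmatched proposal is promoted to a pseudo-labeled class, while the ambiguous one stays background; the score threshold controls how aggressively background proposals are relabeled.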

Code & Models

Repositories

xushilin1/dst-det
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Contrastive Language-Image Pre-training