NoOVD: Novel Category Discovery and Embedding for Open-Vocabulary Object Detection
Yupeng Zhang, Ruize Han, Zhiwei Chen, Wei Feng, Liang Wan

TL;DR
This paper introduces NoOVD, a training framework that enhances open-vocabulary object detection by leveraging vision-language models for better novel object discovery and embedding, significantly improving recall and detection accuracy.
Contribution
The paper proposes a novel training method using self-distillation with frozen vision-language models and introduces R-RPN to improve proposal confidence scoring for open-vocabulary detection.
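The page does not spell out the self-distillation objective. The sketch below is only a rough illustration and assumes a ViLD-style setup, which may differ from what NoOVD actually does. In that setup, the detector's region embeddings (the student) are pulled toward the embeddings a frozen VLM image encoder produces for the same regions (the teacher) through an L1 loss on normalized vectors. Regions are then classified by cosine similarity to text embeddings of the category names. The names `distill_l1_loss` and `classify_region` are hypothetical, and the code is not the authors' implementation.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def distill_l1_loss(student: Sequence[Sequence[float]],
                    teacher: Sequence[Sequence[float]]) -> float:
    """Hypothetical ViLD-style distillation loss: mean L1 distance between
    normalized detector (student) region embeddings and frozen-VLM
    (teacher) embeddings of the same regions."""
    if not student or len(student) != len(teacher):
        raise ValueError("need equal, non-empty lists of region embeddings")
    total = 0.0
    for s, t in zip(student, teacher):
        s_n, t_n = l2_normalize(s), l2_normalize(t)
        total += sum(abs(a - b) for a, b in zip(s_n, t_n))
    return total / len(student)


def classify_region(embedding: Sequence[float],
                    text_embeddings: Sequence[Sequence[float]]) -> int:
    """Return the index of the category text embedding with the highest
    cosine similarity to the region embedding (open-vocabulary scoring)."""
    e = l2_normalize(embedding)
    scores = [sum(a * b for a, b in zip(e, l2_normalize(t)))
              for t in text_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy 2-D embeddings, chosen for illustration only.
    print(distill_l1_loss([[1.0, 0.0]], [[0.0, 3.0]]))
    print(classify_region([0.9, 0.1], [[0.0, 1.0], [1.0, 0.0]]))
```

Because the teacher is frozen, only the student receives gradients in a real training loop. Any category name that can be embedded as text, including novel ones absent from the box annotations, can be scored this way.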
Findings
Consistently outperforms existing methods across multiple datasets.
Improves recall of novel-category objects during detection.
Enhances overall detection performance in open-vocabulary scenarios.
Abstract
Despite the remarkable progress in open-vocabulary object detection (OVD), a significant gap remains between the training and testing phases. During training, the RPN and RoI heads often misclassify unlabeled novel-category objects as background, causing some proposals to be prematurely filtered out by the RPN while others are further misclassified by the RoI head. During testing, these proposals again receive low scores and are removed in post-processing, leading to a significant drop in recall and ultimately weakening novel-category detection performance. To address these issues, we propose a novel training framework, NoOVD, which innovatively integrates a self-distillation mechanism grounded in the knowledge of frozen vision-language models (VLMs). Specifically, we design K-FPN, which leverages the pretrained knowledge of VLMs to guide the model in discovering novel-category objects and…
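The recall problem described in the abstract can be shown with a toy example. Proposals are ranked by confidence and only the top `k` are kept. If unlabeled novel objects were learned as background, their proposals score low and get cut. The function `novel_recall_after_topk` and every score below are hypothetical toy values made up for illustration. They are not measurements from the paper and do not show how NoOVD rescores proposals.

```python
from typing import List, Tuple


def novel_recall_after_topk(proposals: List[Tuple[float, int]],
                            num_novel_objects: int, k: int) -> float:
    """Fraction of novel objects covered by at least one of the top-k proposals.

    Each proposal is (confidence score, id of the novel object it covers,
    or -1 if it covers a base object or background).
    """
    kept = sorted(proposals, key=lambda p: p[0], reverse=True)[:k]
    covered = {obj for _, obj in kept if obj >= 0}
    return len(covered) / num_novel_objects


# Toy case: novel objects 0 and 1 were treated as background, so they score low.
suppressed = [(0.95, -1), (0.90, -1), (0.85, -1), (0.20, 0), (0.15, 1)]
# Same proposals after a hypothetical rescoring that no longer penalizes novel objects.
rescored = [(0.95, -1), (0.90, -1), (0.85, -1), (0.88, 0), (0.87, 1)]

print(novel_recall_after_topk(suppressed, num_novel_objects=2, k=4))
print(novel_recall_after_topk(rescored, num_novel_objects=2, k=4))
```

With the same budget of four proposals, the suppressed scores keep only one of the two novel objects, while the rescored list keeps both. This is the training/testing mismatch the abstract attributes to treating novel objects as background.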
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
