Open Vocabulary Object Detection with Pseudo Bounding-Box Labels
Mingfei Gao, Chen Xing, Juan Carlos Niebles, Junnan Li, Ran Xu, Wenhao Liu, Caiming Xiong

TL;DR
This paper introduces a method to automatically generate pseudo bounding-box labels from image-caption pairs using pre-trained vision-language models, significantly improving open vocabulary object detection performance across multiple datasets.
Contribution
The authors propose a novel approach to enlarge training data for open vocabulary detection by automatically creating pseudo bounding boxes, enhancing generalization to novel object categories.
Findings
Outperforms the state-of-the-art open vocabulary detector by 8% AP on COCO novel categories.
Achieves a 6.3% AP improvement on PASCAL VOC.
Demonstrates effectiveness across multiple datasets like Objects365 and LVIS.
Abstract
Despite great progress in object detection, most existing methods work only on a limited set of object categories, due to the tremendous human effort needed for bounding-box annotations of training data. To alleviate the problem, recent open vocabulary and zero-shot detection methods attempt to detect novel object categories beyond those seen during training. They achieve this goal by training on a pre-defined set of base categories to induce generalization to novel objects. However, their potential is still constrained by the small set of base categories available for training. To enlarge the set of base classes, we propose a method to automatically generate pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs. Our method leverages the localization ability of pre-trained vision-language models to generate pseudo bounding-box labels and then directly uses…
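To make the idea of a pseudo bounding-box label concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' procedure: it assumes a vision-language model has already produced a 2D activation map for each object word in a caption, and it uses a simple relative threshold as a stand-in for the localization step, which the abstract does not specify. The names `pseudo_box_from_activation`, `pseudo_labels`, and `rel_threshold` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# (x_min, y_min, x_max, y_max), inclusive cell indices in the activation map
Box = Tuple[int, int, int, int]
Activation = List[List[float]]


def pseudo_box_from_activation(
    activation: Activation, rel_threshold: float = 0.5
) -> Optional[Box]:
    """Tight box around cells with activation >= rel_threshold * peak.

    Assumption: thresholding replaces whatever localization rule the
    paper uses. Returns None if the map has no positive peak.
    """
    peak = max((v for row in activation for v in row), default=0.0)
    if peak <= 0.0:
        return None
    cutoff = rel_threshold * peak
    xs: List[int] = []
    ys: List[int] = []
    for y, row in enumerate(activation):
        for x, v in enumerate(row):
            if v >= cutoff:
                xs.append(x)
                ys.append(y)
    return (min(xs), min(ys), max(xs), max(ys))


def pseudo_labels(
    caption_words: Iterable[str],
    activations: Dict[str, Activation],
    vocabulary: Set[str],
    rel_threshold: float = 0.5,
) -> List[Tuple[str, Box]]:
    """Pair each caption word that names a known object with a pseudo box."""
    labels: List[Tuple[str, Box]] = []
    for word in caption_words:
        if word in vocabulary and word in activations:
            box = pseudo_box_from_activation(activations[word], rel_threshold)
            if box is not None:
                labels.append((word, box))
    return labels


# Toy example: a 4x4 activation map for the word "dog" (made-up values).
toy_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.9, 0.0],
    [0.0, 0.6, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
toy_labels = pseudo_labels(["a", "dog", "runs"], {"dog": toy_map}, {"dog", "cat"})
print(toy_labels)
```

The resulting (word, box) pairs play the role of the pseudo annotations that a detector would then be trained on; the vocabulary here is just a hypothetical set of object names matched against the caption.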
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
