Detecting the open-world objects with the help of the Brain
Shuailei Ma, Yuefeng Wang, Ying Wei, Peihao Chen, Zhixiang Ye, Jiaqi Fan, Enming Zhang, Thomas H. Li

TL;DR
This paper introduces a novel open-world object detection method that uses large pre-trained vision-language models as a 'brain' to identify unknown objects, employing a special loss function and pseudo-labeling to improve detection and learning.
Contribution
It proposes leveraging large pre-trained vision-language models as a 'brain' for OWOD, with new loss functions and a pseudo-labeling scheme for better unknown object detection.
Findings
Effective detection of unknown objects in open-world scenarios.
Improved incremental learning of novel objects.
Utilization of VL models enhances open-world detection capabilities.
Abstract
Open World Object Detection (OWOD) is a novel computer vision task with a considerable challenge, bridging the gap between classic object detection (OD) benchmarks and real-world object detection. In addition to detecting and classifying seen/known objects, OWOD algorithms are expected to detect unseen/unknown objects and incrementally learn them. The natural instinct of humans to identify unknown objects in their environments mainly depends on their brains' knowledge base. It is difficult for a model to do this only by learning from the annotation of several tiny datasets. The large pre-trained grounded language-image models (VL, i.e., GLIP) have rich knowledge about the open world but are limited to the text prompt. We propose leveraging the VL as the "Brain" of the open-world detector by simply generating unknown labels. Leveraging it is non-trivial because the unknown labels impair…
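As an illustration of what "generating unknown labels" with a VL model could look like, the following is a minimal sketch and not the authors' implementation. It assumes the VL model has already returned candidate boxes with confidence scores, and it keeps a candidate as an "unknown" pseudo-label only if it is confident and does not overlap an annotated known object. The function names, the `(x1, y1, x2, y2)` box format, and both thresholds are hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def unknown_pseudo_labels(
    vl_boxes: Sequence[Box],
    vl_scores: Sequence[float],
    known_boxes: Sequence[Box],
    score_thr: float = 0.5,  # hypothetical confidence cutoff
    iou_thr: float = 0.5,    # hypothetical overlap cutoff
) -> List[Box]:
    """Keep confident VL candidates that do not match any known annotation."""
    kept: List[Box] = []
    for box, score in zip(vl_boxes, vl_scores):
        if score < score_thr:
            continue  # drop low-confidence candidates
        if any(iou(box, k) >= iou_thr for k in known_boxes):
            continue  # already covered by a known-class annotation
        kept.append(box)
    return kept


known = [(0.0, 0.0, 10.0, 10.0)]
candidates = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0), (40.0, 40.0, 50.0, 50.0)]
scores = [0.9, 0.8, 0.2]
pseudo = unknown_pseudo_labels(candidates, scores, known)
```

Here the first candidate is discarded because it coincides with a known annotation, and the third because its score is below the cutoff, so only the second survives as an unknown pseudo-label. Such labels are noisy, which is consistent with the abstract's remark that using them is non-trivial.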
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
