Detecting the open-world objects with the help of the Brain

Shuailei Ma; Yuefeng Wang; Ying Wei; Peihao Chen; Zhixiang Ye; Jiaqi Fan; Enming Zhang; Thomas H. Li

arXiv:2303.11623·cs.CV·March 22, 2023·1 cites

TL;DR

This paper introduces a novel open-world object detection method that uses large pre-trained vision-language models as a 'brain' to identify unknown objects, employing a special loss function and pseudo-labeling to improve detection and learning.

Contribution

It proposes leveraging large pre-trained vision-language models as a 'brain' for OWOD, with new loss functions and a pseudo-labeling scheme for better unknown object detection.

Findings

1. Effective detection of unknown objects in open-world scenarios.
2. Improved incremental learning of novel objects.
3. Using VL models enhances open-world detection capabilities.

Abstract

Open World Object Detection (OWOD) is a novel computer vision task with a considerable challenge, bridging the gap between classic object detection (OD) benchmarks and real-world object detection. In addition to detecting and classifying seen/known objects, OWOD algorithms are expected to detect unseen/unknown objects and incrementally learn them. The natural instinct of humans to identify unknown objects in their environments mainly depends on their brains' knowledge base. It is difficult for a model to do this only by learning from the annotation of several tiny datasets. The large pre-trained grounded language-image (VL) models (i.e., GLIP) have rich knowledge about the open world but are limited to the text prompt. We propose leveraging the VL model as the "Brain" of the open-world detector by simply generating unknown labels. Leveraging it is non-trivial because the unknown labels impair…
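
The abstract is cut off before it describes how the unknown labels are produced, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes a VL detector has already returned candidate boxes with confidence scores, and it treats any confident candidate that does not overlap an annotated known object as an "unknown" pseudo-label. The function names (`iou`, `unknown_pseudo_labels`) and both thresholds are hypothetical and chosen for illustration; no GLIP API is called.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def unknown_pseudo_labels(
    vl_boxes: List[Box],
    vl_scores: List[float],
    known_boxes: List[Box],
    score_thresh: float = 0.5,  # assumed value, not from the paper
    iou_thresh: float = 0.5,    # assumed value, not from the paper
) -> List[Box]:
    """Keep confident VL candidates that match no annotated known object."""
    kept = []
    for box, score in zip(vl_boxes, vl_scores):
        if score < score_thresh:
            continue  # low-confidence candidate: discard
        if any(iou(box, k) >= iou_thresh for k in known_boxes):
            continue  # overlaps a known object: already labeled
        kept.append(box)
    return kept


# Hypothetical inputs: three VL candidates and one known annotation.
candidates = [(0, 0, 10, 10), (50, 50, 60, 60), (100, 100, 110, 110)]
scores = [0.9, 0.8, 0.2]
known = [(1, 1, 10, 10)]
pseudo = unknown_pseudo_labels(candidates, scores, known)
```

Here the first candidate is dropped because it overlaps the known annotation, the third because its score is below the threshold, leaving the second as the only unknown pseudo-label. The abstract notes that using such labels is non-trivial, so a real pipeline would need more than this filter.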

Code & Models

Repositories

xiaomabufei/dowb (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications