Learning to Detect Every Thing in an Open World
Kuniaki Saito, Ping Hu, Trevor Darrell, Kate Saenko

TL;DR
This paper introduces LDET, a training scheme that improves open-world object detection and segmentation by augmenting data with pasted objects and decoupling training phases, leading to better generalization to unseen objects.
Contribution
The paper proposes a novel data augmentation and training approach called LDET that enhances detection of unlabeled objects in open-world scenarios.
Findings
Improves significantly over baselines on open-world instance segmentation datasets.
Outperforms baselines on cross-category generalization on COCO.
Achieves better cross-dataset performance on UVO and Cityscapes.
Abstract
Many open-world applications require the detection of novel objects, yet state-of-the-art object detection and instance segmentation networks do not excel at this task. The key issue lies in their assumption that regions without any annotations should be suppressed as negatives, which teaches the model to treat the unannotated objects as background. To address this issue, we propose a simple yet surprisingly powerful data augmentation and training scheme we call Learning to Detect Every Thing (LDET). To avoid suppressing hidden objects, background objects that are visible but unlabeled, we paste annotated objects on a background image sampled from a small region of the original image. Since training solely on such synthetically-augmented images suffers from domain shift, we decouple the training into two parts: 1) training the region classification and regression head on augmented…
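The augmentation step described in the abstract (pasting annotated objects onto a background sampled from a small region of the same image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `synthesize_background` and `paste_annotated_objects`, the crop size, the nearest-neighbour upsampling, and the choice to keep objects at their original positions are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def synthesize_background(image, crop_size, rng):
    """Crop a small random region and stretch it to the full image size.

    Assumption: nearest-neighbour upsampling; the source does not say
    how the small region is turned into a full-size background.
    """
    h, w = image.shape[:2]
    ch, cw = crop_size
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    crop = image[top:top + ch, left:left + cw]
    # Map every output pixel to a source pixel inside the crop.
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows[:, None], cols[None, :]]


def paste_annotated_objects(image, masks, crop_size=(8, 8), seed=0):
    """Keep only annotated object pixels; replace the rest with a synthetic background.

    `masks` is a list of boolean (H, W) arrays, one per annotated object.
    Assumption: objects stay at their original locations.
    """
    rng = np.random.default_rng(seed)
    background = synthesize_background(image, crop_size, rng)
    foreground = np.zeros(image.shape[:2], dtype=bool)
    for mask in masks:
        foreground |= mask.astype(bool)
    out = background.copy()
    out[foreground] = image[foreground]
    return out
```

Because every pixel outside the annotated masks comes from a small, stretched crop, the synthetic image is unlikely to contain whole unlabeled objects in its background, which is the property the abstract relies on to avoid suppressing hidden objects as negatives.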
Taxonomy
Topics: Advanced Neural Network Applications · Remote-Sensing Image Classification · Video Surveillance and Tracking Methods
