Open World Object Detection in the Era of Foundation Models
Orr Zohar, Alejandro Lozano, Shelly Goel, Serena Yeung, Kuan-Chieh Wang

TL;DR
This paper explores the use of foundation models for open world object detection, introduces a new challenging benchmark with real-world datasets, and proposes FOMO, a method that significantly improves unknown object detection performance.
Contribution
The paper presents a new benchmark for evaluating foundation model-based open world object detection and introduces FOMO, a novel method that leverages class attribute sharing to improve unknown object detection.
Findings
FOMO achieves ~3x higher unknown object mAP than baselines.
Existing benchmarks are insufficient for evaluating foundation model integration.
A new benchmark with diverse real-world datasets was curated.
Abstract
Object detection is integral to a bevy of real-world applications, from robotics to medical image analysis. To be used reliably in such applications, models must be capable of handling unexpected - or novel - objects. The open world object detection (OWD) paradigm addresses this challenge by enabling models to detect unknown objects and learn discovered ones incrementally. However, OWD method development is hindered due to the stringent benchmark and task definitions. These definitions effectively prohibit foundation models. Here, we aim to relax these definitions and investigate the utilization of pre-trained foundation models in OWD. First, we show that existing benchmarks are insufficient in evaluating methods that utilize foundation models, as even naive integration methods nearly saturate these benchmarks. This result motivated us to curate a new and challenging benchmark for these…
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Balanced Selection
