From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
Zizhao Li, Zhengkang Xiang, Joseph West, Kourosh Khoshelham

TL;DR
This paper introduces a novel framework for open world object detection that enables models to identify and learn new objects incrementally, overcoming limitations of traditional and open vocabulary detection methods, especially in autonomous driving scenarios.
Contribution
It proposes Open World Embedding Learning (OWEL) and Multi-Scale Contrastive Anchor Learning (MSCAL) to detect and learn unknown objects, improving open world detection performance.
Findings
Achieves state-of-the-art results on open world detection benchmarks.
Effectively detects far-out-of-distribution objects in autonomous driving.
Maintains open vocabulary detection capabilities while identifying unknown objects.
Abstract
Traditional object detection methods operate under the closed-set assumption, where models can only detect a fixed number of objects predefined in the training set. Recent works on open vocabulary object detection (OVD) enable the detection of objects defined by an in-principle unbounded vocabulary, which reduces the cost of training models for specific tasks. However, OVD relies heavily on accurate prompts provided by an "oracle", which limits its use in critical applications such as driving scene perception. OVD models tend to misclassify near-out-of-distribution (NOOD) objects that have similar features to known classes, and ignore far-out-of-distribution (FOOD) objects. To address these limitations, we propose a framework that enables OVD models to operate in open world settings, by identifying and incrementally learning previously unseen objects. To detect FOOD objects, we…
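The NOOD and FOOD failure modes described in the abstract can be illustrated with a toy prompt-matching sketch. This is not the paper's OWEL or MSCAL method; it only assumes a generic OVD-style scorer that compares a region embedding against text prompt embeddings by cosine similarity. The function names, the 3-dimensional embeddings, and the 0.5 threshold are all hypothetical values chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify_region(region_emb, prompt_embs, threshold=0.5):
    """Return (label, score) for the best-matching prompt.

    The label is None when no prompt reaches the (hypothetical) threshold,
    which mimics a detector silently dropping the region.
    """
    scores = {name: cosine(region_emb, emb) for name, emb in prompt_embs.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]
    return best, scores[best]

# Toy prompt embeddings for the known vocabulary (made-up values).
prompts = {"car": [1.0, 0.0, 0.0], "pedestrian": [0.0, 1.0, 0.0]}

known_region = [0.9, 0.1, 0.0]    # a genuine car
nood_region = [0.7, 0.2, 0.4]     # unknown object with car-like features
food_region = [0.05, 0.05, 1.0]   # unknown object unlike any known class

known_label, _ = classify_region(known_region, prompts)
nood_label, _ = classify_region(nood_region, prompts)
food_label, _ = classify_region(food_region, prompts)

print(known_label, nood_label, food_label)
```

In this toy setup the car-like unknown region is assigned the known label, while the dissimilar region falls below the threshold and is dropped, which mirrors the misclassify-or-ignore behavior the abstract attributes to OVD models.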
Taxonomy
Topics: Museums and Cultural Heritage
