From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects

Zizhao Li; Zhengkang Xiang; Joseph West; Kourosh Khoshelham

arXiv:2411.18207 · cs.CV · February 27, 2026

TL;DR

This paper introduces a novel framework for open world object detection that enables models to identify and learn new objects incrementally, overcoming limitations of traditional and open vocabulary detection methods, especially in autonomous driving scenarios.

Contribution

It proposes Open World Embedding Learning (OWEL) and Multi-Scale Contrastive Anchor Learning (MSCAL) to detect and learn unknown objects, improving open world detection performance.

Findings

1. Achieves state-of-the-art results on open world detection benchmarks.
2. Effectively detects far-out-of-distribution objects in autonomous driving.
3. Maintains open vocabulary detection capabilities while identifying unknown objects.

Abstract

Traditional object detection methods operate under the closed-set assumption, where models can only detect a fixed number of objects predefined in the training set. Recent works on open vocabulary object detection (OVD) enable the detection of objects defined by an in-principle unbounded vocabulary, which reduces the cost of training models for specific tasks. However, OVD relies heavily on accurate prompts provided by an "oracle", which limits its use in critical applications such as driving scene perception. OVD models tend to misclassify near-out-of-distribution (NOOD) objects that have similar features to known classes, and to ignore far-out-of-distribution (FOOD) objects. To address these limitations, we propose a framework that enables OVD models to operate in open world settings by identifying and incrementally learning previously unseen objects. To detect FOOD objects, we…

Code & Models

Repositories

343gltysprk/ovow (PyTorch, official)
