YOLO-UniOW: Efficient Universal Open-World Object Detection

Lihao Liu; Juexiao Feng; Hui Chen; Ao Wang; Lin Song; Jungong Han; Guiguang Ding

arXiv:2412.20645·cs.CV·December 31, 2024

TL;DR

YOLO-UniOW is a new efficient model for open-world object detection that can recognize known categories and detect unknown objects, adapting dynamically without retraining.

Contribution

The paper introduces two strategies, Adaptive Decision Learning and Wildcard Learning, that together enable efficient, versatile open-vocabulary and open-world object detection in real time.

Findings

01. Achieves 34.6 AP on LVIS at 69.6 FPS
02. Sets new benchmarks on multiple open-world datasets
03. Detects unknown objects and adapts to new categories

Abstract

Traditional object detection models are constrained by the limitations of closed-set datasets, detecting only categories encountered during training. While multimodal models have extended category recognition by aligning text and image modalities, they introduce significant inference overhead due to cross-modality fusion and still remain restricted by predefined vocabulary, leaving them ineffective at handling unknown objects in open-world scenarios. In this work, we introduce Universal Open-World Object Detection (Uni-OWD), a new paradigm that unifies open-vocabulary and open-world object detection tasks. To address the challenges of this setting, we propose YOLO-UniOW, a novel model that advances the boundaries of efficiency, versatility, and performance. YOLO-UniOW incorporates Adaptive Decision Learning to replace computationally expensive cross-modality fusion with lightweight…
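The general idea of fusion-free matching mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it only assumes that region features and category text embeddings live in a shared space, that a single extra "wildcard" embedding stands in for unknown objects, and that cosine similarity with fixed thresholds makes the decision. The function names, the thresholds, and the three-dimensional toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_region(region_emb, class_embs, wildcard_emb,
                    known_thresh=0.5, unknown_thresh=0.5):
    """Label one region by similarity alone, with no cross-modality fusion.

    class_embs maps known category names to (hypothetical) text embeddings.
    wildcard_emb is a (hypothetical) embedding meant to match any object
    that none of the known categories covers.
    """
    # Score the region against every known category.
    scores = {name: cosine(region_emb, emb) for name, emb in class_embs.items()}
    best_name = max(scores, key=scores.get)
    if scores[best_name] >= known_thresh:
        return best_name
    # No known category fits; fall back to the wildcard.
    if cosine(region_emb, wildcard_emb) >= unknown_thresh:
        return "unknown"
    return "background"


# Toy embeddings, chosen by hand for illustration only.
known = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
wildcard = [0.3, 0.3, 0.9]

print(classify_region([0.9, 0.1, 0.0], known, wildcard))
print(classify_region([0.1, 0.1, 1.0], known, wildcard))
print(classify_region([0.0, 0.0, -1.0], known, wildcard))
```

Because the category embeddings are plain inputs, extending the vocabulary in this sketch means adding an entry to the dictionary rather than retraining anything, which mirrors the "adapting without retraining" claim only at the level of this toy example.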

Code & Models

Repositories

thu-mig/yolo-uniow (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Video Surveillance and Tracking Methods

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Contrastive Language-Image Pre-training