OW-CLIP: Data-Efficient Visual Supervision for Open-World Object Detection via Human-AI Collaboration

Junwen Duan; Wei Xue; Ziyao Kang; Shixia Liu; Jiazhi Xia

arXiv:2507.19870 · cs.CV · July 29, 2025

TL;DR

OW-CLIP introduces a data-efficient, human-AI collaborative system for open-world object detection, reducing data needs and improving adaptability through novel prompt tuning, data refinement, and visualization tools.

Contribution

The paper presents OW-CLIP, a modular system combining prompt tuning, data refinement, and visualization to enable efficient open-world object detection with minimal data.

Findings

1. Achieves 89% of state-of-the-art (SOTA) performance while requiring only 3.8% self-generated data.
2. Outperforms SOTA methods when trained on equivalent data volumes.
3. Effective visualization improves annotation quality.

Abstract

Open-world object detection (OWOD) extends traditional object detection to identifying both known and unknown objects, which requires continuous model adaptation as new annotations emerge. Current approaches face three significant limitations: 1) data-hungry training due to reliance on a large number of crowdsourced annotations, 2) susceptibility to "partial feature overfitting," and 3) limited flexibility due to required modifications of the model architecture. To tackle these issues, we present OW-CLIP, a visual analytics system that provides curated data and enables data-efficient incremental training of OWOD models. OW-CLIP implements plug-and-play multimodal prompt tuning tailored for OWOD settings and introduces a novel "Crop-Smoothing" technique to mitigate partial feature overfitting. To meet the data requirements of this training methodology, we propose dual-modal data refinement methods that…