VL-SAM-V2: Open-World Object Detection with General and Specific Query Fusion
Zhiwei Lin, Yongtao Wang

TL;DR
VL-SAM-V2 is an open-world object detection framework that fuses open-set and open-ended queries, allowing it to discover unseen objects while improving detection performance, especially on rare categories.
Contribution
The paper introduces a query fusion module and ranked learnable queries for open-world detection, improving the model's ability to discover unseen objects without category-level input from humans at inference time.
Findings
Outperforms previous open-set and open-ended methods on LVIS.
Excels particularly on rare object categories.
Demonstrates flexible evaluation in open-set and open-ended modes.
Abstract
Current perception models have achieved remarkable success by leveraging large-scale labeled datasets, but still face challenges in open-world environments with novel objects. To address this limitation, researchers introduce open-set perception models to detect or segment arbitrary test-time user-input categories. However, open-set models rely on human involvement to provide predefined object categories as input during inference. More recently, researchers have framed a more realistic and challenging task known as open-ended perception that aims to discover unseen objects without requiring any category-level input from humans at inference time. Nevertheless, open-ended models suffer from low performance compared to open-set models. In this paper, we present VL-SAM-V2, an open-world object detection framework that is capable of discovering unseen objects while achieving favorable…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
