More Pictures Say More: Visual Intersection Network for Open Set Object Detection
Bingcheng Dong, Yuning Ding, Jinrong Zhang, Sifan Zhang, Shenglan Liu

TL;DR
This paper introduces VINO, a DETR-based model that uses a multi-image visual bank to improve open set object detection by capturing semantic intersections across visual prompts, reducing resource needs and enhancing performance.
Contribution
The paper presents a novel visual intersection network that leverages a multi-image visual bank and a visual updating mechanism for improved open set object detection, with less reliance on language modalities.
Findings
VINO achieves competitive performance on LVIS and ODinW35 benchmarks.
Requires only 7 RTX 4090 GPU-days per epoch, significantly reducing training resources.
Demonstrates broad applicability with a segmentation head extension.
Abstract
Open Set Object Detection has seen rapid development recently, but it continues to pose significant challenges. Language-based methods, grappling with the substantial modal disparity between textual and visual modalities, require extensive computational resources to bridge this gap. Although integrating visual prompts into these frameworks shows promise for enhancing performance, it always comes with constraints related to textual semantics. In contrast, visual-only methods suffer from the low-quality fusion of multiple visual prompts. In response, we introduce a strong DETR-based model, Visual Intersection Network for Open Set Object Detection (VINO), which constructs a multi-image visual bank to preserve the semantic intersections of each category across all time steps. Our innovative multi-image visual updating mechanism learns to identify the semantic intersections from various…
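The idea of a per-category visual bank that accumulates information from many visual prompts over time can be illustrated with a small sketch. This is not the paper's method: VINO's updating mechanism is learned, whereas the sketch below substitutes a plain exponential moving average, and the names `VisualBank`, `update`, `classify`, and `momentum` are hypothetical. It only shows the general shape of keeping one prototype per category and refining it as new prompt embeddings arrive.

```python
import math
from typing import Dict, List, Sequence


def _normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (all-zero vectors are returned as zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


class VisualBank:
    """Hypothetical per-category prototype store.

    Assumption: an exponential moving average stands in for the learned
    multi-image updating mechanism described in the abstract.
    """

    def __init__(self, momentum: float = 0.9) -> None:
        self.momentum = momentum
        self.prototypes: Dict[str, List[float]] = {}

    def update(self, category: str, prompt_embeddings: Sequence[Sequence[float]]) -> None:
        """Fold a batch of visual-prompt embeddings into the category prototype."""
        if not prompt_embeddings:
            return
        dim = len(prompt_embeddings[0])
        count = len(prompt_embeddings)
        # Average the prompts from this time step, then normalize.
        mean = _normalize([sum(e[i] for e in prompt_embeddings) / count for i in range(dim)])
        old = self.prototypes.get(category)
        if old is None:
            self.prototypes[category] = mean
        else:
            m = self.momentum
            blended = [m * o + (1.0 - m) * n for o, n in zip(old, mean)]
            self.prototypes[category] = _normalize(blended)

    def classify(self, query: Sequence[float]) -> str:
        """Return the category whose prototype has the highest cosine similarity."""
        if not self.prototypes:
            raise ValueError("the visual bank is empty")
        q = _normalize(query)
        scores = {
            cat: sum(a * b for a, b in zip(q, proto))
            for cat, proto in self.prototypes.items()
        }
        return max(scores, key=scores.get)
```

A usage example with toy 2-D embeddings: after calling `update` for two categories across several steps, `classify` matches a query embedding to the nearest stored prototype. In a real detector the embeddings would come from an image encoder and the query from a DETR decoder output, which this sketch does not model.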
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging
Methods: Sparse Evolutionary Training
