Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection
Yang Cao, Yihan Zeng, Hang Xu, Dan Xu

TL;DR
This paper introduces CoDAv2, a unified framework for open-vocabulary 3D object detection that localizes and classifies novel objects by combining 3D geometric priors, 2D open-vocabulary semantic priors, and cross-modal alignment.
Contribution
The paper presents 3D Novel Object Discovery with Enrichment (3D-NODE) for localization and Discovery-driven Cross-modal Alignment (DCMA) for classification, enabling detection of unseen categories when only a limited set of base categories is available.
Findings
Outperforms previous methods with AP_Novel of 9.17 on SUN-RGBD
Localizes more novel objects through the Enrichment strategy, which enriches the novel object distribution in the training scenes
Achieves significant accuracy improvements in open-vocabulary 3D detection
Abstract
Open-vocabulary 3D Object Detection (OV-3DDet) addresses the detection of objects from an arbitrary list of novel categories in 3D scenes, which remains a very challenging problem. In this work, we propose CoDAv2, a unified framework designed to tackle both the localization and classification of novel 3D objects under the condition of limited base categories. For localization, the proposed 3D Novel Object Discovery (3D-NOD) strategy utilizes 3D geometries and 2D open-vocabulary semantic priors to discover pseudo labels for novel objects during training. 3D-NOD is further extended with an Enrichment strategy that significantly enriches the novel object distribution in the training scenes, which in turn enhances the model's ability to localize more novel objects. 3D-NOD with Enrichment is termed 3D-NODE. For classification, the Discovery-driven Cross-modal Alignment (DCMA)…
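The pseudo-label discovery idea in the abstract (combining a 3D geometry prior with a 2D open-vocabulary semantic prior) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Box3D` fields, the `discover_novel_boxes` function, the use of axis-aligned boxes, and all threshold values are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box3D:
    # All fields are illustrative assumptions, not the paper's data format.
    center: Tuple[float, float, float]
    size: Tuple[float, float, float]   # axis-aligned extents (dx, dy, dz)
    objectness: float                  # assumed 3D geometry prior in [0, 1]
    semantic_score: float              # assumed 2D open-vocabulary score in [0, 1]
    label: str                         # category name attaining semantic_score


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection-over-union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a.center[i] - a.size[i] / 2, b.center[i] - b.size[i] / 2)
        hi = min(a.center[i] + a.size[i] / 2, b.center[i] + b.size[i] / 2)
        if hi <= lo:
            return 0.0
        inter *= hi - lo
    vol_a = a.size[0] * a.size[1] * a.size[2]
    vol_b = b.size[0] * b.size[1] * b.size[2]
    return inter / (vol_a + vol_b - inter)


def discover_novel_boxes(
    predictions: List[Box3D],
    known_boxes: List[Box3D],
    obj_thresh: float = 0.75,   # hypothetical threshold
    sem_thresh: float = 0.3,    # hypothetical threshold
    iou_thresh: float = 0.25,   # hypothetical threshold
) -> List[Box3D]:
    """Keep predictions that pass both priors and do not overlap known boxes."""
    pseudo_labels: List[Box3D] = []
    for pred in predictions:
        # Require both the geometry prior and the semantic prior to agree.
        if pred.objectness < obj_thresh or pred.semantic_score < sem_thresh:
            continue
        # Skip boxes that duplicate an annotated or already discovered object.
        if any(iou_3d(pred, k) > iou_thresh for k in known_boxes + pseudo_labels):
            continue
        pseudo_labels.append(pred)
    return pseudo_labels
```

In this sketch, a prediction becomes a pseudo label only when both priors exceed their thresholds and the box does not overlap an existing label, so the discovered boxes could be added to the label pool for later training iterations.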
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Advanced Neural Network Applications
Methods: Balanced Selection
