Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

Yang Cao; Yihan Zeng; Hang Xu; Dan Xu

arXiv:2406.00830·cs.CV·August 5, 2025

TL;DR

This paper introduces CoDAv2, a unified framework for open-vocabulary 3D object detection. It localizes novel objects by discovering pseudo labels from 3D geometry and 2D open-vocabulary semantic priors, and classifies them through cross-modal alignment.

Contribution

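The paper presents 3D Novel Object Discovery with Enrichment (3D-NODE) for localizing novel objects and Discovery-driven Cross-modal Alignment (DCMA) for classifying them, enabling detection of unseen categories when only a limited set of base categories is available.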

Findings

01. Outperforms previous methods, with an AP_Novel of 9.17 on SUN-RGBD

02. Enables detection of more novel objects through the Enrichment strategy

03. Achieves significant accuracy improvements in open-vocabulary 3D detection

Abstract

Open-vocabulary 3D Object Detection (OV-3DDet) addresses the detection of objects from an arbitrary list of novel categories in 3D scenes, which remains a very challenging problem. In this work, we propose CoDAv2, a unified framework designed to innovatively tackle both the localization and classification of novel 3D objects, under the condition of limited base categories. For localization, the proposed 3D Novel Object Discovery (3D-NOD) strategy utilizes 3D geometries and 2D open-vocabulary semantic priors to discover pseudo labels for novel objects during training. 3D-NOD is further extended with an Enrichment strategy that significantly enriches the novel object distribution in the training scenes, and then enhances the model's ability to localize more novel objects. The 3D-NOD with Enrichment is termed 3D-NODE. For classification, the Discovery-driven Cross-modal Alignment (DCMA)…
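
The abstract describes 3D-NOD only at a high level: 3D geometry and 2D open-vocabulary semantic priors are combined to discover pseudo labels for novel objects during training. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes each candidate box already carries a class-agnostic objectness score (standing in for the 3D geometry prior) and a score from a 2D open-vocabulary model (standing in for the semantic prior). The names `Candidate`, `iou_3d`, and `discover_pseudo_labels`, the axis-aligned box format, and all threshold values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical box format: axis-aligned (xmin, ymin, zmin, xmax, ymax, zmax).
Box = Tuple[float, float, float, float, float, float]


@dataclass
class Candidate:
    box: Box
    objectness: float      # assumed 3D geometry prior (class-agnostic score)
    semantic_score: float  # assumed 2D open-vocabulary prior for the best label
    label: str             # best-matching category name from the 2D model


def volume(b: Box) -> float:
    return (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])


def iou_3d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:  # no overlap along this axis
            return 0.0
        inter *= hi - lo
    return inter / (volume(a) + volume(b) - inter)


def discover_pseudo_labels(
    candidates: Sequence[Candidate],
    known_boxes: Sequence[Box],
    obj_thresh: float = 0.75,  # illustrative value, not from the paper
    sem_thresh: float = 0.30,  # illustrative value, not from the paper
    iou_thresh: float = 0.25,  # illustrative value, not from the paper
) -> List[Candidate]:
    """Keep candidates that pass both priors and do not duplicate known boxes."""
    pool: List[Candidate] = []
    for cand in candidates:
        if cand.objectness < obj_thresh:
            continue  # rejected by the geometry prior
        if cand.semantic_score < sem_thresh:
            continue  # rejected by the semantic prior
        existing = list(known_boxes) + [p.box for p in pool]
        if any(iou_3d(cand.box, k) > iou_thresh for k in existing):
            continue  # already covered by a base label or an earlier pseudo label
        pool.append(cand)
    return pool


# Usage: one base-category box is annotated; four candidates are proposed.
known = [(0.0, 0.0, 0.0, 1.0, 1.0, 1.0)]
proposals = [
    Candidate((0.0, 0.0, 0.0, 1.0, 1.0, 1.0), 0.9, 0.8, "lamp"),    # overlaps a known box
    Candidate((2.0, 2.0, 2.0, 3.0, 3.0, 3.0), 0.9, 0.8, "pillow"),  # passes both priors
    Candidate((5.0, 5.0, 5.0, 6.0, 6.0, 6.0), 0.2, 0.9, "bag"),     # low objectness
    Candidate((8.0, 8.0, 8.0, 9.0, 9.0, 9.0), 0.9, 0.1, "box"),     # low semantic score
]
print([c.label for c in discover_pseudo_labels(proposals, known)])  # prints ['pillow']
```

Requiring both priors to agree is a deliberate choice in this sketch: a box that looks like an object geometrically but has no confident 2D label, or the reverse, is left out of the pseudo-label pool.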

Code & Models

Repositories

yangcaoai/CoDA_NeurIPS2023 (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Advanced Neural Network Applications

Methods: Balanced Selection