OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
Zhenyu Wang, Yali Li, Taichi Liu, Hengshuang Zhao, Shengjin Wang

TL;DR
OV-Uni3DETR introduces a unified open-vocabulary 3D detection framework that bridges 2D and 3D modalities, enabling detection of seen and unseen classes across diverse scenes and sensor inputs, with state-of-the-art results.
Contribution
It proposes cycle-modality propagation to unify 2D and 3D data, supporting open-vocabulary detection, modality switching, and scene diversity in a single architecture.
Findings
Improves on existing methods by more than 6% across various scenarios.
Using only RGB images, performs on par with or better than point cloud methods.
State-of-the-art performance on multiple benchmarks.
Abstract
In the current state of 3D object detection research, the severe scarcity of annotated 3D data, substantial disparities across different data modalities, and the absence of a unified architecture, have impeded the progress towards the goal of universality. In this paper, we propose OV-Uni3DETR, a unified open-vocabulary 3D detector via cycle-modality propagation. Compared with existing 3D detectors, OV-Uni3DETR offers distinct advantages: 1) Open-vocabulary 3D detection: During training, it leverages various accessible data, especially extensive 2D detection images, to boost training diversity. During inference, it can detect both seen and unseen classes. 2) Modality unifying: It seamlessly accommodates input data from any given modality, effectively addressing scenarios involving disparate modalities or missing sensor information, thereby supporting test-time modality…
Taxonomy
Topics: Advanced Neural Network Applications · Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction
