OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation

Zhenyu Wang, Yali Li, Taichi Liu, Hengshuang Zhao, Shengjin Wang

arXiv:2403.19580 · cs.CV · July 24, 2024 · 1 citation

TL;DR

OV-Uni3DETR is a unified open-vocabulary 3D detection framework that bridges 2D and 3D modalities. It detects both seen and unseen classes across diverse scenes and sensor inputs, and reports state-of-the-art results.

Contribution

It proposes cycle-modality propagation to unify 2D and 3D data, so that a single architecture supports open-vocabulary detection, modality switching, and diverse scenes.

Findings

1. Surpasses existing methods by more than 6% on average across various scenarios.

2. Using only RGB images, performs on par with or better than previous point-cloud-based methods.

3. Achieves state-of-the-art performance on multiple benchmarks.

Abstract

In the current state of 3D object detection research, the severe scarcity of annotated 3D data, substantial disparities across different data modalities, and the absence of a unified architecture have impeded progress towards the goal of universality. In this paper, we propose OV-Uni3DETR, a unified open-vocabulary 3D detector via cycle-modality propagation. Compared with existing 3D detectors, OV-Uni3DETR offers distinct advantages: 1) Open-vocabulary 3D detection: During training, it leverages various accessible data, especially extensive 2D detection images, to boost training diversity. During inference, it can detect both seen and unseen classes. 2) Modality unifying: It seamlessly accommodates input data from any given modality, effectively addressing scenarios involving disparate modalities or missing sensor information, thereby supporting test-time modality…
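The abstract states that at inference the detector can recognize both seen and unseen classes. A common way open-vocabulary detectors do this is to score each predicted box against text embeddings of arbitrary class names, so new names can be added without retraining a fixed classifier head. The sketch below illustrates only that generic idea; it is an assumption for illustration, not OV-Uni3DETR's implementation. The function names, the toy 3-dimensional embeddings, and the temperature value are all hypothetical, and a real system would obtain embeddings from a pretrained vision-language text encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_open_vocab(box_features, class_embeddings, temperature=0.07):
    """Hypothetical generic sketch: label each box with the most similar class name.

    class_embeddings maps a class name to its (assumed) text embedding; any
    name can be added at inference time, which is what makes the vocabulary open.
    """
    names = list(class_embeddings)
    results = []
    for feat in box_features:
        # Temperature-scaled similarity logits against every class name.
        logits = [cosine(feat, class_embeddings[n]) / temperature for n in names]
        # Numerically stable softmax over the vocabulary.
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(names)), key=lambda i: probs[i])
        results.append((names[best], probs[best]))
    return results


# Toy example: "bathtub" stands in for a class never seen during training.
class_embeddings = {
    "chair": [1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
    "bathtub": [0.0, 0.0, 1.0],
}
boxes = [[0.9, 0.1, 0.0], [0.1, 0.0, 0.95]]
result = classify_open_vocab(boxes, class_embeddings)
print(result)
```

Because classification reduces to a similarity lookup, extending the vocabulary only requires embedding another class name; how OV-Uni3DETR actually aligns 2D, 3D, and text features is described in the paper itself.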

Code & Models

Repositories

zhenyuw16/uni3detr · PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction