V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection

Yichao Shen; Zigang Geng; Yuhui Yuan; Yutong Lin; Ze Liu; Chunyu Wang; Han Hu; Nanning Zheng; Baining Guo

arXiv:2308.04409 · cs.CV · August 9, 2023 · 5 cites

TL;DR

V-DETR introduces a 3D vertex relative position encoding that improves DETR-style 3D object detection in point clouds, outperforming previous methods on the ScanNetV2 and SUN RGB-D benchmarks.

Contribution

The paper proposes a new 3D vertex relative position encoding method that enhances the DETR framework for better locality and accuracy in 3D object detection.

Findings

1. Achieves 77.8% AP25 on ScanNetV2, surpassing previous methods.

2. Sets a new record on ScanNetV2 and SUN RGB-D datasets.

3. Demonstrates significant performance improvements with the proposed encoding.

Abstract

We introduce a highly performant 3D object detector for point clouds using the DETR framework. The prior attempts all end up with suboptimal results because they fail to learn accurate inductive biases from the limited scale of training data. In particular, the queries often attend to points that are far away from the target objects, violating the locality principle in object detection. To address the limitation, we introduce a novel 3D Vertex Relative Position Encoding (3DV-RPE) method which computes position encoding for each point based on its relative position to the 3D boxes predicted by the queries in each decoder layer, thus providing clear information to guide the model to focus on points near the objects, in accordance with the principle of locality. In addition, we systematically improve the pipeline from various aspects such as data normalization based on our understanding of…
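The geometric quantity behind 3DV-RPE, as the abstract describes it, is the position of each point relative to a predicted 3D box. The sketch below illustrates only that step, under stated assumptions: the box is axis-aligned and given as a center and full size, and the offsets are taken to the box's eight corner vertices. The function names `box_vertices` and `vertex_relative_offsets` are hypothetical and are not taken from the official repository. How the offsets are turned into an attention bias is not covered by this excerpt and is left out.

```python
from itertools import product


def box_vertices(center, size):
    """Return the 8 corners of an axis-aligned box (assumption: no rotation).

    center: (x, y, z) of the box center; size: full extent along each axis.
    """
    cx, cy, cz = center
    hx, hy, hz = (s / 2.0 for s in size)
    return [
        (cx + sx * hx, cy + sy * hy, cz + sz * hz)
        for sx, sy, sz in product((-1.0, 1.0), repeat=3)
    ]


def vertex_relative_offsets(points, center, size):
    """For every point, compute its offset (point - vertex) to each box vertex.

    Returns a list with one entry per point; each entry holds 8 (dx, dy, dz)
    tuples, one per vertex, in the order produced by box_vertices.
    """
    vertices = box_vertices(center, size)
    return [
        [(px - vx, py - vy, pz - vz) for vx, vy, vz in vertices]
        for px, py, pz in points
    ]


# One point at the box center and one point far outside the box.
points = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
offsets = vertex_relative_offsets(points, center=(0.0, 0.0, 0.0), size=(2.0, 2.0, 2.0))
```

For a point inside the box, the offsets to opposite vertices have opposite signs on every axis; for a point outside, at least one axis has the same sign for all eight vertices. That sign pattern is the kind of locality cue a box-relative encoding can expose to the decoder.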

Code & Models

Repositories

yichaoshen-ms/v-detr (PyTorch, official)

Videos

V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Label Smoothing · Adam · Residual Connection · Dense Connections