V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection
Yichao Shen, Zigang Geng, Yuhui Yuan, Yutong Lin, Ze Liu, Chunyu Wang, Han Hu, Nanning Zheng, Baining Guo

TL;DR
V-DETR introduces a novel 3D vertex relative position encoding to improve 3D object detection in point clouds, significantly outperforming previous methods on benchmark datasets.
Contribution
The paper proposes a new 3D vertex relative position encoding method that enhances the DETR framework for better locality and accuracy in 3D object detection.
Findings
Achieves 77.8% AP25 on ScanNetV2, surpassing previous methods.
Sets a new record on ScanNetV2 and SUN RGB-D datasets.
Demonstrates significant performance improvements with the proposed encoding.
Abstract
We introduce a highly performant 3D object detector for point clouds using the DETR framework. The prior attempts all end up with suboptimal results because they fail to learn accurate inductive biases from the limited scale of training data. In particular, the queries often attend to points that are far away from the target objects, violating the locality principle in object detection. To address the limitation, we introduce a novel 3D Vertex Relative Position Encoding (3DV-RPE) method which computes position encoding for each point based on its relative position to the 3D boxes predicted by the queries in each decoder layer, thus providing clear information to guide the model to focus on points near the objects, in accordance with the principle of locality. In addition, we systematically improve the pipeline from various aspects such as data normalization based on our understanding of…
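The core geometric idea in the abstract, encoding each point by its position relative to a predicted 3D box, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes axis-aligned boxes given by a center and a size, and the function names `box_vertices` and `vertex_relative_offsets` are hypothetical. It computes only the raw point-to-vertex offsets; how those offsets are turned into a position encoding inside the decoder is not covered by the abstract and is left out.

```python
import numpy as np


def box_vertices(centers, sizes):
    """Return the 8 corners of axis-aligned boxes.

    centers, sizes: float arrays of shape (Q, 3), one box per query.
    Output shape: (Q, 8, 3).
    Assumption: boxes are axis-aligned (no rotation).
    """
    # All 8 sign combinations (+/-1 along x, y, z).
    signs = np.array(
        [[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
        dtype=float,
    )
    return centers[:, None, :] + 0.5 * sizes[:, None, :] * signs[None, :, :]


def vertex_relative_offsets(points, centers, sizes):
    """Offset from every box vertex to every point.

    points: float array of shape (N, 3).
    Output shape: (Q, N, 8, 3), where entry [q, n, v] is
    points[n] - vertex v of box q.
    """
    verts = box_vertices(centers, sizes)                    # (Q, 8, 3)
    return points[None, :, None, :] - verts[:, None, :, :]  # (Q, N, 8, 3)


# Illustrative data: one box of side 2 at the origin, one point inside it
# and one point outside it along the x axis.
points = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
centers = np.array([[0.0, 0.0, 0.0]])
sizes = np.array([[2.0, 2.0, 2.0]])
offsets = vertex_relative_offsets(points, centers, sizes)
```

For a point inside the box, the offsets take both signs along every axis across the eight vertices. For a point outside, the offsets share one sign along at least one axis. The offsets therefore carry the inside/outside and near/far information that a locality-aware attention mechanism needs.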
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Label Smoothing · Adam · Residual Connection · Dense Connections
