TL;DR
This paper introduces 3DETR, a simple end-to-end Transformer model for 3D object detection in point clouds. With minimal modifications to the vanilla Transformer block, it outperforms the VoteNet baselines by 9.5% on ScanNetV2 and is applicable to 3D tasks beyond detection.
Contribution
The paper presents 3DETR, a Transformer-based 3D detection model that requires minimal domain-specific adjustments and is competitive with specialized architectures built on 3D-specific operators.
Findings
3DETR outperforms VoteNet by 9.5% on ScanNetV2.
A standard Transformer with non-parametric queries and Fourier positional embeddings is competitive with specialized architectures.
3DETR is adaptable to other 3D tasks beyond detection.
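The findings mention "non-parametric queries" without saying how they are obtained. One plausible reading, assumed here and not confirmed by this summary, is that query locations are sampled from the input point cloud itself instead of being learned as free parameters. The sketch below uses greedy farthest point sampling for that; the function name `farthest_point_sample` and its parameters are hypothetical and not taken from the paper's code.

```python
import math


def farthest_point_sample(points, num_queries, start_index=0):
    """Greedy farthest point sampling over a list of (x, y, z) tuples.

    Returns the indices of `num_queries` points chosen so that each new
    point is the one farthest from all points chosen so far.
    Assumption: query locations come from the input cloud (illustrative only).
    """
    if not 0 < num_queries <= len(points):
        raise ValueError("num_queries must be between 1 and len(points)")

    chosen = [start_index]
    # min_dist[i] holds the distance from point i to its nearest chosen point.
    min_dist = [math.dist(p, points[start_index]) for p in points]

    while len(chosen) < num_queries:
        # Pick the point that is currently farthest from the chosen set.
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(next_index)
        # Update nearest-chosen distances with the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[next_index])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


# Toy cloud: three points near the origin and one far away.
cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
query_indices = farthest_point_sample(cloud, num_queries=3)
query_xyz = [cloud[i] for i in query_indices]
```

In a full model the sampled coordinates would still need to be turned into query embeddings (for example via a positional embedding followed by a small MLP), which this sketch leaves out.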
Abstract
We propose 3DETR, an end-to-end Transformer-based object detection model for 3D point clouds. Compared to existing detection methods that employ a number of 3D-specific inductive biases, 3DETR requires minimal modifications to the vanilla Transformer block. Specifically, we find that a standard Transformer with non-parametric queries and Fourier positional embeddings is competitive with specialized architectures that employ libraries of 3D-specific operators with hand-tuned hyperparameters. Nevertheless, 3DETR is conceptually simple and easy to implement, enabling further improvements by incorporating 3D domain knowledge. Through extensive experiments, we show 3DETR outperforms the well-established and highly optimized VoteNet baselines on the challenging ScanNetV2 dataset by 9.5%. Furthermore, we show 3DETR is applicable to 3D tasks beyond detection, and can serve as a building block…
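The abstract names Fourier positional embeddings but does not spell out their form. A minimal sketch, assuming the common random-Fourier-feature formulation in which a 3D coordinate x is projected by a fixed random matrix B and mapped to [sin(2πBx), cos(2πBx)], might look like the following. The helper names, the Gaussian initialization, and the `scale` parameter are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def make_frequency_matrix(num_frequencies, dims=3, scale=1.0, seed=0):
    """Build a fixed (non-learned) random projection matrix B.

    Assumption: entries are drawn from a zero-mean Gaussian with
    standard deviation `scale`; other choices are possible.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(dims)]
            for _ in range(num_frequencies)]


def fourier_embed(point, freq_matrix):
    """Map an (x, y, z) point to [sin(2*pi*B x), cos(2*pi*B x)]."""
    projections = [
        2.0 * math.pi * sum(b * x for b, x in zip(row, point))
        for row in freq_matrix
    ]
    return ([math.sin(p) for p in projections]
            + [math.cos(p) for p in projections])


B = make_frequency_matrix(num_frequencies=4)
origin_embedding = fourier_embed((0.0, 0.0, 0.0), B)
point_embedding = fourier_embed((0.3, -1.2, 2.5), B)
```

The output has twice as many dimensions as there are frequencies, and each sine/cosine pair lies on the unit circle, so the embedding stays bounded regardless of the scene's coordinate range.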
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Softmax · Byte Pair Encoding · Layer Normalization
