M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers
Tianrui Guan, Jun Wang, Shiyi Lan, Rohan Chandra, Zuxuan Wu, Larry Davis, Dinesh Manocha

TL;DR
M3DeTR introduces a transformer-based architecture that unifies multiple point cloud representations and scales for improved 3D object detection, achieving state-of-the-art results on KITTI and Waymo datasets.
Contribution
It is the first method to combine multiple point cloud representations, feature scales, and mutual relationships simultaneously using transformers for 3D detection.
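As a rough illustration of what combining representations, scales, and mutual relationships with attention can look like, the sketch below runs plain scaled dot-product self-attention in two stages: first over the tokens of one keypoint (one token per representation and scale), then over the fused keypoints. It is a toy under stated assumptions, not the authors' architecture: the identity query/key/value projections, the mean pooling, the two-stage split, the feature values, and the names `self_attention`, `fuse_keypoint`, and `relate_keypoints` are all hypothetical. Only the representation names (raw, voxel, bird-eye view) come from the abstract.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Assumption: a real model would use learned projections and several heads.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output row is a convex combination of the input tokens.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def fuse_keypoint(features):
    """Attend across all (representation, scale) tokens of one keypoint,
    then mean-pool the attended tokens into a single vector (assumed pooling)."""
    tokens = [vec for scales in features.values() for vec in scales.values()]
    attended = self_attention(tokens)
    n, d = len(attended), len(attended[0])
    return [sum(t[i] for t in attended) / n for i in range(d)]


def relate_keypoints(keypoint_vectors):
    """Attend across keypoints so each one sees the others (mutual relation)."""
    return self_attention(keypoint_vectors)


# Hypothetical features for one keypoint: representation -> scale -> vector.
features = {
    "raw":   {"fine": [0.7, 0.1, 0.1, 0.1], "coarse": [0.4, 0.3, 0.2, 0.1]},
    "voxel": {"fine": [0.1, 0.7, 0.1, 0.1], "coarse": [0.2, 0.4, 0.3, 0.1]},
    "bev":   {"fine": [0.1, 0.1, 0.7, 0.1], "coarse": [0.1, 0.2, 0.4, 0.3]},
}

fused = fuse_keypoint(features)
# Three keypoints for the second stage; the extra two are made up.
related = relate_keypoints([fused, [0.25, 0.25, 0.25, 0.25], [0.0, 0.0, 0.5, 0.5]])
```

Because every attention output is a convex combination of its inputs, the fused vector stays within the range of the input features, which makes the toy easy to sanity-check by hand.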
Findings
Achieves state-of-the-art performance on KITTI and Waymo datasets.
Improves the baseline by 1.48% mAP across all classes on the Waymo Open Dataset.
Ranks 1st on the KITTI 3D Detection Benchmark for both the car and cyclist classes.
Abstract
We present a novel architecture for 3D object detection, M3DeTR, which combines different point cloud representations (raw, voxels, bird-eye view) with different feature scales based on multi-scale feature pyramids. M3DeTR is the first approach that unifies multiple point cloud representations and feature scales while also modeling mutual relationships between point clouds, all simultaneously using transformers. We perform extensive ablation experiments that highlight the benefits of fusing representation and scale, and of modeling the relationships. Our method achieves state-of-the-art performance on the KITTI 3D object detection dataset and the Waymo Open Dataset. Results show that M3DeTR improves the baseline significantly, by 1.48% mAP for all classes on the Waymo Open Dataset. In particular, our approach ranks 1st on the well-known KITTI 3D Detection Benchmark for both car and cyclist classes, and…
Videos
M3DETR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers · YouTube
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection
