3M3D: Multi-view, Multi-path, Multi-representation for 3D Object Detection
Jongwoo Park, Apoorv Singh, Varun Bankiti

TL;DR
The paper introduces 3M3D, a multi-camera 3D object detection method that updates both multi-view image features and object query features with attention, improving detection performance over its baseline on the nuScenes autonomous driving benchmark.
Contribution
It proposes a multi-view, multi-path, multi-representation framework that enhances scene understanding by updating features with self-attention and multi-representation queries.
Findings
Improves 3D detection accuracy on the nuScenes dataset.
Enhances global and local scene understanding through multi-view feature updates.
Achieves performance gains over baseline models.
Abstract
3D visual perception tasks based on multi-camera images are essential for autonomous driving systems. Recent work in this field performs 3D object detection by taking multi-view images as input and iteratively enhancing object queries (object proposals) by cross-attending to multi-view features. However, individual backbone features are not updated with multi-view features and remain a mere collection of outputs from the single-image backbone network. We therefore propose 3M3D: A Multi-view, Multi-path, Multi-representation for 3D Object Detection, where we update both multi-view features and query features to enhance the representation of the scene in both the fine panoramic view and the coarse global view. First, we update multi-view features by multi-view axis self-attention. It will incorporate panoramic information in the multi-view features and enhance understanding of the…
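The idea of self-attention along the multi-view axis can be illustrated with a minimal NumPy sketch. This is one plausible reading of the abstract, not the authors' implementation: it assumes per-camera backbone features of shape (V, P, C) (views, flattened spatial positions, channels), a single attention head, and a residual connection. The function name `view_axis_self_attention` and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def view_axis_self_attention(feats, w_q, w_k, w_v):
    """Single-head self-attention across the camera (view) axis.

    Assumed layout (hypothetical, not taken from the paper):
      feats: (V, P, C) array, one feature map per camera view
      w_q, w_k, w_v: (C, C) projection matrices
    For every spatial position, the V per-view feature vectors attend
    to each other, so each view's feature can absorb information from
    the other cameras.
    """
    x = np.transpose(feats, (1, 0, 2))            # (P, V, C): views become tokens
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # each (P, V, C)
    scores = q @ np.transpose(k, (0, 2, 1))       # (P, V, V) view-to-view scores
    attn = softmax(scores / np.sqrt(q.shape[-1]), axis=-1)
    out = attn @ v                                # (P, V, C)
    out = np.transpose(out, (1, 0, 2)) + feats    # back to (V, P, C), residual add
    return out, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_views, num_positions, channels = 6, 4, 8  # nuScenes rigs have 6 cameras
    feats = rng.standard_normal((num_views, num_positions, channels))
    w_q, w_k, w_v = (0.1 * rng.standard_normal((channels, channels)) for _ in range(3))
    updated, attn = view_axis_self_attention(feats, w_q, w_k, w_v)
    print(updated.shape, attn.shape)
```

With these toy sizes the updated features keep the input shape (6, 4, 8), and the attention weights have shape (4, 6, 6): one 6-by-6 view-to-view matrix per spatial position. A production model would typically add multiple heads, normalization, and positional or camera embeddings, none of which are specified in the text above.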
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
