Multimodal Virtual Point 3D Detection
Tianwei Yin, Xingyi Zhou, Philipp Krähenbühl

TL;DR
This paper introduces a multimodal fusion method that uses 2D RGB camera detections to generate dense 3D virtual points, which augment the sparse Lidar point cloud and improve a CenterPoint baseline by 6.6 mAP on nuScenes.
Contribution
The authors propose fusing RGB detections into Lidar point clouds by creating dense virtual points that any standard Lidar-based 3D detector can consume alongside regular Lidar measurements.
Findings
Improves CenterPoint baseline by 6.6 mAP on nuScenes
Outperforms existing fusion approaches
Effective integration of RGB detections into Lidar-based 3D detection
Abstract
Lidar-based sensing drives current autonomous vehicles. Despite rapid progress, current Lidar sensors still lag two decades behind traditional color cameras in terms of resolution and cost. For autonomous driving, this means that large objects close to the sensors are easily visible, but far-away or small objects comprise only one measurement or two. This is an issue, especially when these objects turn out to be driving hazards. On the other hand, these same objects are clearly visible in onboard RGB sensors. In this work, we present an approach to seamlessly fuse RGB sensors into Lidar-based 3D recognition. Our approach takes a set of 2D detections to generate dense 3D virtual points to augment an otherwise sparse 3D point cloud. These virtual points naturally integrate into any standard Lidar-based 3D detectors along with regular Lidar measurements. The resulting multi-modal detector…
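The core step described in the abstract, turning a 2D detection into dense 3D virtual points, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `generate_virtual_points`, the use of an axis-aligned box as the detection region, uniform pixel sampling, and nearest-neighbor depth assignment are all hypothetical choices made here, and the Lidar points are assumed to be already expressed in the camera frame with a pinhole intrinsic matrix `K`.

```python
import numpy as np


def generate_virtual_points(lidar_cam, K, box, num_samples=50, rng=None):
    """Sketch: create virtual 3D points inside one 2D detection.

    lidar_cam: (N, 3) Lidar points in the camera frame (z points forward).
    K:         (3, 3) pinhole intrinsics (assumed, last row [0, 0, 1]).
    box:       (x1, y1, x2, y2) 2D detection in pixels (assumed region shape).
    Returns an (M, 3) array of virtual points in the camera frame.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Keep only points in front of the camera.
    pts = lidar_cam[lidar_cam[:, 2] > 0]
    if len(pts) == 0:
        return np.empty((0, 3))

    # Project Lidar points onto the image plane.
    uvw = pts @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]

    # Select the real Lidar measurements that fall inside the detection.
    x1, y1, x2, y2 = box
    inside = (
        (uv[:, 0] >= x1) & (uv[:, 0] <= x2)
        & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    )
    if not inside.any():
        return np.empty((0, 3))
    uv_in, depth_in = uv[inside], pts[inside, 2]

    # Sample pixels uniformly inside the detection (assumption).
    samples = rng.uniform([x1, y1], [x2, y2], size=(num_samples, 2))

    # Borrow the depth of the nearest projected Lidar point (assumption).
    dist2 = ((samples[:, None, :] - uv_in[None, :, :]) ** 2).sum(axis=-1)
    z = depth_in[dist2.argmin(axis=1)]

    # Unproject the sampled pixels to 3D using the borrowed depths.
    homog = np.hstack([samples, np.ones((num_samples, 1))])
    rays = homog @ np.linalg.inv(K).T
    return rays * z[:, None]
```

The returned points could then be concatenated with the original Lidar points before being passed to a Lidar-based detector; how the two sets are tagged or weighted is left open here.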
Taxonomy
Topics: Advanced Optical Sensing Technologies · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
