VPFNet: Improving 3D Object Detection with Virtual Point based LiDAR and Stereo Data Fusion
Hanqi Zhu, Jiajun Deng, Yu Zhang, Jianmin Ji, Qiuyu Mao, Houqiang Li, Yanyong Zhang

TL;DR
VPFNet introduces a novel virtual point-based fusion architecture that effectively combines LiDAR and stereo data for 3D object detection, achieving state-of-the-art results on KITTI with efficient computation.
Contribution
The paper proposes virtual points as shared aggregation locations for LiDAR and stereo features, addressing the resolution mismatch between sparse point clouds and dense images in multi-sensor 3D detection.
Findings
Achieves 83.21% moderate 3D AP on KITTI
Ranks 1st on the KITTI test set leaderboard since May 2021
Runs at 15 FPS on an NVIDIA RTX 2080 Ti
Abstract
It has been well recognized that fusing the complementary information from depth-aware LiDAR point clouds and semantic-rich stereo images would benefit 3D object detection. Nevertheless, it is nontrivial to model the inherently unnatural interaction between sparse 3D points and dense 2D pixels. To ease this difficulty, recent proposals generally project the 3D points onto the 2D image plane to sample the image data and then aggregate the data at the points. However, this approach often suffers from the mismatch between the resolution of point clouds and RGB images, leading to sub-optimal performance. Specifically, taking the sparse points as the multi-modal data aggregation locations causes severe information loss for high-resolution images, which in turn undermines the effectiveness of multi-sensor fusion. In this paper, we present VPFNet -- a new architecture that cleverly…
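The baseline the abstract criticizes (project 3D points onto the image plane, sample image features there, aggregate them at the points) can be sketched as below. This is a minimal NumPy illustration, not VPFNet's code. It assumes the points are already in the rectified camera frame, a 3 x 4 pinhole projection matrix `P`, and bilinear sampling; `project_points`, `bilinear_sample`, `point_image_fusion`, and `pixel_coverage` are hypothetical helpers, and the intrinsics in the demo are illustrative values rather than real calibration data. The coverage measure makes the resolution mismatch concrete: only pixels in the 2 x 2 neighborhood of some projected point ever reach the fused features.

```python
import numpy as np


def project_points(points_xyz, P):
    """Project N x 3 camera-frame points to pixels with a 3 x 4 matrix P.

    Returns (uv, depth): N x 2 pixel coordinates and N depths.
    """
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])  # N x 4 homogeneous coords
    uvw = homo @ P.T  # N x 3
    depth = uvw[:, 2]
    # Guard against division by (near-)zero depth; such points are filtered later.
    safe = np.where(np.abs(depth) > 1e-9, depth, 1e-9)
    uv = uvw[:, :2] / safe[:, None]
    return uv, depth


def bilinear_sample(feat, uv):
    """Bilinearly sample an H x W x C feature map at continuous (u, v) coords."""
    h, w, _ = feat.shape
    u = np.clip(uv[:, 0], 0, w - 1)
    v = np.clip(uv[:, 1], 0, h - 1)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    u1, v1 = np.minimum(u0 + 1, w - 1), np.minimum(v0 + 1, h - 1)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    top = feat[v0, u0] * (1 - du) + feat[v0, u1] * du
    bottom = feat[v1, u0] * (1 - du) + feat[v1, u1] * du
    return top * (1 - dv) + bottom * dv


def point_image_fusion(points_xyz, point_feats, image_feats, P, eps=1e-6):
    """Point-level fusion: concatenate each visible point's feature with the
    image feature sampled at its projection. Points behind the camera or
    outside the image are dropped."""
    uv, depth = project_points(points_xyz, P)
    h, w, _ = image_feats.shape
    valid = ((depth > eps)
             & (uv[:, 0] >= 0) & (uv[:, 0] <= w - 1)
             & (uv[:, 1] >= 0) & (uv[:, 1] <= h - 1))
    sampled = bilinear_sample(image_feats, uv[valid])
    fused = np.concatenate([point_feats[valid], sampled], axis=1)
    return fused, valid, uv[valid]


def pixel_coverage(uv, h, w):
    """Fraction of image pixels that fall in some point's 2 x 2 bilinear
    neighborhood, i.e. the only pixels that can influence the fused features."""
    u0, v0 = np.floor(uv[:, 0]).astype(int), np.floor(uv[:, 1]).astype(int)
    u1, v1 = np.minimum(u0 + 1, w - 1), np.minimum(v0 + 1, h - 1)
    idx = np.concatenate([v0 * w + u0, v0 * w + u1, v1 * w + u0, v1 * w + u1])
    return np.unique(idx).size / (h * w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w = 375, 1242  # roughly KITTI image resolution
    # Illustrative pinhole intrinsics (assumed values, not calibration data).
    P = np.array([[720.0, 0.0, 620.0, 0.0],
                  [0.0, 720.0, 187.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
    n = 2000
    pts = np.column_stack([rng.uniform(-20, 20, n),  # x: lateral (m)
                           rng.uniform(-1, 2, n),    # y: camera y axis (m)
                           rng.uniform(5, 70, n)])   # z: forward depth (m)
    point_feats = rng.normal(size=(n, 4))
    image_feats = rng.normal(size=(h, w, 8))
    fused, valid, uv = point_image_fusion(pts, point_feats, image_feats, P)
    print("fused shape:", fused.shape)
    print(f"pixel coverage: {pixel_coverage(uv, h, w):.4%}")
```

Because each point reads at most four pixels, N points can touch at most 4N pixels no matter how large the image is; this is the information loss the abstract attributes to using sparse points as aggregation locations, and the motivation for choosing denser virtual points instead.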
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Optical Sensing Technologies
