Deep Continuous Fusion for Multi-Sensor 3D Object Detection
Ming Liang, Bin Yang, Shenlong Wang, Raquel Urtasun

TL;DR
This paper introduces an end-to-end 3D object detector that fuses LIDAR and camera data using continuous convolutions, improving localization accuracy.
Contribution
The paper presents a continuous fusion layer, and an architecture built on it, that combines image and LIDAR features at multiple resolutions for 3D detection.
Findings
Significant improvements over the state of the art on KITTI and on a large-scale 3D object detection benchmark
Effective fusion of image and LIDAR data at multiple resolutions
End-to-end learnable architecture for multi-sensor 3D detection
Abstract
In this paper, we propose a novel 3D object detector that can exploit both LIDAR and cameras to perform very accurate localization. Towards this goal, we design an end-to-end learnable architecture that exploits continuous convolutions to fuse image and LIDAR feature maps at different levels of resolution. Our proposed continuous fusion layer encodes both discrete-state image features and continuous geometric information. This enables us to design a novel, reliable and efficient end-to-end learnable 3D object detector based on multiple sensors. Our experimental evaluation on both KITTI and a large-scale 3D object detection benchmark shows significant improvements over the state of the art.
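The abstract does not spell out how the continuous fusion layer works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each bird's-eye-view (BEV) cell gathers image features from its k nearest LIDAR points, appends the continuous geometric offset from the cell to each point, passes the result through a single ReLU layer, and sums over neighbors. The function name `continuous_fusion`, its arguments, the k-nearest-neighbor rule, and the single-layer MLP are all assumptions made for illustration. Plain Python is used for readability rather than speed.

```python
import math

def continuous_fusion(bev_xy, points_xyz, point_img_feats, weights, k=2):
    """Fuse image features into BEV cells (hypothetical sketch).

    bev_xy:          list of (x, y) BEV cell centers on the ground plane
    points_xyz:      list of (x, y, z) LIDAR points
    point_img_feats: per-point image features, assumed to be already sampled
                     at each point's projection into the camera image
    weights:         (C + 3) x D matrix of a single-layer MLP (assumed)
    """
    out = []
    for cx, cy in bev_xy:
        # Rank LIDAR points by ground-plane distance to this cell.
        order = sorted(
            range(len(points_xyz)),
            key=lambda i: math.hypot(points_xyz[i][0] - cx,
                                     points_xyz[i][1] - cy),
        )
        fused = [0.0] * len(weights[0])
        for i in order[:k]:
            px, py, pz = points_xyz[i]
            # Input = discrete image feature + continuous geometric offset.
            x = list(point_img_feats[i]) + [px - cx, py - cy, pz]
            for d in range(len(fused)):
                h = sum(x[j] * weights[j][d] for j in range(len(x)))
                fused[d] += max(h, 0.0)  # ReLU, then sum over neighbors
        out.append(fused)
    return out
```

In a multi-resolution setup like the one the abstract describes, a layer of this kind would presumably be applied once per feature-map scale, with the fused output combined with the LIDAR BEV features at that scale.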
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
