MVFuseNet: Improving End-to-End Object Detection and Motion Forecasting through Multi-View Fusion of LiDAR Data
Ankit Laddha, Shivam Gautam, Stefan Palombo, Shreyash Pandey, Carlos Vallespi-Gonzalez

TL;DR
MVFuseNet is an end-to-end multi-view fusion method for LiDAR data that enhances object detection and motion forecasting in autonomous driving, achieving state-of-the-art results with real-time performance.
Contribution
It introduces a novel multi-view fusion approach combining range view and bird's eye view for improved spatio-temporal feature learning.
Findings
Achieves state-of-the-art detection and motion forecasting accuracy on two large-scale self-driving data sets.
Scales effectively to large ranges while maintaining real-time speed.
Demonstrates benefits of multi-view fusion over single-view methods.
Abstract
In this work, we propose MVFuseNet, a novel end-to-end method for joint object detection and motion forecasting from a temporal sequence of LiDAR data. Most existing methods operate in a single view by projecting data in either range view (RV) or bird's eye view (BEV). In contrast, we propose a method that effectively utilizes both RV and BEV for spatio-temporal feature learning as part of a temporal fusion network as well as for multi-scale feature learning in the backbone network. Further, we propose a novel sequential fusion approach that effectively utilizes multiple views in the temporal fusion network. We show the benefits of our multi-view approach for the tasks of detection and motion forecasting on two large-scale self-driving data sets, achieving state-of-the-art results. Furthermore, we show that MVFuseNet scales well to large operating ranges while maintaining…
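To make the two views mentioned in the abstract concrete, here is a minimal sketch of how a single LiDAR sweep might be indexed into a BEV grid and an RV image. It is not the authors' implementation: the function names (`bev_indices`, `rv_indices`), the grid extents, the cell size, the image resolution, and the elevation limits are all assumptions chosen for illustration.

```python
import numpy as np

def bev_indices(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), cell=0.5):
    """Map points (N, 3) to bird's eye view grid cells.

    Assumed grid: a Cartesian x-y grid with square cells of size `cell` metres.
    Returns row indices, column indices, and the mask of points inside the grid.
    """
    x, y = points[:, 0], points[:, 1]
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    cols = np.floor((x[keep] - x_range[0]) / cell).astype(np.int64)
    rows = np.floor((y[keep] - y_range[0]) / cell).astype(np.int64)
    return rows, cols, keep

def rv_indices(points, n_rows=64, n_cols=2048, elev_deg=(-25.0, 3.0)):
    """Map points (N, 3) to range view pixels.

    Assumed image: columns span azimuth over 360 degrees, rows span elevation
    from elev_deg[1] (top row) down to elev_deg[0] (bottom row).
    Returns row indices, column indices, and the range of each point.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)  # radians in [-pi, pi]
    sin_elev = np.clip(z / np.maximum(rng, 1e-6), -1.0, 1.0)
    elevation = np.degrees(np.arcsin(sin_elev))
    cols = np.floor((azimuth + np.pi) / (2.0 * np.pi) * n_cols).astype(np.int64) % n_cols
    frac = (elev_deg[1] - elevation) / (elev_deg[1] - elev_deg[0])
    rows = np.clip(np.floor(frac * n_rows), 0, n_rows - 1).astype(np.int64)
    return rows, cols, rng

# Three example points: two inside the assumed BEV extent, one beyond it.
points = np.array([[10.0, 0.0, 0.0], [0.0, 20.0, 1.0], [80.0, 0.0, 0.0]])
bev_rows, bev_cols, keep = bev_indices(points)
rv_rows, rv_cols, rng = rv_indices(points)
```

In this sketch the far point at 80 m falls outside the assumed BEV extent but still lands in an RV pixel, which illustrates one way the two projections differ: a Cartesian BEV grid grows with operating range, while an RV image has a fixed size set by the sensor's angular resolution.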
