Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net
Wenjie Luo, Bin Yang, Raquel Urtasun

TL;DR
This paper introduces a fast, unified deep neural network for real-time 3D detection, tracking, and motion forecasting from 3D sensor data, achieving high accuracy and efficiency in urban environments.
Contribution
A novel end-to-end neural network that jointly performs 3D detection, tracking, and motion forecasting using 3D convolutions on bird's eye view data, enabling real-time performance.
Findings
Outperforms the state of the art by a large margin on a new, very large scale dataset captured in several North American cities.
Operates in as little as 30 ms for all tasks.
More robust to occlusion and to sparse data at range, owing to joint reasoning across the three tasks.
Abstract
In this paper we propose a novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor. By jointly reasoning about these tasks, our holistic approach is more robust to occlusion as well as sparse data at range. Our approach performs 3D convolutions across space and time over a bird's eye view representation of the 3D world, which is very efficient in terms of both memory and computation. Our experiments on a new very large scale dataset captured in several North American cities show that we can outperform the state-of-the-art by a large margin. Importantly, by sharing computation we can perform all tasks in as little as 30 ms.
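To make the bird's eye view idea concrete, here is a minimal NumPy sketch, not the paper's network. It turns each point cloud into a binary occupancy grid and then mixes several consecutive grids with a small kernel applied along the time axis. The function names (`voxelize_bev`, `temporal_conv`), the grid ranges, the cell size, and the kernel weights are all assumptions chosen for illustration; a real model would learn its filters and would also convolve over the spatial axes.

```python
import numpy as np

def voxelize_bev(points, x_range=(0.0, 8.0), y_range=(-4.0, 4.0),
                 z_range=(-2.0, 2.0), cell=1.0):
    """Binary occupancy grid of shape (Z, Y, X) from an (N, 3) array of x, y, z points.

    Ranges and cell size are illustrative assumptions, not values from the paper.
    """
    lo = np.array([x_range[0], y_range[0], z_range[0]])
    hi = np.array([x_range[1], y_range[1], z_range[1]])
    shape_xyz = np.round((hi - lo) / cell).astype(int)
    grid = np.zeros(shape_xyz[::-1], dtype=np.float32)  # stored as (Z, Y, X)
    # Keep only points that fall inside the region of interest.
    inside = np.all((points >= lo) & (points < hi), axis=1)
    idx = np.floor((points[inside] - lo) / cell).astype(int)
    idx = np.minimum(idx, shape_xyz - 1)  # guard against float rounding at the edge
    grid[idx[:, 2], idx[:, 1], idx[:, 0]] = 1.0
    return grid

def temporal_conv(frames, kernel):
    """Slide a 1D kernel along the time axis of a (T, Z, Y, X) tensor.

    Uses the deep-learning convention (cross-correlation, no kernel flip) with
    'valid' padding, so the output has T - len(kernel) + 1 time steps.
    """
    k = len(kernel)
    steps = frames.shape[0] - k + 1
    out = [sum(kernel[j] * frames[t + j] for j in range(k)) for t in range(steps)]
    return np.stack(out)

# Hypothetical usage: five identical sweeps, averaged pairwise over time.
points = np.array([[0.5, -3.5, -1.5], [7.9, 3.9, 1.9], [100.0, 0.0, 0.0]])
grid = voxelize_bev(points)           # third point is out of range and dropped
frames = np.stack([grid] * 5)         # (T, Z, Y, X)
fused = temporal_conv(frames, [0.5, 0.5])
```

Treating the height axis as a channel dimension and sharing one kernel across all spatial cells is what keeps this kind of representation cheap in memory and computation.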
Taxonomy
Topics: Advanced Vision and Imaging · Video Surveillance and Tracking Methods · Human Pose and Action Recognition
