TL;DR
Temp-Frustum Net introduces a temporal fusion module that leverages information from previous frames to make 3D object detection more robust to noise, occlusion, and sparsity in autonomous driving scenarios.
Contribution
The paper proposes a novel Temporal Fusion Module (TFM) that uses a recurrent neural network to fuse the per-frame features of a frustum network, so that 3D detection exploits temporal information and outperforms frame-by-frame methods.
Findings
On the KITTI object tracking dataset:
Achieves ~6% improvement on Car detection
Achieves ~4% improvement on Pedestrian detection
Achieves ~6% improvement on Cyclist detection
Abstract
3D object detection is a core component of automated driving systems. State-of-the-art methods fuse RGB imagery and LiDAR point cloud data frame-by-frame for 3D bounding box regression. However, frame-by-frame 3D object detection suffers from noise, field-of-view obstruction, and sparsity. We propose a novel Temporal Fusion Module (TFM) to use information from previous time-steps to mitigate these problems. First, a state-of-the-art frustum network extracts point cloud features from raw RGB and LiDAR point cloud data frame-by-frame. Then, our TFM fuses these features with a recurrent neural network. As a result, 3D object detection becomes robust against single frame failures and transient occlusions. Experiments on the KITTI object tracking dataset show the effectiveness of the proposed TFM, where we obtain ~6%, ~4%, and ~6% improvements on Car, Pedestrian, and Cyclist classes,…
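A minimal sketch of the recurrent fusion step described above, assuming a plain Elman-style RNN (the abstract says only "a recurrent neural network") and fixed-length per-frame feature vectors standing in for the frustum network's output. The class name `TemporalFusionSketch`, the dimensions, and the random weights are hypothetical illustrations, not the authors' implementation; training and the 3D box regression head are omitted.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    """Small random weights; a stand-in for learned parameters (hypothetical)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TemporalFusionSketch:
    """Hypothetical Elman RNN fusing per-frame features over time.

    h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    The hidden state h_t carries information from earlier frames forward.
    """

    def __init__(self, feat_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_x = random_matrix(hidden_dim, feat_dim, rng)    # input-to-hidden
        self.w_h = random_matrix(hidden_dim, hidden_dim, rng)  # hidden-to-hidden
        self.b = [0.0] * hidden_dim                            # bias

    def step(self, x: Sequence[float], h: Sequence[float]) -> Vector:
        """One recurrent update from current features x and previous state h."""
        a = matvec(self.w_x, x)
        c = matvec(self.w_h, h)
        return [math.tanh(ai + ci + bi) for ai, ci, bi in zip(a, c, self.b)]

    def fuse(self, frame_features: Sequence[Sequence[float]]) -> List[Vector]:
        """Run the RNN over a sequence of per-frame features; return every state."""
        h: Vector = [0.0] * self.hidden_dim
        states: List[Vector] = []
        for x in frame_features:
            h = self.step(x, h)
            states.append(h)
        return states


if __name__ == "__main__":
    fusion = TemporalFusionSketch(feat_dim=4, hidden_dim=3, seed=0)
    frames = [
        [1.0, 0.5, -0.2, 0.3],
        [0.9, 0.4, -0.1, 0.2],
        [0.0, 0.0, 0.0, 0.0],  # simulated occlusion: no usable features
    ]
    for t, h in enumerate(fusion.fuse(frames)):
        print(t, [round(v, 4) for v in h])
```

The all-zero third frame mimics a transient occlusion. Processed frame-by-frame (previous state reset to zero), it yields an all-zero state because the bias is zero, whereas the recurrent state still holds a non-zero summary of the earlier frames. This illustrates, in miniature, why temporal fusion can make detection robust to single-frame failures.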
