Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection
Jinhyung Park, Chenfeng Xu, Shijia Yang, Kurt Keutzer, Kris Kitani, Masayoshi Tomizuka, Wei Zhan

TL;DR
This paper introduces a novel long-term temporal fusion method for camera-only 3D object detection that significantly improves performance by generating a cost volume from extended image history and combining coarse and fine matching strategies.
Contribution
It proposes a cost volume-based long-term temporal fusion framework that enhances multi-view matching and combines short-term and long-term depth predictions for better 3D detection.
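To make the term "cost volume" concrete, here is a minimal sketch for a simplified two-view, rectified, single-row case. It is not the paper's implementation: the function name `cost_volume`, the feature shapes, and the dot-product similarity are assumptions chosen for illustration.

```python
import numpy as np

def cost_volume(ref_feat, src_feat, max_disp):
    """Score each disparity hypothesis by feature similarity.

    ref_feat, src_feat: arrays of shape (C, W), one feature column per pixel.
    Returns an array of shape (max_disp, W); entry [d, x] is the dot product
    between reference pixel x and source pixel x - d.
    """
    _, width = ref_feat.shape
    # -inf marks hypotheses that would fall outside the source view.
    volume = np.full((max_disp, width), -np.inf)
    for d in range(max_disp):
        volume[d, d:] = np.sum(ref_feat[:, d:] * src_feat[:, :width - d], axis=0)
    return volume

# Toy data (hypothetical): the reference row is the source row shifted by 3 pixels.
rng = np.random.default_rng(0)
src = rng.normal(size=(8, 32))
src /= np.linalg.norm(src, axis=0, keepdims=True)  # unit-norm feature columns
ref = np.roll(src, 3, axis=1)

# Picking the best-scoring hypothesis per pixel recovers the shift for x >= 3.
best = cost_volume(ref, src, max_disp=6).argmax(axis=0)
```

In a multi-view 3D detector the hypotheses would be depths rather than pixel shifts, and the source features would come from earlier timesteps warped by the known camera motion; the scoring idea is the same.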
Findings
Achieves state-of-the-art results on the nuScenes dataset.
Outperforms previous methods by 5.2% mAP and 3.7% NDS.
Demonstrates the effectiveness of long-term temporal fusion in camera-only 3D detection.
Abstract
While recent camera-only 3D detection methods leverage multiple timesteps, the limited history they use significantly hampers the extent to which temporal fusion can improve object perception. Observing that existing works' fusion of multi-frame images is an instance of temporal stereo matching, we find that performance is hindered by the interplay between 1) the low granularity of matching resolution and 2) the sub-optimal multi-view setup produced by limited history usage. Our theoretical and empirical analysis demonstrates that the optimal temporal difference between views varies significantly for different pixels and depths, making it necessary to fuse many timesteps over long-term history. Building on our investigation, we propose to generate a cost volume from a long history of image observations, compensating for the coarse but efficient matching resolution with a more optimal…
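The link between temporal difference, depth, and matching resolution can be illustrated with the classic rectified-stereo relation, disparity = focal length × baseline / depth, where the baseline grows with the time gap between frames. The sketch below assumes purely lateral camera motion at constant speed, which is a simplification and not the paper's general analysis; the function names and all numeric values are hypothetical.

```python
def disparity_px(focal_px, speed_mps, dt_s, depth_m):
    """Pixel disparity of a static point between two frames dt_s apart.

    Assumes lateral camera motion, so the baseline is speed * time gap.
    """
    baseline_m = speed_mps * dt_s
    return focal_px * baseline_m / depth_m

def min_dt_for_stride(focal_px, speed_mps, depth_m, stride_px):
    """Smallest time gap at which the disparity reaches one feature-map cell.

    stride_px is the downsampling factor of the feature map used for matching.
    """
    return stride_px * depth_m / (focal_px * speed_mps)

# Hypothetical setup: 1000 px focal length, 10 m/s ego speed, stride-16 features.
near = min_dt_for_stride(1000, 10, depth_m=10, stride_px=16)
far = min_dt_for_stride(1000, 10, depth_m=50, stride_px=16)
```

Under these assumptions the required time gap scales linearly with depth, so a point five times farther away needs a five times longer history before its motion is visible at the coarse matching resolution. This is one way to see why a single fixed frame gap cannot suit all pixels and depths.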
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: Test
