Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction
Zhuofan Zong, Dongzhi Jiang, Guanglu Song, Zeyue Xue, Jingyong Su, Hongsheng Li, Yu Liu

TL;DR
This paper introduces Historical Object Prediction (HoP), a training paradigm that leverages temporal information to improve multi-view 3D object detection accuracy without adding inference overhead.
Contribution
The paper proposes HoP, a training-only method that generates pseudo BEV features from historical frames to enhance detector learning, compatible with existing frameworks.
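The "training-only" property can be sketched as control flow. This is a minimal illustration, not the authors' code: BEV features are toy lists of floats, `detect` and `hop_branch` are hypothetical callables, and the L1-style losses and `hop_weight` value are assumptions. The point is structural. The auxiliary HoP branch adds a loss only when training targets are supplied and is skipped entirely at inference.

```python
from typing import Callable, Dict, List, Optional

Feature = List[float]  # toy stand-in for a flattened BEV feature map


def hop_step(
    bev_history: List[Feature],
    detect: Callable[[Feature], float],
    hop_branch: Callable[[List[Feature]], float],
    targets: Optional[Dict[str, float]] = None,
    hop_weight: float = 0.5,  # hypothetical loss weight, not from the paper
) -> Dict[str, float]:
    """bev_history[0] is the BEV feature at t; bev_history[j] is at t - j."""
    pred_t = detect(bev_history[0])
    if targets is None:
        # Inference: only the regular detector runs, so cost is unchanged.
        return {"pred": pred_t}
    # Training: the usual detection loss at the current timestamp t ...
    det_loss = abs(pred_t - targets["t"])
    # ... plus an auxiliary loss on the prediction for timestamp t - k.
    hop_loss = abs(hop_branch(bev_history) - targets["t_minus_k"])
    return {"pred": pred_t, "loss": det_loss + hop_weight * hop_loss}


if __name__ == "__main__":
    calls = {"hop": 0}

    def toy_detect(feature: Feature) -> float:
        return sum(feature) / len(feature)

    def toy_hop_branch(history: List[Feature]) -> float:
        calls["hop"] += 1
        # Toy pseudo feature for t-1 built from t and t-2 only.
        pseudo = [(a + b) / 2 for a, b in zip(history[0], history[2])]
        return toy_detect(pseudo)

    history = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]
    print(hop_step(history, toy_detect, toy_hop_branch))
    print(hop_step(history, toy_detect, toy_hop_branch,
                   targets={"t": 1.0, "t_minus_k": 1.5}))
    print("HoP branch calls:", calls["hop"])
```

The first call (no targets) returns only the current prediction and never invokes the branch; the second adds the weighted historical loss, which is where the extra supervision enters during training.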
Findings
HoP achieves state-of-the-art results on nuScenes with 68.5% NDS.
HoP improves detection accuracy when integrated with BEVFormer and BEVDet.
The approach significantly boosts performance without increasing inference complexity.
Abstract
In this paper, we propose a new paradigm, named Historical Object Prediction (HoP), for multi-view 3D detection to leverage temporal information more effectively. The HoP approach is straightforward: given the current timestamp t, we generate a pseudo Bird's-Eye View (BEV) feature of timestamp t-k from its adjacent frames and utilize this feature to predict the object set at timestamp t-k. Our approach is motivated by the observation that enforcing the detector to capture both the spatial location and temporal motion of objects occurring at historical timestamps can lead to more accurate BEV feature learning. First, we carefully design short-term and long-term temporal decoders, which can generate the pseudo BEV feature for timestamp t-k without the involvement of its corresponding camera images. Second, an additional object decoder is flexibly attached to predict the object targets…
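The key constraint in the abstract is that the pseudo BEV feature for t-k is built without frame t-k's own images. The sketch below illustrates only that constraint, under loud assumptions: the short-term decoder is assumed to use the immediate neighbours t-k+1 and t-k-1, the long-term decoder is assumed to use every other available frame, and the learned temporal fusion the paper describes is replaced here by a plain element-wise mean. All function names are hypothetical.

```python
from typing import List

Feature = List[float]  # toy stand-in for a flattened BEV feature map


def mean_features(features: List[Feature]) -> Feature:
    """Element-wise mean; a stand-in for the learned temporal fusion."""
    if not features:
        raise ValueError("need at least one feature map")
    return [sum(values) / len(features) for values in zip(*features)]


def short_term_decoder(bev_history: List[Feature], k: int) -> Feature:
    """Assumed: fuse only the immediate neighbours t-k+1 and t-k-1."""
    neighbours = [bev_history[j] for j in (k - 1, k + 1)
                  if 0 <= j < len(bev_history)]
    return mean_features(neighbours)


def long_term_decoder(bev_history: List[Feature], k: int) -> Feature:
    """Assumed: fuse every available frame except t-k itself."""
    others = [f for j, f in enumerate(bev_history) if j != k]
    return mean_features(others)


def pseudo_bev(bev_history: List[Feature], k: int = 1) -> Feature:
    """Pseudo BEV for t-k; bev_history[j] holds the feature at t - j."""
    if not 0 < k < len(bev_history):
        raise ValueError("k must index a historical frame")
    short_feat = short_term_decoder(bev_history, k)
    long_feat = long_term_decoder(bev_history, k)
    # Hypothetical combination rule: average the two decoder outputs.
    return mean_features([short_feat, long_feat])


if __name__ == "__main__":
    # Index 0 is t, 1 is t-1, 2 is t-2, 3 is t-3.
    history = [[4.0, 0.0], [100.0, 100.0], [2.0, 2.0], [0.0, 4.0]]
    print(pseudo_bev(history, k=1))
```

Because index k is never read, the large values stored for t-1 have no effect on the result; an object decoder attached to this pseudo feature must therefore infer where objects were at t-k from temporal context alone.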
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
