Co-Fusion4D: Spatio-temporal Collaborative Fusion for Robust 3D Object Detection
Wenxuan Li, Qin Zou, Shoubing Chen, Chi Chen, Yingyi Yang, Shoubing Chen, Qingxiang Meng

TL;DR
Co-Fusion4D introduces a spatio-temporal fusion framework for robust 3D object detection in autonomous driving, emphasizing consistency and alignment of multi-frame features to improve detection accuracy.
Contribution
It proposes a novel unified framework with a current-frame-centric strategy and a Dual Attention Fusion module to enhance temporal stability and feature discrimination.
Findings
Achieves 74.9% mAP and 75.6% NDS on the nuScenes benchmark.
Effectively suppresses feature drift and misalignment across frames.
Outperforms previous state-of-the-art methods without external data.
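For readers unfamiliar with the second metric, NDS (nuScenes Detection Score) is defined by the nuScenes benchmark as a weighted combination of mAP and five true-positive error metrics (translation, scale, orientation, velocity, and attribute errors). The formula below is the benchmark's standard definition, not something specific to this paper:

```latex
\mathrm{NDS} = \frac{1}{10}\left[\, 5\,\mathrm{mAP} + \sum_{\mathrm{mTP} \in \mathbb{TP}} \bigl(1 - \min(1, \mathrm{mTP})\bigr) \right]
```

Here $\mathbb{TP}$ is the set of the five mean true-positive errors, so mAP contributes half of the score and each error term contributes one tenth.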
Abstract
In autonomous driving, 3D object detection is essential for accurate perception and reliable decision-making. However, object motion and ego-motion often induce cross-frame spatiotemporal inconsistencies in BEV-based detectors, leading to temporal BEV feature misalignment and degraded spatiotemporal consistency. To address these challenges, we propose Co-Fusion4D, a unified framework that explicitly preserves cross-frame spatiotemporal consistency and suppresses temporal feature drift. Co-Fusion4D adopts a current-frame-centric strategy, treating the current frame as the primary source of information while selectively incorporating historical frames after spatiotemporal filtering and alignment. This dominant-complementary mechanism effectively mitigates cumulative alignment errors, suppresses noisy feature propagation, and exploits reliable temporal cues for a more consistent BEV…
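The current-frame-centric idea described in the abstract can be illustrated with a small NumPy sketch. This is not the paper's Dual Attention Fusion module or its alignment procedure. It is a minimal stand-in under loud assumptions: ego-motion is reduced to an integer translation of the BEV grid, "filtering" is a per-cell cosine-similarity gate, and the names `shift_bev`, `fuse_current_centric`, and `max_hist_weight` are hypothetical.

```python
import numpy as np


def shift_bev(feat, dx, dy):
    """Shift a (C, H, W) BEV feature map by integer cell offsets, zero-filling.

    Assumption: ego-motion compensation is simplified to a pure grid translation.
    """
    out = np.zeros_like(feat)
    _, h, w = feat.shape
    if abs(dx) >= w or abs(dy) >= h:
        return out  # no overlap between the two frames
    # Matching source/destination windows covering the overlapping region.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[:, dst_y, dst_x] = feat[:, src_y, src_x]
    return out


def fuse_current_centric(current, history, offsets, max_hist_weight=0.5, eps=1e-8):
    """Fuse aligned historical BEV maps into the current one.

    The current frame always has weight 1. Each historical frame gets a
    per-cell weight in [0, max_hist_weight], scaled by its cosine similarity
    to the current frame, so dissimilar (likely misaligned) cells are suppressed.
    """
    current = current.astype(float)
    num = current.copy()
    den = np.ones(current.shape[1:])
    for feat, (dx, dy) in zip(history, offsets):
        aligned = shift_bev(feat.astype(float), dx, dy)
        dot = (aligned * current).sum(axis=0)
        norm = np.linalg.norm(aligned, axis=0) * np.linalg.norm(current, axis=0)
        sim = dot / (norm + eps)
        gate = max_hist_weight * np.clip(sim, 0.0, 1.0)  # shape (H, W)
        num += gate * aligned
        den += gate
    return num / den


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cur = rng.normal(size=(4, 8, 8))
    hist = [rng.normal(size=(4, 8, 8))]
    fused = fuse_current_centric(cur, hist, offsets=[(1, 0)])
    print(fused.shape)
```

Because the current frame's weight is fixed at 1 and every historical weight is capped below it, history can only complement the current features, which mirrors the dominant-complementary framing in the abstract at a very coarse level.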
