Co-Fusion4D: Spatio-temporal Collaborative Fusion for Robust 3D Object Detection

Wenxuan Li; Qin Zou; Shoubing Chen; Chi Chen; Yingyi Yang; Shoubing Chen; Qingxiang Meng

arXiv:2605.20301 · cs.CV · May 21, 2026

TL;DR

Co-Fusion4D introduces a spatio-temporal fusion framework for robust 3D object detection in autonomous driving, emphasizing consistency and alignment of multi-frame features to improve detection accuracy.

Contribution

The paper proposes a unified framework that combines a current-frame-centric strategy with a Dual Attention Fusion module to improve temporal stability and feature discrimination.

Findings

01. Achieves 74.9% mAP and 75.6% NDS on the nuScenes benchmark.

02. Effectively suppresses feature drift and misalignment across frames.

03. Outperforms previous state-of-the-art methods without external data.

Abstract

In autonomous driving, 3D object detection is essential for accurate perception and reliable decision-making. However, object motion and ego-motion often induce cross-frame spatiotemporal inconsistencies in BEV-based detectors, leading to temporal BEV feature misalignment and degraded spatiotemporal consistency. To address these challenges, we propose Co-Fusion4D, a unified framework that explicitly preserves cross-frame spatiotemporal consistency and suppresses temporal feature drift. Co-Fusion4D adopts a current-frame-centric strategy, treating the current frame as the primary source of information while selectively incorporating historical frames after spatiotemporal filtering and alignment. This dominant-complementary mechanism effectively mitigates cumulative alignment errors, suppresses noisy feature propagation, and exploits reliable temporal cues for a more consistent BEV…
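The current-frame-centric idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation or their Dual Attention Fusion module, and every name and parameter below (`shift_grid`, `fuse`, `sim_threshold`, `max_hist_weight`) is hypothetical. It assumes a BEV grid stored as nested lists of per-cell feature vectors, approximates ego-motion alignment with a whole-cell shift, and uses cosine similarity as a stand-in for the spatiotemporal filtering step. The historical weight is capped below 0.5 so the current frame always dominates.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors; 0.0 if either is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def shift_grid(grid, dy, dx, fill):
    """Translate a BEV grid by whole cells (assumed stand-in for ego-motion warping)."""
    h, w = len(grid), len(grid[0])
    out = [[list(fill) for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ty, tx = y + dy, x + dx
            if 0 <= ty < h and 0 <= tx < w:
                out[ty][tx] = list(grid[y][x])
    return out


def fuse(current, history, dy, dx, sim_threshold=0.5, max_hist_weight=0.4):
    """Blend an aligned historical grid into the current grid, cell by cell.

    Historical cells that disagree with the current frame (similarity below
    sim_threshold) are dropped; the rest contribute at most max_hist_weight.
    """
    channels = len(current[0][0])
    aligned = shift_grid(history, dy, dx, fill=[0.0] * channels)
    fused = []
    for cur_row, hist_row in zip(current, aligned):
        row = []
        for cur, hist in zip(cur_row, hist_row):
            sim = cosine(cur, hist)
            w = max_hist_weight * sim if sim >= sim_threshold else 0.0
            row.append([(1.0 - w) * c + w * h for c, h in zip(cur, hist)])
        fused.append(row)
    return fused
```

Calling `fuse(current, history, dy=0, dx=1)` shifts the historical grid one cell along the second axis before blending; cells with no aligned history, or with history that disagrees with the current frame, keep their current-frame features unchanged.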
