TL;DR
SparseCoop introduces a sparse, geometry-aware cooperative perception framework for autonomous driving that improves accuracy, efficiency, and robustness without relying on dense BEV features.
Contribution
It proposes three components: a kinematic-grounded instance query, a robust fusion module, and a denoising training task, which together advance sparse cooperative perception.
Findings
Achieves state-of-the-art results on the V2X-Seq and Griffin datasets.
Offers superior computational efficiency and low transmission costs.
Demonstrates robustness to communication latency.
Abstract
Cooperative perception is critical for autonomous driving, overcoming the inherent limitations of a single vehicle, such as occlusions and constrained fields of view. However, current approaches sharing dense Bird's-Eye-View (BEV) features are constrained by quadratically scaling communication costs and the lack of flexibility and interpretability for precise alignment across asynchronous or disparate viewpoints. While emerging sparse query-based methods offer an alternative, they often suffer from inadequate geometric representations, suboptimal fusion strategies, and training instability. In this paper, we propose SparseCoop, a fully sparse cooperative perception framework for 3D detection and tracking that completely discards intermediate BEV representations. Our framework features a trio of innovations: a kinematic-grounded instance query that uses an explicit state vector with 3D…
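To make the "quadratically scaling" communication cost concrete: at a fixed cell size, the number of cells in a dense BEV map grows with the square of the perception range, whereas the payload of a sparse query set depends on how many queries are sent. The following is a back-of-envelope sketch under assumed sizes (0.5 m cells, 256 channels, 100 queries, a 10-dimensional per-query state, float32 with no compression). These numbers and the helper names `dense_bev_bytes` and `sparse_query_bytes` are hypothetical illustrations, not values or code from SparseCoop.

```python
"""Back-of-envelope payload comparison: dense BEV map vs. sparse instance queries.

All sizes are hypothetical illustration values, not figures from SparseCoop.
"""

BYTES_PER_FLOAT = 4  # assumption: float32 payloads, no compression


def dense_bev_bytes(range_m: float, cell_m: float, channels: int) -> int:
    """Bytes for a square BEV grid covering [-range_m, range_m] on both axes."""
    cells_per_side = int(2 * range_m / cell_m)
    # Cell count is cells_per_side ** 2, so cost grows quadratically with range.
    return cells_per_side * cells_per_side * channels * BYTES_PER_FLOAT


def sparse_query_bytes(num_queries: int, feat_dim: int, state_dim: int) -> int:
    """Bytes for a set of instance queries: a feature vector plus a state vector each.

    state_dim is a hypothetical size; the actual state layout is not given here.
    """
    return num_queries * (feat_dim + state_dim) * BYTES_PER_FLOAT


if __name__ == "__main__":
    for r in (50.0, 100.0, 200.0):
        dense = dense_bev_bytes(r, cell_m=0.5, channels=256)
        sparse = sparse_query_bytes(num_queries=100, feat_dim=256, state_dim=10)
        print(f"range ±{r:.0f} m: dense {dense / 1e6:.1f} MB, sparse {sparse / 1e3:.1f} KB")
```

Under these assumptions, doubling the range from ±50 m to ±100 m quadruples the dense payload (about 41.0 MB to about 163.8 MB), while the sparse payload stays at about 106.4 KB because it depends on the query count rather than on grid area. Real systems compress and crop features, so treat the absolute numbers as illustrative only.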