HeCoFuse: Cross-Modal Complementary V2X Cooperative Perception with Heterogeneous Sensors
Chuheng Wei, Ziye Qin, Walter Zimmer, Guoyuan Wu, Matthew J. Barth

TL;DR
HeCoFuse is a V2X cooperative perception framework that fuses data from heterogeneous sensors (cameras, LiDARs, or both) using hierarchical channel-wise and spatial attention together with an adaptive spatial resolution module, and reports state-of-the-art results across diverse sensor configurations.
Contribution
We introduce HeCoFuse, a unified adaptive fusion framework for heterogeneous sensors in V2X perception, addressing cross-modality misalignment and robustness issues.
Findings
Achieves 43.22% 3D mAP with the full sensor setup, outperforming the baseline.
Maintains high performance across nine sensor configurations.
First place in the CVPR 2025 DriveX challenge.
Abstract
Real-world Vehicle-to-Everything (V2X) cooperative perception systems often operate under heterogeneous sensor configurations due to cost constraints and deployment variability across vehicles and infrastructure. This heterogeneity poses significant challenges for feature fusion and perception reliability. To address these issues, we propose HeCoFuse, a unified framework designed for cooperative perception across mixed sensor setups where nodes may carry Cameras (C), LiDARs (L), or both. By introducing a hierarchical fusion mechanism that adaptively weights features through a combination of channel-wise and spatial attention, HeCoFuse can tackle critical challenges such as cross-modality feature misalignment and imbalanced representation quality. In addition, an adaptive spatial resolution adjustment module is employed to balance computational cost and fusion effectiveness. To enhance…
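The attention-weighted fusion described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' architecture: the function names (`channel_attention`, `spatial_attention`, `fuse`), the pooling-plus-sigmoid gates, and the normalization across modalities are all assumptions chosen to show the general idea of combining channel-wise and spatial attention to weight camera and LiDAR feature maps of the same shape.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W). Average over space, giving one gate in (0, 1) per channel.
    # Hypothetical gate: a real model would learn this with an MLP.
    return sigmoid(feat.mean(axis=(1, 2)))[:, None, None]  # (C, 1, 1)

def spatial_attention(feat):
    # Average over channels, giving one gate in (0, 1) per spatial location.
    # Hypothetical gate: a real model would learn this with a convolution.
    return sigmoid(feat.mean(axis=0, keepdims=True))  # (1, H, W)

def fuse(cam, lidar):
    # Score each modality by the product of its channel and spatial gates,
    # then normalize the scores across modalities so they sum to 1 per element.
    scores = np.stack([
        np.broadcast_to(channel_attention(f) * spatial_attention(f), f.shape)
        for f in (cam, lidar)
    ])  # (2, C, H, W)
    weights = scores / scores.sum(axis=0, keepdims=True)
    fused = weights[0] * cam + weights[1] * lidar
    return fused, weights

# Usage with random stand-in features (4 channels on an 8x8 grid).
rng = np.random.default_rng(0)
cam_feat = rng.normal(size=(4, 8, 8))
lidar_feat = rng.normal(size=(4, 8, 8))
fused_feat, fusion_weights = fuse(cam_feat, lidar_feat)
print(fused_feat.shape)
```

Because the weights are positive and sum to one at every element, the fused map is a convex combination of the two inputs, so a modality with a weak response at a given channel or location contributes less there. Learned gates would replace the fixed pooling used in this sketch.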
