HeCoFuse: Cross-Modal Complementary V2X Cooperative Perception with Heterogeneous Sensors

Chuheng Wei; Ziye Qin; Walter Zimmer; Guoyuan Wu; Matthew J. Barth

arXiv:2507.13677 · cs.CV · March 24, 2026

TL;DR

HeCoFuse is a V2X cooperative perception framework that fuses data from heterogeneous sensors (nodes with cameras, LiDARs, or both) using hierarchical channel-wise and spatial attention together with an adaptive spatial resolution module, and it reports state-of-the-art results across nine sensor configurations.

Contribution

We introduce HeCoFuse, a unified adaptive fusion framework for V2X cooperative perception with heterogeneous sensors. It addresses cross-modality feature misalignment and robustness under mixed sensor setups.

Findings

01. Achieves 43.22% 3D mAP with the full sensor setup, outperforming the baseline.

02. Maintains high performance across nine sensor configurations.

03. Placed first in the CVPR 2025 DriveX challenge.

Abstract

Real-world Vehicle-to-Everything (V2X) cooperative perception systems often operate under heterogeneous sensor configurations due to cost constraints and deployment variability across vehicles and infrastructure. This heterogeneity poses significant challenges for feature fusion and perception reliability. To address these issues, we propose HeCoFuse, a unified framework designed for cooperative perception across mixed sensor setups where nodes may carry Cameras (C), LiDARs (L), or both. By introducing a hierarchical fusion mechanism that adaptively weights features through a combination of channel-wise and spatial attention, HeCoFuse can tackle critical challenges such as cross-modality feature misalignment and imbalanced representation quality. In addition, an adaptive spatial resolution adjustment module is employed to balance computational cost and fusion effectiveness. To enhance…
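The two-stage weighting described in the abstract (channel-wise attention followed by spatial attention) can be illustrated with a toy, parameter-free sketch. This is not the authors' architecture: the function names `channel_gate` and `fuse`, the sigmoid gate on the channel mean, and the softmax across modalities are all assumptions chosen for illustration, and a real model would use learned layers on aligned bird's-eye-view features.

```python
import math


def channel_gate(feat):
    """Return one sigmoid gate per channel, computed from that channel's mean.

    Hypothetical stand-in for learned channel attention (no trainable weights).
    """
    gates = []
    for channel in feat:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        gates.append(1.0 / (1.0 + math.exp(-mean)))
    return gates


def fuse(feats):
    """Fuse modality feature maps, each a nested list shaped [C][H][W].

    Stage 1 rescales every channel by its gate.
    Stage 2 blends modalities per location with a softmax over their
    channel-averaged responses (an assumed form of spatial attention).
    """
    C, H, W = len(feats[0]), len(feats[0][0]), len(feats[0][0][0])

    # Stage 1: channel-wise reweighting of each modality.
    gated = []
    for f in feats:
        g = channel_gate(f)
        gated.append(
            [[[g[c] * f[c][h][w] for w in range(W)] for h in range(H)]
             for c in range(C)]
        )

    # Stage 2: per-location softmax across modalities.
    fused = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for h in range(H):
        for w in range(W):
            scores = [sum(m[c][h][w] for c in range(C)) / C for m in gated]
            peak = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - peak) for s in scores]
            total = sum(exps)
            for m, e in zip(gated, exps):
                for c in range(C):
                    fused[c][h][w] += (e / total) * m[c][h][w]
    return fused


# Toy inputs: a "camera" map and a "LiDAR" map, 1 channel on a 1x2 grid.
camera = [[[4.0, 0.0]]]
lidar = [[[0.0, 4.0]]]
fused = fuse([camera, lidar])
```

In this toy setup each grid cell ends up dominated by whichever modality responds more strongly there, which is the intuition behind adaptive weighting when nodes differ in sensor type or feature quality.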
