Multimodal Radio and Vision Fusion for Robust Localization in Urban V2I Communications
Can Zheng, Jiguang He, Chung G. Kang, Guofa Cai, Henk Wymeersch

TL;DR
This paper introduces a multimodal fusion framework that combines wireless channel data with visual information to improve vehicle localization accuracy in urban vehicle-to-infrastructure (V2I) communication, where tall buildings often obstruct GPS signals.
Contribution
It proposes a contrastive learning regression model that fuses channel state information (CSI) with visual data for robust urban vehicle localization, outperforming traditional and single-modal methods in simulations.
Findings
Significantly improves localization accuracy in urban environments.
Outperforms traditional and single-modal models in simulations.
Demonstrates robustness against urban signal obstructions.
Abstract
Accurate localization is critical for vehicle-to-infrastructure (V2I) communication systems, especially in urban areas where tall buildings often obstruct GPS signals and cause significant positioning errors. Applications such as autonomous driving and smart city infrastructure therefore need alternative or complementary techniques for reliable and precise positioning. This paper proposes a multimodal localization framework for V2I scenarios, based on contrastive learning regression, that combines channel state information (CSI) with visual information to achieve improved accuracy and reliability. The approach leverages the complementary strengths of wireless and visual data to overcome the limitations of traditional localization methods, offering a robust solution for V2I applications. Simulation results demonstrate that the proposed CSI and vision fusion model significantly…
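To make the idea of contrastive alignment between CSI and visual features more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a symmetric InfoNCE-style loss over paired embeddings, fusion by simple concatenation, a linear regression head, and a 2D position output; the summary above does not confirm any of these choices. All function names, the temperature value, and the random embeddings are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(csi_emb, img_emb, temperature=0.1):
    """Assumed contrastive objective: row i of csi_emb and row i of img_emb
    form a positive pair; every other row in the batch acts as a negative."""
    z_c = l2_normalize(csi_emb)
    z_v = l2_normalize(img_emb)
    logits = z_c @ z_v.T / temperature          # (batch, batch) similarities
    idx = np.arange(logits.shape[0])
    loss_c2v = -log_softmax(logits, axis=1)[idx, idx].mean()  # CSI -> image
    loss_v2c = -log_softmax(logits, axis=0)[idx, idx].mean()  # image -> CSI
    return 0.5 * (loss_c2v + loss_v2c)


def fuse_and_regress(csi_emb, img_emb, weights, bias):
    """Assumed fusion: concatenate both embeddings, then apply a linear head
    that maps the fused vector to a position estimate."""
    fused = np.concatenate([csi_emb, img_emb], axis=1)
    return fused @ weights + bias


# Hypothetical stand-ins for encoder outputs (8 samples, 16-dim embeddings).
rng = np.random.default_rng(0)
csi_emb = rng.normal(size=(8, 16))
img_emb = csi_emb + 0.05 * rng.normal(size=(8, 16))   # well-aligned pairs
misaligned_img_emb = np.roll(img_emb, 1, axis=0)      # every pair mismatched

aligned_loss = symmetric_info_nce(csi_emb, img_emb)
misaligned_loss = symmetric_info_nce(csi_emb, misaligned_img_emb)

# Untrained linear head producing an assumed 2D (x, y) position per sample.
weights = rng.normal(size=(32, 2))
bias = np.zeros(2)
positions = fuse_and_regress(csi_emb, img_emb, weights, bias)
```

In this sketch the loss is lower when each CSI embedding sits next to its own image embedding than when the pairs are shifted, which is the property a contrastive objective rewards. In a real system the embeddings would come from trained CSI and image encoders, and the regression head would be fit against ground-truth positions.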
