When Vision-Language Model (VLM) Meets Beam Prediction: A Multimodal Contrastive Learning Framework

Ji Wang, Bin Tang, Jian Xiao, Qimei Cui, Xingwang Li, and Tony Q. S. Quek

arXiv:2508.00456·eess.SP·August 18, 2025

Open Access

TL;DR

This paper introduces a multimodal contrastive learning framework leveraging vision-language models to improve millimeter wave beam prediction accuracy in complex environments by integrating image, LiDAR, and location data.

Contribution

The paper proposes a novel VLM-driven contrastive learning framework that aligns multimodal data for enhanced beam prediction, incorporating language prompts for better cross-modal consistency.

Findings

1. Achieved a DBA-Score of 0.9016, a 1.46% improvement over existing methods.

2. Demonstrated the effectiveness of multimodal data integration in complex propagation environments.

3. Validated the approach on the DeepSense-6G dataset.

Abstract

As real propagation environments become increasingly complex and dynamic, millimeter wave beam prediction faces major challenges. Traditional methods that rely on real-time channel state information (CSI) are computationally expensive and often fail to maintain accuracy in such environments, whereas the powerful cross-modal representation capability of vision-language models (VLMs) offers a promising alternative. In this paper, we present a VLM-driven, contrastive-learning-based multimodal beam prediction framework that integrates multimodal data via modality-specific encoders. To enforce cross-modal consistency, we adopt a contrastive pretraining strategy that aligns image and LiDAR features in the latent space. We encode location information as text prompts and feed them to the text encoder to introduce a language modality, which further improves cross-modal consistency. Experiments on…
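The contrastive alignment of image and LiDAR features described in the abstract can be illustrated with a small sketch. The abstract does not specify the loss, so the following assumes a CLIP-style symmetric InfoNCE objective over a batch of paired embeddings. The function names, the temperature value of 0.07, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_emb, lidar_emb, temperature=0.07):
    """Assumed symmetric InfoNCE loss over paired image/LiDAR embeddings."""
    img = [l2_normalize(v) for v in image_emb]
    lid = [l2_normalize(v) for v in lidar_emb]
    # logits[i][j]: scaled cosine similarity of image i and LiDAR sample j.
    logits = [
        [sum(a * b for a, b in zip(u, v)) / temperature for v in lid]
        for u in img
    ]
    # Transpose to score LiDAR-to-image retrieval as well.
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


if __name__ == "__main__":
    random.seed(0)
    batch, dim = 4, 8
    image_emb = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(batch)]
    # Aligned case: LiDAR embeddings are the image embeddings plus small noise.
    aligned = [[x + 0.01 * random.gauss(0, 1) for x in v] for v in image_emb]
    # Misaligned case: the same embeddings with pairs shifted by one position.
    shifted = aligned[1:] + aligned[:1]
    print("aligned loss:", symmetric_contrastive_loss(image_emb, aligned))
    print("shifted loss:", symmetric_contrastive_loss(image_emb, shifted))
```

Under this assumed objective, matched image/LiDAR pairs sit on the diagonal of the similarity matrix, so the loss falls as paired embeddings move closer together and unpaired ones move apart. In the toy run, the aligned batch should therefore score lower than the shifted one.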

Taxonomy

Topics: Millimeter-Wave Propagation and Modeling · Wireless Signal Modulation Classification · Advanced Wireless Communication Technologies