Multimodal Transformers for Wireless Communications: A Case Study in Beam Prediction
Yu Tian, Qiyang Zhao, Zine el abidine Kherroubi, Fouzi Boukhalfa, Kebin Wu, Faouzi Bader

TL;DR
This paper introduces a multimodal transformer framework that uses image, GPS, radar, and LiDAR data to improve beam prediction accuracy in high-frequency wireless communications, with the image and GPS combination reported to outperform the other modalities tested.
Contribution
The paper presents a novel multimodal transformer deep learning approach for sensing-assisted beam prediction, integrating multiple data sources and advanced training techniques.
Findings
Achieved 78.44% accuracy in beam prediction using image and GPS data.
Demonstrated effective generalization to unseen day and night scenarios.
Outperformed other modalities and data processing techniques in accuracy.
Abstract
Wireless communications at high-frequency bands with large antenna arrays face challenges in beam management, which can potentially be improved by multimodality sensing information from cameras, LiDAR, radar, and GPS. In this paper, we present a multimodal transformer deep learning framework for sensing-assisted beam prediction. We employ a convolutional neural network to extract the features from a sequence of images, point clouds, and radar raw data sampled over time. At each convolutional layer, we use transformer encoders to learn the hidden relations between feature tokens from different modalities and time instances over abstraction space and produce encoded vectors for the next-level feature extraction. We train the model on a combination of different modalities with supervised learning. We try to enhance the model over imbalanced data by utilizing focal loss and exponential…
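The abstract names focal loss as the remedy for imbalanced beam labels. As a minimal sketch of that standard loss (not the authors' implementation), the function below computes the focal loss for a single sample, FL = -(1 - p_t)^gamma * log(p_t), where p_t is the softmax probability of the true beam index. The function names, the example logits, and the default `gamma=2.0` are assumptions for illustration; the abstract does not state the value the paper uses.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one sample (hypothetical helper, gamma assumed).

    logits: raw scores, one per candidate beam index
    target: index of the true beam
    gamma:  focusing parameter; gamma = 0 reduces to cross-entropy
    """
    p_t = softmax(logits)[target]
    # (1 - p_t)^gamma shrinks the loss of well-classified samples,
    # so rare or hard beam classes contribute relatively more.
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Illustrative scores for three candidate beams (made-up values).
example_logits = [2.0, 0.5, -1.0]
easy = focal_loss(example_logits, target=0)  # confident and correct
hard = focal_loss(example_logits, target=2)  # true beam scored lowest
```

With `gamma=0` the expression is ordinary cross-entropy; larger values of `gamma` down-weight samples the model already classifies confidently, which is why the loss is a common choice when a few beam indices dominate the training set.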
Taxonomy
Topics: Indoor and Outdoor Localization Technologies · Advanced SAR Imaging Techniques · Soil Moisture and Remote Sensing
Methods: Greedy Policy Search · Focal Loss
