Building 6G Radio Foundation Models with Transformer Architectures
Ahmed Aboulfotouh, Ashkan Eshaghbeigi, Hatem Abou-Zeid

TL;DR
This paper introduces a Vision Transformer-based foundation model for radio spectrogram analysis in 6G networks. After self-supervised pretraining, the model outperforms larger models and generalizes well across tasks.
Contribution
It presents a novel Masked Spectrogram Modeling approach for pretraining ViT models in wireless communications, enabling scalable and adaptable foundation models for 6G.
Findings
Pretrained ViT outperforms larger models on spectrogram segmentation.
Model generalizes effectively across diverse domains.
Finetuning the pretrained model requires less training time than training from scratch.
Abstract
Foundation deep learning (DL) models are general models, designed to learn general, robust and adaptable representations of their target modality, enabling finetuning across a range of downstream tasks. These models are pretrained on large, unlabeled datasets using self-supervised learning (SSL). Foundation models have demonstrated better generalization than traditional supervised approaches, a critical requirement for wireless communications where the dynamic environment demands model adaptability. In this work, we propose and demonstrate the effectiveness of a Vision Transformer (ViT) as a radio foundation model for spectrogram learning. We introduce a Masked Spectrogram Modeling (MSM) approach to pretrain the ViT in a self-supervised fashion. We evaluate the ViT-based foundation model on two downstream tasks: Channel State Information (CSI)-based Human Activity sensing and…
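The masked-modeling idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the patch size (16), the mask ratio (0.75), the choice to score only the masked patches, and the helper names `patchify`, `random_mask` and `masked_mse` are all assumptions made for illustration, and a zero array stands in for the output of a real encoder and decoder.

```python
import numpy as np

def patchify(spec, patch):
    """Split a (H, W) spectrogram into non-overlapping patch x patch tiles, each flattened."""
    h, w = spec.shape
    assert h % patch == 0 and w % patch == 0, "spectrogram must tile evenly"
    tiles = spec.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    return tiles.reshape(-1, patch * patch)

def random_mask(num_patches, mask_ratio, rng):
    """Boolean mask over patches; True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask

def masked_mse(pred, target, mask):
    """Mean squared reconstruction error over the masked patches only (an assumed choice)."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

rng = np.random.default_rng(0)
spec = rng.standard_normal((64, 64))      # random stand-in for a real spectrogram
patches = patchify(spec, patch=16)        # 16 patches, 256 values each
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)
visible = patches[~mask]                  # the only patches an encoder would see
pred = np.zeros_like(patches)             # placeholder for a decoder's reconstruction
loss = masked_mse(pred, patches, mask)    # the quantity pretraining would minimize
```

In a real pipeline, `visible` would be embedded and passed through the ViT encoder, and a decoder would produce `pred`. No labels are involved at any point, which is what makes the pretraining self-supervised.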
Taxonomy
Topics: Advanced MIMO Systems Optimization · Microwave Engineering and Waveguides · Power Line Communications and Noise
Methods: Attention Is All You Need · Adam · Residual Connection · Byte Pair Encoding · Linear Layer · Absolute Position Encodings · Vision Transformer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer
