Seeing Radio: From Zero RF Priors to Explainable Modulation Recognition with Vision Language Models

Hang Zou; Bohao Wang; Yu Tian; Lina Bariah; Chongwen Huang; Samson Lasaulce; Mérouane Debbah

arXiv:2601.13157·eess.SP·February 17, 2026

TL;DR

This paper demonstrates that vision-language models can be adapted to RF signal analysis by converting signals into visual formats, enabling accurate, explainable modulation recognition without specialized architectures.

Contribution

It introduces a novel RF-to-image conversion method and fine-tuning approach for VLMs, achieving high accuracy and interpretability in modulation classification tasks.

Findings

1. Accuracy improved from 10% to nearly 90% with fine-tuning.
2. Models show robustness to noise and unseen modulations.
3. Provides human-readable rationales for decisions.

Abstract

Current RF machine-learning pipelines rely on task-specific deep networks for modulation classification and related tasks, but these models require custom architectures and labeled datasets for each problem, generalize poorly across channel conditions and SNRs, and offer little interpretability. In contrast, modern multimodal large language models (MLLMs) can integrate heterogeneous visual and textual data and exhibit strong cross-domain generalization and explanation capabilities. Our goal in this work is to explore whether vision-language models (VLMs) can be adapted to directly perceive RF signals and reason about modulation patterns without redesigning their architectures or injecting RF-specific inductive biases. To achieve this, we convert complex IQ streams into time-series, spectrogram, and joint RF visualizations, build a 57-class RF visual question answering benchmark, and…
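
The IQ-to-image step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy QPSK generator, the hand-rolled short-time DFT, the frame sizes, and every function name below are assumptions chosen only to show how a complex IQ stream might become a time-series view and a spectrogram image using the Python standard library.

```python
import cmath
import math
import random


def qpsk_iq(num_symbols, samples_per_symbol=4, freq_offset=0.1, seed=0):
    """Toy QPSK burst as a list of complex IQ samples (rectangular pulses)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_symbols):
        # One of four phases: 45, 135, 225, or 315 degrees.
        phase = math.pi / 4 + rng.randrange(4) * math.pi / 2
        samples.extend([cmath.exp(1j * phase)] * samples_per_symbol)
    # Apply a carrier offset (cycles per sample) so the spectrum is not centred at DC.
    return [s * cmath.exp(2j * math.pi * freq_offset * n) for n, s in enumerate(samples)]


def time_series_view(iq):
    """Split complex samples into the I and Q traces a time-domain plot would show."""
    return [s.real for s in iq], [s.imag for s in iq]


def spectrogram_db(iq, frame_len=32, hop=16):
    """Short-time DFT power in dB; rows are time frames, columns are frequency bins."""
    # Periodic Hann window to reduce spectral leakage between bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    rows = []
    for start in range(0, len(iq) - frame_len + 1, hop):
        frame = [iq[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(10 * math.log10(abs(acc) ** 2 + 1e-12))
        rows.append(row)
    return rows


def to_grayscale(rows):
    """Min-max normalise dB values to 0..255 pixel intensities."""
    lo = min(min(r) for r in rows)
    hi = max(max(r) for r in rows)
    span = (hi - lo) or 1.0
    return [[round(255 * (v - lo) / span) for v in r] for r in rows]


iq = qpsk_iq(num_symbols=64)
i_trace, q_trace = time_series_view(iq)
image = to_grayscale(spectrogram_db(iq))
```

Here `image` is a plain list of pixel rows that any imaging library could write to a file before it is passed to a vision-language model together with a text question. A real pipeline would more likely use an FFT routine and a plotting library; the naive DFT keeps the sketch dependency-free.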

Taxonomy

Topics: Wireless Signal Modulation Classification · Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning