Sensing and Understanding the World over Air: A Large Multimodal Model for Mobile Networks

Zhuoran Duan; Yuhao Wei; Guoshun Nan; Zijun Wang; Yan Yan; Lihua Xiong; Yuhan Ran; Ji Zhang; Jian Li; Qimei Cui; Xiaofeng Tao; Tony Q. S. Quek

arXiv:2511.21707 · cs.NI · December 1, 2025

TL;DR

This paper introduces a large multimodal model tailored for wireless networks that leverages wireless signals as a universal modality, enabling better sensing and understanding of the physical world for smart services.

Contribution

It proposes a novel wireless-native multimodal training paradigm and constructs a GPT-style model trained on real-world data, demonstrating superior performance over existing models.

Findings

1. Wireless signals can serve as a universal modality for multimodal learning.
2. The proposed WMLM outperforms existing small-scale models and multimodal models.
3. The approach demonstrates the feasibility of using wireless signals to build large-scale multimodal models.

Abstract

Large models (LMs), such as ChatGPT, have had a significant impact across diverse domains and hold great potential to facilitate the evolution of network intelligence. Wireless-native multimodal large models (WMLMs) can sense and understand the physical world through multimodal data. They serve as a key enabler that integrates communication, sensing, and intelligence, and can thus power a wide range of smart services for billions of users. However, research on WMLMs remains in its infancy, and the construction of domain-specific multimodal large models for wireless networks is still underexplored. In this paper, we outline the key characteristics of WMLMs and summarize existing methods, on the basis of which we propose a wireless-native multimodal training paradigm. Specifically, we construct a GPT-style WMLM and train it on a real-world large-scale dataset, leveraging…

Taxonomy

Topics: Indoor and Outdoor Localization Technologies · Speech and Audio Processing · Underwater Vehicles and Communication Systems