MMSense: Adapting Vision-based Foundation Model for Multi-task Multi-modal Wireless Sensing

Zhizhen Li; Xuanhao Luo; Xueren Ge; Longyu Zhou; Xingqin Lin; Yuchen Liu

arXiv:2511.12305·cs.LG·November 18, 2025

Open Access

TL;DR

MMSense introduces a multi-modal, multi-task foundation model for wireless sensing that integrates various sensor data types into a unified framework, enabling improved performance and generalization across diverse sensing tasks.

Contribution

The paper presents MMSense, a novel multi-modal foundation model that unifies different sensor data and tasks in wireless sensing, leveraging vision-compatible representations and adaptive fusion mechanisms.

Findings

01. Outperforms task-specific and large-model baselines on real datasets.

02. Demonstrates strong generalization across heterogeneous sensing tasks.

03. Effectively integrates image, radar, LiDAR, and textual data for wireless sensing.

Abstract

Large AI models have been widely adopted in wireless communications for channel modeling, beamforming, and resource optimization. However, most existing efforts remain limited to single-modality inputs and channel-specific objectives, overlooking the broader potential of large foundation models for unified wireless sensing. To bridge this gap, we propose MMSense, a multi-modal, multi-task foundation model that jointly addresses channel-centric, environment-aware, and human-centered sensing. Our framework integrates image, radar, LiDAR, and textual data by transforming them into vision-compatible representations, enabling effective cross-modal alignment within a unified feature space. A modality gating mechanism adaptively fuses these representations, while a vision-based large language model backbone enables unified feature alignment and instruction-driven task adaptation.…
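
The abstract does not specify how the modality gating mechanism is built, so the following is only a minimal sketch of one common form of gated fusion, not the authors' implementation. It assumes each modality (image, radar, LiDAR, text) has already been encoded into a feature vector of the same size, scores each vector with a hypothetical scoring vector `gate_weights`, and combines the vectors with softmax-normalized gates. The names `gated_fusion` and `gate_weights`, the dimensions, and the random inputs are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def gated_fusion(features, gate_weights):
    """Fuse per-modality features with softmax gates (illustrative only).

    features:     array of shape (num_modalities, dim), one row per modality.
    gate_weights: array of shape (dim,), a hypothetical scoring vector that
                  would be learned in practice.
    Returns the fused vector of shape (dim,) and the gates of shape
    (num_modalities,).
    """
    scores = features @ gate_weights   # one scalar score per modality
    gates = softmax(scores)            # non-negative, sums to 1
    fused = gates @ features           # gate-weighted sum of modality rows
    return fused, gates


# Stand-in embeddings for four modalities: image, radar, LiDAR, text.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))
gate_weights = rng.normal(size=8)

fused, gates = gated_fusion(features, gate_weights)
```

Because the gates are a softmax output, the fused vector is a convex combination of the modality features: a modality with a low score is attenuated rather than dropped outright.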

Taxonomy

Topics: Indoor and Outdoor Localization Technologies · Speech and Audio Processing · Wireless Signal Modulation Classification