MMSense: Adapting Vision-based Foundation Model for Multi-task Multi-modal Wireless Sensing
Zhizhen Li, Xuanhao Luo, Xueren Ge, Longyu Zhou, Xingqin Lin, Yuchen Liu

TL;DR
MMSense introduces a multi-modal, multi-task foundation model for wireless sensing that integrates various sensor data types into a unified framework, enabling improved performance and generalization across diverse sensing tasks.
Contribution
The paper presents MMSense, a novel multi-modal foundation model that unifies different sensor data and tasks in wireless sensing, leveraging vision-compatible representations and adaptive fusion mechanisms.
Findings
Outperforms task-specific and large-model baselines on real datasets.
Demonstrates strong generalization across heterogeneous sensing tasks.
Effectively integrates image, radar, LiDAR, and textual data for wireless sensing.
Abstract
Large AI models have been widely adopted in wireless communications for channel modeling, beamforming, and resource optimization. However, most existing efforts remain limited to single-modality inputs and channel-specific objectives, overlooking the broader potential of large foundation models for unified wireless sensing. To bridge this gap, we propose MMSense, a multi-modal, multi-task foundation model that jointly addresses channel-centric, environment-aware, and human-centered sensing. Our framework integrates image, radar, LiDAR, and textual data by transforming them into vision-compatible representations, enabling effective cross-modal alignment within a unified feature space. A modality gating mechanism adaptively fuses these representations, while a vision-based large language model backbone enables unified feature alignment and instruction-driven task adaptation.…
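The abstract does not spell out how the modality gating mechanism is built. As a rough illustration only, the sketch below shows one common form of gated fusion: each modality is assumed to be already encoded as a fixed-length feature vector, a learned weight vector scores each modality, and a softmax over those scores gives the mixing weights. The names `gated_fusion`, `gate_weights`, and the feature dimension of 8 are hypothetical and are not taken from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(features, gate_weights):
    """Fuse per-modality feature vectors with softmax gates.

    features:     dict mapping modality name -> list of floats (same length d)
    gate_weights: list of d floats; a stand-in for a learned gating layer
                  (assumption, not the paper's actual parameterization)
    Returns (fused_vector, gates_by_modality).
    """
    names = sorted(features)
    # One scalar score per modality: dot product with the gating weights.
    scores = [
        sum(w * x for w, x in zip(gate_weights, features[name]))
        for name in names
    ]
    gates = softmax(scores)  # non-negative, sums to 1
    dim = len(gate_weights)
    # Weighted sum of the modality vectors, one output dimension at a time.
    fused = [
        sum(g * features[name][i] for g, name in zip(gates, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, gates))


# Toy inputs: random vectors stand in for encoder outputs of each modality.
random.seed(0)
DIM = 8
features = {
    name: [random.gauss(0.0, 1.0) for _ in range(DIM)]
    for name in ("image", "radar", "lidar", "text")
}
gate_weights = [random.gauss(0.0, 1.0) for _ in range(DIM)]

fused, gates = gated_fusion(features, gate_weights)
```

Because the gates sum to one, the fused vector stays on the same scale as the individual modality features, and a modality with a low score contributes little. A real system would learn the gating parameters jointly with the rest of the model rather than drawing them at random.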
Taxonomy
Topics: Indoor and Outdoor Localization Technologies · Speech and Audio Processing · Wireless Signal Modulation Classification
