WiFo-MiSAC: A Wireless Foundation Model for Multimodal Sensing and Communication Integration via Synesthesia of Machines (SoM)
Xuanyu Liu, Shijian Gao, Boxun Liu, Xiang Cheng, Liuqing Yang

TL;DR
WiFo-MiSAC is a task-agnostic wireless foundation model that tokenizes heterogeneous sensing and communication signals into a unified space and learns disentangled modality-shared and modality-specific representations through self-supervised pre-training.
Contribution
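The paper introduces a task-agnostic foundation model built on a shared-specific disentangled mixture-of-experts (SS-DMoE) architecture, which separates modality-shared from modality-specific representations so that modalities can interact without cross-modal interference.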
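The shared-specific split described above can be illustrated with a toy layer. This is a minimal sketch under stated assumptions, not the authors' SS-DMoE implementation: the class name `SharedSpecificLayer`, the dense softmax gating, the linear experts, and the modality names `"csi"` and `"radar"` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class SharedSpecificLayer:
    """Toy layer (hypothetical): one shared expert pool plus one pool per modality."""

    def __init__(self, dim, n_shared, n_specific, modalities):
        scale = dim ** -0.5
        # Each expert is a plain linear map of shape (dim, dim).
        self.shared = rng.normal(0, scale, size=(n_shared, dim, dim))
        self.gate_shared = rng.normal(0, scale, size=(dim, n_shared))
        self.specific = {
            m: rng.normal(0, scale, size=(n_specific, dim, dim)) for m in modalities
        }
        self.gate_specific = {
            m: rng.normal(0, scale, size=(dim, n_specific)) for m in modalities
        }

    @staticmethod
    def _mix(tokens, experts, gate):
        # Dense soft gating (an assumption; real MoE layers often use top-k routing).
        weights = softmax(tokens @ gate)                      # (T, E)
        outputs = np.einsum("td,edk->tek", tokens, experts)   # (T, E, dim)
        return np.einsum("te,tek->tk", weights, outputs)      # (T, dim)

    def __call__(self, tokens, modality):
        # Shared experts see every modality; specific experts only their own.
        shared = self._mix(tokens, self.shared, self.gate_shared)
        specific = self._mix(
            tokens, self.specific[modality], self.gate_specific[modality]
        )
        return shared, specific

layer = SharedSpecificLayer(dim=8, n_shared=2, n_specific=2, modalities=["csi", "radar"])
tokens = rng.normal(size=(5, 8))
shared_csi, specific_csi = layer(tokens, "csi")
shared_radar, specific_radar = layer(tokens, "radar")
```

Returning the two branches separately keeps the shared and specific representations available to different training objectives. How the two are actually combined and trained in WiFo-MiSAC is not stated in this summary.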
Findings
Achieves state-of-the-art performance on downstream tasks, including beam prediction and channel estimation.
Demonstrates robust few-shot adaptation.
Integrates new modalities seamlessly, positioning it as a scalable backbone for integrated sensing and communication systems.
Abstract
Current learning-based wireless methods struggle with generalization due to the fragmented processing of communication and sensing data. WiFo-MiSAC addresses this as a task-agnostic foundation model that tokenizes heterogeneous signals into a unified space for self-supervised pre-training. A shared-specific disentangled mixture-of-experts (SS-DMoE) architecture is employed to decouple modality-shared and modality-specific representations, facilitating interaction without cross-modal interference. By combining masked reconstruction with contrastive alignment, the model achieves state-of-the-art performance across downstream tasks, including beam prediction and channel estimation. Experimental results demonstrate robust few-shot adaptation and seamless integration of new modalities, positioning WiFo-MiSAC as a scalable backbone for future integrated sensing and communication systems.
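The abstract's pairing of masked reconstruction with contrastive alignment can be sketched as a single pre-training loss. This is a minimal sketch under assumptions, not the paper's objective: the mean-squared-error reconstruction term, the symmetric InfoNCE form, the `temperature` and `weight` parameters, and all function names are hypothetical.

```python
import numpy as np

def masked_reconstruction_loss(target, prediction, mask):
    """Mean squared error over masked tokens only.

    target, prediction: arrays of shape (T, D); mask: bool array of shape (T,),
    True where the token was hidden from the encoder.
    """
    diff = (prediction - target)[mask]
    return float(np.mean(diff ** 2))

def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE: row i of z_a and row i of z_b form a positive pair."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (N, N) cosine similarities

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return float(
        0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T))
    )

def pretraining_loss(target, prediction, mask, z_a, z_b, weight=1.0):
    # Weighted sum of the two terms; the weighting scheme is an assumption.
    return masked_reconstruction_loss(target, prediction, mask) + weight * info_nce(z_a, z_b)
```

In this sketch, `z_a` and `z_b` stand for embeddings of the same scene from two modalities. The reconstruction term rewards recovering hidden tokens, while the contrastive term pulls paired embeddings together and pushes unpaired ones apart.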
