WiFo-MiSAC: A Wireless Foundation Model for Multimodal Sensing and Communication Integration via Synesthesia of Machines (SoM)

Xuanyu Liu; Shijian Gao; Boxun Liu; Xiang Cheng; Liuqing Yang

arXiv:2604.18255·eess.SP·April 21, 2026

TL;DR

WiFo-MiSAC is a versatile foundation model that unifies heterogeneous wireless signals for improved sensing and communication tasks through self-supervised learning and disentangled representations.

Contribution

It introduces a task-agnostic model with a shared-specific disentangled architecture for multimodal wireless data processing, enabling better generalization and integration.

Findings

1. Achieves state-of-the-art results in beam prediction and channel estimation.

2. Demonstrates robust few-shot adaptation to new modalities.

3. Facilitates seamless multimodal sensing and communication integration.

Abstract

Current learning-based wireless methods struggle with generalization due to the fragmented processing of communication and sensing data. WiFo-MiSAC addresses this as a task-agnostic foundation model that tokenizes heterogeneous signals into a unified space for self-supervised pre-training. A shared-specific disentangled mixture-of-experts (SS-DMoE) architecture is employed to decouple modality-shared and modality-specific representations, facilitating interaction without cross-modal interference. By combining masked reconstruction with contrastive alignment, the model achieves state-of-the-art performance across downstream tasks, including beam prediction and channel estimation. Experimental results demonstrate robust few-shot adaptation and seamless integration of new modalities, positioning WiFo-MiSAC as a scalable backbone for future integrated sensing and communication systems.
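
The abstract names two pre-training signals, masked reconstruction and contrastive alignment, without giving their exact form. The following is a minimal NumPy sketch of how such a combined objective is commonly written, not the paper's implementation. The function names, the symmetric InfoNCE form of the contrastive term, the `temperature` and `weight` parameters, and the tensor shapes are all assumptions made for illustration.

```python
import numpy as np


def masked_reconstruction_loss(tokens, reconstructed, mask):
    """Mean squared error over masked tokens only.

    tokens, reconstructed: (batch, seq, dim) arrays.
    mask: (batch, seq) boolean array, True where a token was hidden.
    """
    per_token = ((reconstructed - tokens) ** 2).mean(axis=-1)
    return float(per_token[mask].mean())


def info_nce_loss(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE (assumed form of the contrastive term).

    Row i of z_a and row i of z_b are treated as a positive pair,
    e.g. embeddings of the same scene from two modalities.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return float(0.5 * (cross_entropy_on_diagonal(logits)
                        + cross_entropy_on_diagonal(logits.T)))


def combined_loss(tokens, reconstructed, mask, z_a, z_b, weight=1.0):
    """Reconstruction term plus a weighted contrastive term (hypothetical weighting)."""
    return (masked_reconstruction_loss(tokens, reconstructed, mask)
            + weight * info_nce_loss(z_a, z_b))
```

In this sketch the reconstruction term is zero when the model recovers every hidden token exactly, and the contrastive term is smaller when paired embeddings from the two modalities point in the same direction than when the pairing is scrambled.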
