JEPA-MSAC: A Joint-Embedding Predictive Architecture for Multimodal Sensing-Assisted Communications

Can Zheng; Jiguang He; Guofa Cai; Nannan Li; Mehdi Bennis; Henk Wymeersch; Merouane Debbah

arXiv:2603.29796·eess.SP·April 1, 2026

TL;DR

JEPA-MSAC is a self-supervised, multimodal predictive framework for wireless environments that supports multiple PHY tasks at low adaptation cost, improving transferability and efficiency.

Contribution

The paper proposes a joint-embedding predictive architecture that learns environment dynamics and cross-modal dependencies for multi-task wireless communication applications.

Findings

01

Supports accurate multi-task prediction with low adaptation cost

02

Latent state effectively captures environment dynamics and cross-modal dependencies

03

Ablation studies show the importance of pretraining setups

Abstract

Future wireless systems increasingly require predictive and transferable representations that can support multiple physical-layer (PHY) tasks under dynamic environments. However, most existing supervised learning-based methods are designed for a single task, which leads to high adaptation cost. To address this issue, we propose a joint-embedding predictive architecture for multimodal sensing-assisted communications (JEPA-MSAC), a self-supervised multimodal predictive representation learning framework for wireless environments. The proposed framework first maps multimodal sensing and communication measurements into a unified token space, and then pretrains a shared backbone using temporal block-masked JEPA to learn a predictive latent space that captures environment dynamics and cross-modal dependencies. After pretraining, the backbone is frozen and reused as a general future-feature…
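The temporal block masking mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`temporal_block_mask`, `split_tokens`, `latent_mse`), the single contiguous masked block, the one-token-per-time-step layout, and the mean-of-context "predictor" are all assumptions chosen to keep the example short. It only shows the general JEPA idea of hiding a block of time steps and scoring a prediction against the hidden block in latent space rather than in raw measurement space.

```python
import random


def temporal_block_mask(num_steps, block_len, rng):
    """Return a list of booleans; True marks the masked (target) time steps.

    Assumption: one contiguous block is masked per sequence.
    """
    if not 0 < block_len < num_steps:
        raise ValueError("block_len must satisfy 0 < block_len < num_steps")
    start = rng.randrange(0, num_steps - block_len + 1)
    return [start <= t < start + block_len for t in range(num_steps)]


def split_tokens(tokens, mask):
    """Split per-time-step tokens into visible context and masked targets."""
    context = [tok for tok, masked in zip(tokens, mask) if not masked]
    target = [tok for tok, masked in zip(tokens, mask) if masked]
    return context, target


def latent_mse(predicted, target):
    """Mean squared error between predicted and target latent vectors."""
    total, count = 0.0, 0
    for p_vec, t_vec in zip(predicted, target):
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    num_steps, dim = 8, 4

    # Hypothetical latent tokens, one vector per time step (stand-in for
    # the output of an encoder over fused multimodal measurements).
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_steps)]

    mask = temporal_block_mask(num_steps, block_len=3, rng=rng)
    context, target = split_tokens(tokens, mask)

    # Toy predictor (assumption): predict every masked step as the mean of
    # the visible context latents. A real predictor would be a trained network.
    mean_ctx = [sum(vec[d] for vec in context) / len(context) for d in range(dim)]
    predicted = [mean_ctx for _ in target]

    print("mask:", mask)
    print("latent prediction loss:", latent_mse(predicted, target))
```

In a trained system, the loss would be minimized with respect to the context encoder and predictor, while the target latents would typically come from a separate target encoder; those components are omitted here.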
