JEPA-MSAC: A Joint-Embedding Predictive Architecture for Multimodal Sensing-Assisted Communications
Can Zheng, Jiguang He, Guofa Cai, Nannan Li, Mehdi Bennis, Henk Wymeersch, Merouane Debbah

TL;DR
JEPA-MSAC is a self-supervised, multimodal predictive framework for wireless environments that supports multiple PHY tasks at low adaptation cost, improving transferability and efficiency.
Contribution
It proposes a novel joint-embedding predictive architecture that learns environment dynamics and cross-modal dependencies for multi-task wireless communication applications.
Findings
Supports accurate multi-task prediction with low adaptation cost
Latent state effectively captures environment dynamics and cross-modal dependencies
Ablation studies show the importance of pretraining setups
Abstract
Future wireless systems increasingly require predictive and transferable representations that can support multiple physical-layer (PHY) tasks under dynamic environments. However, most existing supervised learning-based methods are designed for a single task, which leads to high adaptation cost. To address this issue, we propose a joint-embedding predictive architecture for multimodal sensing-assisted communications (JEPA-MSAC), a self-supervised multimodal predictive representation learning framework for wireless environments. The proposed framework first maps multimodal sensing and communication measurements into a unified token space, and then pretrains a shared backbone using temporal block-masked JEPA to learn a predictive latent space that captures environment dynamics and cross-modal dependencies. After pretraining, the backbone is frozen and reused as a general future-feature…
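The pretraining idea described in the abstract (hide contiguous blocks of time steps, then predict their latent representations from the visible context) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the single-layer tanh "encoders", the mean pooling, and all shapes are assumptions chosen for brevity, and a real JEPA setup would use a learned backbone with a stop-gradient (often EMA) target encoder.

```python
import numpy as np


def temporal_block_mask(num_steps, block_len, num_blocks, rng):
    """Boolean mask over time steps; True marks a masked (target) step.

    Blocks are contiguous in time and may overlap (hypothetical sampling rule).
    """
    mask = np.zeros(num_steps, dtype=bool)
    for _ in range(num_blocks):
        start = rng.integers(0, num_steps - block_len + 1)
        mask[start:start + block_len] = True
    return mask


def jepa_latent_loss(tokens, mask, w_context, w_target, w_pred):
    """Toy JEPA-style loss computed in latent space, not in input space.

    tokens: (T, M, D) array of T time steps, M modalities, D-dim unified tokens.
    The weight matrices stand in for the context encoder, target encoder,
    and predictor (all assumed single linear layers with tanh).
    """
    # Encode only the visible (unmasked) time steps and pool them into one summary.
    ctx_latent = np.tanh(tokens[~mask] @ w_context).mean(axis=(0, 1))  # (H,)
    # Encode the masked steps with the target encoder (no gradient in a real setup).
    target_latent = np.tanh(tokens[mask] @ w_target)  # (T_masked, M, H)
    # Predict the masked latents from the context summary; the toy predictor
    # emits one vector that is broadcast over all masked positions.
    pred = np.tanh(ctx_latent @ w_pred)  # (H,)
    return float(np.mean((target_latent - pred) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, M, D, H = 16, 3, 8, 4  # time steps, modalities, token dim, latent dim
    tokens = rng.normal(size=(T, M, D))
    mask = temporal_block_mask(T, block_len=4, num_blocks=2, rng=rng)
    loss = jepa_latent_loss(
        tokens,
        mask,
        w_context=rng.normal(size=(D, H)),
        w_target=rng.normal(size=(D, H)),
        w_pred=rng.normal(size=(H, H)),
    )
    print("masked steps:", np.flatnonzero(mask), "loss:", loss)
```

The point of the sketch is only that the loss compares predicted and encoded latents for the masked time blocks, so the model is never asked to reconstruct raw sensing or channel measurements.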
