Zero-Shot Semantic Communication with Multimodal Foundation Models
Jiangjing Hu, Haotian Wu, Wenjing Zhang, Fengyu Wang, Wenjun Xu, Hui Gao, Deniz Gündüz

TL;DR
This paper introduces SemCLIP, a zero-shot semantic communication framework using foundation models to transmit universal semantic tokens, significantly improving efficiency, robustness, and task generalization in low bandwidth and noisy conditions.
Contribution
The paper presents SemCLIP, a novel zero-shot SemCom system leveraging CLIP for universal semantic token transmission and a prompt learning mechanism for robustness, advancing flexible and efficient semantic communication.
Findings
41% improvement in zero-shot performance at low SNR
Over 50-fold bandwidth reduction compared to image transmission
Enhanced robustness through prompt adaptation
Abstract
Most existing semantic communication (SemCom) systems use deep joint source-channel coding (DeepJSCC) to encode task-specific semantics in a goal-oriented manner. However, their reliance on predefined tasks and datasets significantly limits their flexibility and generalizability in practical deployments. Multi-modal foundation models provide a promising solution by generating universal semantic tokens. Inspired by this, we introduce SemCLIP, a zero-shot SemCom framework leveraging the contrastive language-image pre-training (CLIP) model. By transmitting CLIP-generated image tokens instead of raw images, SemCLIP enables efficient SemCom under low bandwidth and challenging channel conditions, facilitating diverse downstream tasks and zero-shot applications. Specifically, we propose a DeepJSCC scheme for efficient CLIP token encoding. To mitigate potential degradation caused by compression…
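To make the idea of sending an embedding instead of an image concrete, here is a minimal sketch in Python. It is not the SemCLIP method: the abstract describes a DeepJSCC encoder and further measures against degradation, and neither is modeled here. The sketch sends a feature vector uncoded over an additive white Gaussian noise (AWGN) channel and classifies it at the receiver by cosine similarity against a set of text embeddings, which is how CLIP-style zero-shot classification is commonly done. Random vectors stand in for real CLIP embeddings, and the dimension of 512, the 0 dB SNR, and the helper names `awgn` and `zero_shot_classify` are all assumptions made for illustration.

```python
import numpy as np


def awgn(x, snr_db, rng):
    """Add white Gaussian noise to x at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)


def zero_shot_classify(image_token, text_embeddings):
    """Return the index of the text embedding most similar (cosine) to the token."""
    img = image_token / np.linalg.norm(image_token)
    txt = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    return int(np.argmax(txt @ img))


rng = np.random.default_rng(0)

# Assumption: random vectors stand in for CLIP text embeddings of class prompts.
dim, num_classes = 512, 10
text_embeddings = rng.normal(size=(num_classes, dim))

# Assumption: the image token lies near the text embedding of its true class.
true_class = 3
image_token = text_embeddings[true_class] + 0.5 * rng.normal(size=dim)

# Send the token over the noisy channel, then classify at the receiver.
received = awgn(image_token, snr_db=0.0, rng=rng)
predicted = zero_shot_classify(received, text_embeddings)
print("true class:", true_class, "predicted class:", predicted)
```

In this toy setting the receiver needs no task-specific training: changing the task only means changing the set of text embeddings it compares against.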
Taxonomy
Topics: Wireless Signal Modulation Classification · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Language-Image Pre-training
