Zero-Shot Semantic Communication with Multimodal Foundation Models

Jiangjing Hu; Haotian Wu; Wenjing Zhang; Fengyu Wang; Wenjun Xu; Hui Gao; Deniz Gündüz

arXiv:2502.18200 · eess.SP · May 30, 2025

Open Access

TL;DR

This paper introduces SemCLIP, a zero-shot semantic communication framework that uses foundation models to transmit universal semantic tokens, improving efficiency, robustness, and task generalization under low-bandwidth and noisy channel conditions.

Contribution

The paper presents SemCLIP, a zero-shot semantic communication (SemCom) system that uses CLIP to transmit universal semantic tokens and adds a prompt learning mechanism for robustness, making semantic communication more flexible and efficient.

Findings

01. 41% improvement in zero-shot performance at low SNR

02. Over 50-fold bandwidth reduction compared to image transmission

03. Enhanced robustness through prompt adaptation

Abstract

Most existing semantic communication (SemCom) systems use deep joint source-channel coding (DeepJSCC) to encode task-specific semantics in a goal-oriented manner. However, their reliance on predefined tasks and datasets significantly limits their flexibility and generalizability in practical deployments. Multi-modal foundation models provide a promising solution by generating universal semantic tokens. Inspired by this, we introduce SemCLIP, a zero-shot SemCom framework leveraging the contrastive language-image pre-training (CLIP) model. By transmitting CLIP-generated image tokens instead of raw images, SemCLIP enables efficient SemCom under low bandwidth and challenging channel conditions, facilitating diverse downstream tasks and zero-shot applications. Specifically, we propose a DeepJSCC scheme for efficient CLIP token encoding. To mitigate potential degradation caused by compression…
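To make the idea in the abstract concrete, here is a minimal sketch of zero-shot classification from an embedding received over a noisy channel. It is not the authors' method: random unit vectors stand in for CLIP image and text embeddings, the embedding is sent uncoded over a simulated additive white Gaussian noise (AWGN) channel rather than through a DeepJSCC encoder, and every name (`awgn`, `zero_shot_classify`, `dim`, `num_classes`, and so on) is hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def awgn(v, snr_db, rng):
    """Add white Gaussian noise whose variance targets the given SNR in dB."""
    signal_power = sum(x * x for x in v) / len(v)
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def zero_shot_classify(received, text_embeddings):
    """Return the index of the text embedding most similar to the received one."""
    scores = [cosine(received, t) for t in text_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(0)
dim, num_classes = 64, 5  # assumed sizes, chosen only for illustration

# Stand-ins for per-class text embeddings (a real system would use CLIP's text encoder).
text_embeddings = [
    normalize([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(num_classes)
]

# Stand-in for an image embedding: the true class prototype plus a small perturbation.
true_class = 2
image_embedding = normalize(
    [t + 0.02 * rng.gauss(0.0, 1.0) for t in text_embeddings[true_class]]
)

# Transmit the embedding over the simulated channel, then classify at the receiver.
received = awgn(image_embedding, snr_db=20, rng=rng)
predicted = zero_shot_classify(received, text_embeddings)
print("predicted class:", predicted, "true class:", true_class)
```

Lowering `snr_db` in this toy setup shows how channel noise erodes the similarity margin between the correct class and the others, which is the kind of degradation the abstract's encoding and adaptation steps aim to reduce.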

Taxonomy

Topics: Wireless Signal Modulation Classification · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training