VLF-MSC: Vision-Language Feature-Based Multimodal Semantic Communication System

Gwangyeon Ahn; Jiwan Seo; Joonhyuk Kang

arXiv:2511.10074 · cs.CV · November 14, 2025

TL;DR

VLF-MSC introduces a unified semantic communication system using vision-language features to efficiently transmit and generate both images and text, improving robustness and spectral efficiency over existing methods.

Contribution

The paper presents a novel unified system that encodes and transmits a single vision-language feature for both image and text generation, unlike prior modality-specific approaches.

Findings

01. Outperforms text-only and image-only baselines in semantic accuracy
02. Achieves higher robustness to channel noise
03. Reduces bandwidth requirements significantly

Abstract

We propose Vision-Language Feature-based Multimodal Semantic Communication (VLF-MSC), a unified system that transmits a single compact vision-language representation to support both image and text generation at the receiver. Unlike existing semantic communication techniques that process each modality separately, VLF-MSC employs a pre-trained vision-language model (VLM) to encode the source image into a vision-language semantic feature (VLF), which is transmitted over the wireless channel. At the receiver, a decoder-based language model and a diffusion-based image generator are both conditioned on the VLF to produce a descriptive text and a semantically aligned image. This unified representation eliminates the need for modality-specific streams or retransmissions, improving spectral efficiency and adaptability. By leveraging foundation models, the system achieves robustness to channel…
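
The abstract describes a single-feature pipeline: encode the image once into a VLF, send that one vector over the channel, and condition both the text decoder and the image generator on the same received feature. The Python sketch below illustrates only that data flow, under stated assumptions. It assumes a real-valued AWGN channel with unit average transmit power, since the truncated abstract does not specify the channel model. The functions `encode_vlf`, `text_decoder`, and `image_generator`, along with `FEATURE_DIM`, are hypothetical placeholders for the pre-trained VLM, the decoder-based language model, and the diffusion generator; they are not the authors' implementation.

```python
import math
import random

FEATURE_DIM = 16  # hypothetical VLF length; the actual size is not stated here


def encode_vlf(image, dim=FEATURE_DIM, seed=0):
    """Hypothetical stand-in for the pre-trained VLM encoder.

    A fixed random linear projection maps the flattened image to a
    dim-length vector playing the role of the vision-language feature (VLF).
    """
    rng = random.Random(seed)
    proj = [[rng.gauss(0.0, 1.0) for _ in image] for _ in range(dim)]
    return [sum(w * x for w, x in zip(row, image)) for row in proj]


def normalize_power(x):
    """Scale x so its average power per symbol is 1 (transmit power constraint)."""
    p = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(p) for v in x] if p > 0 else list(x)


def awgn_channel(x, snr_db, rng):
    """Assumed real-valued AWGN channel: noise variance 10^(-SNR/10) for unit-power input."""
    std = math.sqrt(10 ** (-snr_db / 10))
    return [v + rng.gauss(0.0, std) for v in x]


def text_decoder(vlf):
    """Hypothetical placeholder for the decoder-based language model conditioned on the VLF."""
    return f"<caption conditioned on a {len(vlf)}-dim VLF>"


def image_generator(vlf, size=4):
    """Hypothetical placeholder for the diffusion generator: returns the first `size` VLF entries."""
    return [vlf[i % len(vlf)] for i in range(size)]


def transmit(image, snr_db, seed=1):
    """Encode once, send one feature, and condition BOTH receiver heads on it."""
    tx = normalize_power(encode_vlf(image))
    rx = awgn_channel(tx, snr_db, random.Random(seed))
    return tx, rx, text_decoder(rx), image_generator(rx)


def mse(a, b):
    """Mean squared error between transmitted and received features."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    img = [0.1 * i for i in range(1, 9)]  # toy 8-pixel "image"
    for snr in (0.0, 10.0, 30.0):
        tx, rx, caption, pixels = transmit(img, snr)
        print(f"SNR {snr:5.1f} dB  feature MSE {mse(tx, rx):.4f}  {caption}")
```

Because the same received vector `rx` feeds both placeholder heads, the channel carries one feature per image rather than one stream per modality. This mirrors the abstract's point about avoiding modality-specific streams. Raising the SNR shrinks the feature distortion that both heads inherit.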

Taxonomy

Topics: Wireless Signal Modulation Classification · Multimodal Machine Learning Applications · Advanced Wireless Communication Technologies