Image Generation with Supervised Selection Based on Multimodal Features for Semantic Communications

Chengyang Liang; Dong Li

arXiv:2411.17428 · eess.IV · July 8, 2025 · IEEE Trans. Commun.

Open Access

TL;DR

This paper introduces a multimodal semantic communication framework that uses both image and text features to supervise image generation, improving fidelity and robustness over traditional single-modal approaches, especially in noisy conditions.

Contribution

The paper proposes a novel multimodal semantic communication system utilizing CNN and CLIP for feature extraction and a diffusion model for image generation, enhancing semantic fidelity and robustness.

Findings

1. Improved image transmission fidelity compared with existing systems.
2. Enhanced robustness in low-SNR environments.
3. Effective multiuser extension maintaining high performance.
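The "low SNR" condition can be made concrete with a minimal Python sketch that models the wireless link as an additive white Gaussian noise (AWGN) channel acting on a transmitted feature vector. The AWGN model and the helper names `awgn` and `measured_snr_db` are illustrative assumptions, not necessarily the channel model or code used in the paper. The SNR in decibels is defined by SNR_dB = 10·log10(P_signal / P_noise), so for a fixed signal power, lowering SNR_dB raises the noise variance.

```python
import math
import random
from typing import List, Optional, Sequence


def awgn(signal: Sequence[float], snr_db: float,
         rng: Optional[random.Random] = None) -> List[float]:
    """Return signal plus Gaussian noise scaled to the requested SNR (dB).

    Assumption: a real-valued AWGN channel; the paper's channel may differ.
    """
    if not signal:
        return []
    rng = rng or random.Random()
    # Average signal power per symbol.
    p_signal = sum(x * x for x in signal) / len(signal)
    # Solve SNR_dB = 10 * log10(p_signal / p_noise) for the noise power.
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(p_noise)
    # Add independent zero-mean Gaussian noise to every symbol.
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """Empirical SNR (dB) estimated from a clean/noisy pair."""
    p_signal = sum(x * x for x in clean) / len(clean)
    p_noise = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(p_signal / p_noise)
```

For example, `awgn([1.0] * 100_000, snr_db=0.0, rng=random.Random(0))` adds noise of roughly unit variance, and `measured_snr_db` on the result should land close to 0 dB. Sweeping `snr_db` downward is one simple way to probe how a feature-based receiver degrades in noisy conditions.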

Abstract

Semantic communication (SemCom) has emerged as a promising technique for next-generation communication systems, in which the receiver is allowed to generate the source signal from the recovered semantic features. However, most existing research uses only a single type of semantic information, such as text, images, or speech, to supervise and select the generated source signals; this may not capture comprehensive and accurate semantic information and thus creates a performance bottleneck. To bridge this gap, in this paper, we propose and investigate a SemCom framework that uses multimodal information to supervise the generated image. Specifically, in this framework, we first extract semantic features at both the image and text levels utilizing the Convolutional Neural Network (CNN) architecture and the Contrastive Language-Image…
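The abstract describes supervising and selecting generated images with features from two modalities. Below is a minimal Python sketch of one plausible selection rule. It assumes the receiver holds several candidate images from the generator (for example, different diffusion samples), that each candidate has already been encoded into a CNN feature vector and a CLIP image embedding, and that the received semantic features include a CNN image feature and a CLIP text embedding. The weighted sum of two cosine similarities, the weight `alpha`, and the function names are hypothetical; the paper's actual supervision criterion may differ. Feature extraction is out of scope here, so plain lists of floats stand in for embeddings.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical candidate representation: (cnn_feature, clip_image_embedding).
Candidate = Tuple[Sequence[float], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def multimodal_score(cand: Candidate, rx_cnn: Sequence[float],
                     rx_text: Sequence[float], alpha: float = 0.5) -> float:
    """Hypothetical score blending an image-level and a text-level match."""
    cnn_feat, clip_img = cand
    # Image level: candidate CNN feature vs. received CNN feature.
    image_term = cosine(cnn_feat, rx_cnn)
    # Text level: candidate CLIP image embedding vs. received CLIP text embedding.
    text_term = cosine(clip_img, rx_text)
    return alpha * image_term + (1.0 - alpha) * text_term


def select_best(cands: List[Candidate], rx_cnn: Sequence[float],
                rx_text: Sequence[float], alpha: float = 0.5) -> Tuple[int, float]:
    """Return (index, score) of the highest-scoring candidate."""
    if not cands:
        raise ValueError("need at least one candidate")
    scores = [multimodal_score(c, rx_cnn, rx_text, alpha) for c in cands]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

With `alpha = 1.0` this rule reduces to image-only selection and with `alpha = 0.0` to text-only selection, roughly corresponding to the single-modal supervision the abstract contrasts against; intermediate values let both modalities vote on which generated image is kept.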

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Diffusion