Multimodal generative semantic communication based on latent diffusion model

Weiqi Fu; Lianming Xu; Xin Wu; Haoyang Wei; Li Wang

arXiv:2408.05455·cs.CV·August 13, 2024

Open Access

TL;DR

This paper presents mm-GESCO, a multimodal generative semantic communication framework that fuses visible and infrared data, achieving high compression and improved accuracy in environmental understanding tasks.

Contribution

The paper introduces a novel multimodal semantic communication framework using latent diffusion models and contrastive learning for data fusion and reconstruction.

Findings

1. Achieves a data compression ratio of up to 200x.
2. Outperforms existing semantic communication methods.
3. Improves performance on downstream tasks such as classification and detection.

Abstract

In emergencies, the ability to quickly and accurately gather environmental data and command information, and to make timely decisions, is particularly critical. Traditional semantic communication frameworks, primarily based on a single modality, are susceptible to complex environments and lighting conditions, thereby limiting decision accuracy. To this end, this paper introduces a multimodal generative semantic communication framework named mm-GESCO. The framework ingests streams of visible and infrared modal image data, generates fused semantic segmentation maps, and transmits them using a combination of one-hot encoding and zlib compression techniques to enhance data transmission efficiency. At the receiving end, the framework can reconstruct the original multimodal images based on the semantic maps. Additionally, a latent diffusion model based on contrastive learning is designed to…
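The transmission step described in the abstract (one-hot encoding of the semantic map followed by zlib compression) can be illustrated with a minimal sketch. The abstract does not specify the bit layout, so the class-major bit-plane packing below is an assumption, and the function names `one_hot_encode`, `one_hot_decode`, `transmit`, and `receive` are hypothetical. The toy 64x64 map with four classes is made up for illustration and is not data from the paper.

```python
import zlib


def one_hot_encode(seg_map, num_classes):
    """Pack a 2-D label map into class-major one-hot bit planes (assumed layout)."""
    flat = [label for row in seg_map for label in row]
    bits = []
    for c in range(num_classes):
        # One bit plane per class: 1 where the pixel belongs to class c.
        bits.extend(1 if label == c else 0 for label in flat)
    bits.extend([0] * (-len(bits) % 8))  # pad to a whole number of bytes
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        packed.append(byte)
    return bytes(packed)


def one_hot_decode(packed, height, width, num_classes):
    """Invert one_hot_encode and return the label map as a list of rows."""
    n = height * width
    bits = []
    for byte in packed:
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    flat = [0] * n
    for c in range(num_classes):
        for i, bit in enumerate(bits[c * n:(c + 1) * n]):
            if bit:
                flat[i] = c
    return [flat[r * width:(r + 1) * width] for r in range(height)]


def transmit(seg_map, num_classes):
    """Sender side: one-hot encode, then compress with zlib at the highest level."""
    return zlib.compress(one_hot_encode(seg_map, num_classes), 9)


def receive(payload, height, width, num_classes):
    """Receiver side: decompress, then recover the label map."""
    return one_hot_decode(zlib.decompress(payload), height, width, num_classes)


# Toy semantic map: four 32x32 quadrants, one class label per quadrant.
HEIGHT, WIDTH, NUM_CLASSES = 64, 64, 4
seg_map = [[(r // 32) * 2 + (c // 32) for c in range(WIDTH)] for r in range(HEIGHT)]

payload = transmit(seg_map, NUM_CLASSES)
restored = receive(payload, HEIGHT, WIDTH, NUM_CLASSES)
print(len(one_hot_encode(seg_map, NUM_CLASSES)), "bytes one-hot,", len(payload), "bytes compressed")
```

Segmentation maps consist of large uniform regions, so their one-hot bit planes contain long runs of identical bits, which is the kind of redundancy zlib's DEFLATE handles well. The round trip here is lossless for the semantic map itself. Reconstructing the original images from that map is the job of the generative model at the receiver and is not covered by this sketch.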

Taxonomy

Topics: Speech and dialogue systems

Methods: Latent Diffusion Model · Diffusion · Contrastive Learning · ALIGN