Multimodal generative semantic communication based on latent diffusion model
Weiqi Fu, Lianming Xu, Xin Wu, Haoyang Wei, Li Wang

TL;DR
This paper presents mm-GESCO, a multimodal generative semantic communication framework that fuses visible and infrared data, achieving high compression and improved accuracy in environmental understanding tasks.
Contribution
The paper introduces a novel multimodal semantic communication framework using latent diffusion models and contrastive learning for data fusion and reconstruction.
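The taxonomy lists contrastive learning and ALIGN among the methods, but this summary does not give the paper's actual training objective. As a generic illustration only, the sketch below computes a symmetric InfoNCE loss of the kind used by CLIP and ALIGN to pull paired visible and infrared embeddings together. The function names, the 0.07 temperature and the random embeddings are assumptions, and this is not presented as mm-GESCO's loss.

```python
# Generic contrastive sketch (assumption): NOT the mm-GESCO training objective.
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def _mean_diag_cross_entropy(logits: np.ndarray) -> float:
    """Cross-entropy where the correct class for row i is column i."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def symmetric_info_nce(vis: np.ndarray, ir: np.ndarray, temperature: float = 0.07) -> float:
    """CLIP/ALIGN-style symmetric InfoNCE over N paired embeddings.

    Row i of `vis` and row i of `ir` are treated as the positive pair (same scene);
    all other rows in the batch serve as negatives.
    """
    v, r = l2_normalize(vis), l2_normalize(ir)
    logits = v @ r.T / temperature  # (N, N) scaled cosine similarities
    # Average the visible->infrared and infrared->visible directions.
    return 0.5 * (_mean_diag_cross_entropy(logits) + _mean_diag_cross_entropy(logits.T))


# Hypothetical embeddings: 8 scenes, 32-dim features per modality.
rng = np.random.default_rng(0)
vis = rng.normal(size=(8, 32))
ir_aligned = vis + 0.05 * rng.normal(size=vis.shape)  # matching pairs
ir_mismatched = np.roll(ir_aligned, 1, axis=0)  # every pair now points at another scene
loss_aligned = symmetric_info_nce(vis, ir_aligned)
loss_mismatched = symmetric_info_nce(vis, ir_mismatched)
print(f"aligned: {loss_aligned:.3f}, mismatched: {loss_mismatched:.3f}")
```

Matched rows yield a small loss, while the rolled (mismatched) pairing yields a much larger one; minimizing this objective is what drives both modalities into a shared embedding space.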
Findings
Achieves a data compression ratio of up to 200x.
Outperforms existing semantic communication methods.
Improves performance on downstream tasks such as classification and detection.
Abstract
In emergencies, the ability to quickly and accurately gather environmental data and command information, and to make timely decisions, is particularly critical. Traditional semantic communication frameworks, primarily based on a single modality, are susceptible to complex environments and lighting conditions, thereby limiting decision accuracy. To this end, this paper introduces a multimodal generative semantic communication framework named mm-GESCO. The framework ingests streams of visible and infrared modal image data, generates fused semantic segmentation maps, and transmits them using a combination of one-hot encoding and zlib compression techniques to enhance data transmission efficiency. At the receiving end, the framework can reconstruct the original multimodal images based on the semantic maps. Additionally, a latent diffusion model based on contrastive learning is designed to…
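The transmission step described in the abstract (one-hot encoding of the fused segmentation map followed by zlib compression) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the bit-packing step with `np.packbits`, the compression level, the hypothetical 9-class example map and the raw-image baseline (an 8-bit RGB visible image plus an 8-bit single-channel infrared image) are all assumptions.

```python
# Minimal sketch (assumption): one-hot + zlib transport of a semantic map,
# not the mm-GESCO implementation.
import zlib

import numpy as np


def encode_semantic_map(seg: np.ndarray, num_classes: int, level: int = 9) -> bytes:
    """One-hot encode an (H, W) integer label map, pack it to bits, zlib-compress it."""
    if seg.min() < 0 or seg.max() >= num_classes:
        raise ValueError("labels must lie in [0, num_classes)")
    one_hot = np.eye(num_classes, dtype=np.uint8)[seg]  # (H, W, C) array of 0/1
    packed = np.packbits(one_hot.ravel())  # 8 one-hot entries per byte
    return zlib.compress(packed.tobytes(), level)


def decode_semantic_map(payload: bytes, shape: tuple, num_classes: int) -> np.ndarray:
    """Invert encode_semantic_map: decompress, unpack bits, argmax over classes."""
    h, w = shape
    packed = np.frombuffer(zlib.decompress(payload), dtype=np.uint8)
    bits = np.unpackbits(packed, count=h * w * num_classes)
    return bits.reshape(h, w, num_classes).argmax(axis=-1)


# Hypothetical 256x256 map with 9 classes arranged in 32x32 blocks, mimicking
# the large uniform regions of a real segmentation map.
rng = np.random.default_rng(0)
coarse = rng.integers(0, 9, size=(8, 8))
seg = np.kron(coarse, np.ones((32, 32), dtype=np.int64))

payload = encode_semantic_map(seg, num_classes=9)
restored = decode_semantic_map(payload, seg.shape, num_classes=9)
assert np.array_equal(restored, seg)  # lossless round trip

# Assumed raw baseline: 8-bit RGB visible image + 8-bit single-channel infrared image.
raw_bytes = seg.size * (3 + 1)
print(f"payload: {len(payload)} bytes, ratio vs. raw: {raw_bytes / len(payload):.1f}x")
```

Packing the 0/1 entries into bits keeps the pre-compression size at C/8 bytes per pixel rather than C bytes, and zlib then removes the redundancy of large uniform regions. The printed ratio reflects only this synthetic map and is not comparable to the paper's reported results.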
Taxonomy
Topics: Speech and dialogue systems
Methods: Latent Diffusion Model · Diffusion · Contrastive Learning · ALIGN
