Multi-Modal Semantic Communication
Matin Mortaheb, Erciyes Karakaya, Sennur Ulukus

TL;DR
This paper introduces a multi-modal semantic communication system that uses text queries to guide image data transmission, optimizing information relevance and bandwidth efficiency in complex scenes.
Contribution
It presents a novel framework integrating cross-modal attention and adaptive image patch transmission guided by user queries and channel capacity.
Findings
Transmits task-relevant information effectively in complex, multi-object scenes.
Adapts the transmitted resolution to bandwidth constraints.
Improves communication efficiency over traditional methods.
Abstract
Semantic communication aims to transmit information most relevant to a task rather than raw data, offering significant gains in communication efficiency for applications such as telepresence, augmented reality, and remote sensing. Recent transformer-based approaches have used self-attention maps to identify informative regions within images, but they often struggle in complex scenes with multiple objects, where self-attention lacks explicit task guidance. To address this, we propose a novel Multi-Modal Semantic Communication framework that integrates text-based user queries to guide the information extraction process. Our proposed system employs a cross-modal attention mechanism that fuses visual features with language embeddings to produce soft relevance scores over the visual data. Based on these scores and the instantaneous channel bandwidth, we use an algorithm to transmit image…
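The two steps the abstract describes, query-guided relevance scoring and bandwidth-aware patch transmission, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is truncated before the transmission algorithm is specified, so the greedy selection rule, the single-vector query, the fixed per-patch cost, and all names here (`relevance_scores`, `select_patches`, `bits_per_patch`, `budget_bits`) are assumptions chosen for illustration. The scoring uses a generic scaled dot-product attention between one text embedding and per-patch visual features.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def relevance_scores(query_vec, patch_feats):
    """Soft relevance of each image patch to a text query.

    Assumption: the query is one embedding vector and relevance is a
    scaled dot product followed by softmax (generic cross-attention).
    """
    d = len(query_vec)
    logits = [
        sum(q * p for q, p in zip(query_vec, feat)) / math.sqrt(d)
        for feat in patch_feats
    ]
    return softmax(logits)

def select_patches(scores, bits_per_patch, budget_bits):
    """Greedy selection (hypothetical rule): send the highest-scoring
    patches until the channel budget for this slot is used up."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + bits_per_patch > budget_bits:
            break
        chosen.append(i)
        used += bits_per_patch
    return sorted(chosen)

# Toy example: a 2-dimensional query embedding and four patch features.
query = [1.0, 0.0]
patches = [[0.1, 0.9], [2.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]

scores = relevance_scores(query, patches)
sent = select_patches(scores, bits_per_patch=100, budget_bits=250)
print(sent)
```

In this toy setup the budget covers two patches, so only the two patches best aligned with the query are sent. A scheme that varies resolution per patch, as the findings suggest, would replace the fixed `bits_per_patch` with a per-patch cost that depends on the chosen resolution.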
Taxonomy
Topics: Multimodal Machine Learning Applications · Wireless Signal Modulation Classification · Advanced Wireless Communication Technologies
