Multi-Modal Semantic Communication

Matin Mortaheb; Erciyes Karakaya; Sennur Ulukus

arXiv:2512.15691·cs.LG·December 18, 2025

Open Access

TL;DR

This paper introduces a multi-modal semantic communication system in which a text query guides which image data is transmitted, so that the information sent is relevant to the query and bandwidth is used efficiently in complex scenes.

Contribution

The paper presents a framework that combines cross-modal attention with adaptive image patch transmission, guided by user queries and channel capacity.
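To make the cross-modal attention idea concrete, here is a minimal sketch of how a text embedding could score image patches. It assumes a single-head scaled dot-product attention with the query embedding attending over patch features; the function name `relevance_scores`, the toy feature vectors, and the absence of learned projection matrices are all illustrative assumptions, not the architecture used in the paper.

```python
import math

def relevance_scores(patch_feats, text_emb):
    """Soft relevance per patch (hypothetical helper, not from the paper).

    patch_feats: list of per-patch feature vectors, each of length d.
    text_emb:    query embedding of length d.
    Returns one non-negative score per patch; the scores sum to 1.
    """
    d = len(text_emb)
    # Scaled dot product between each patch feature and the text embedding.
    logits = [
        sum(p * t for p, t in zip(patch, text_emb)) / math.sqrt(d)
        for patch in patch_feats
    ]
    # Softmax, shifted by the maximum logit for numerical stability.
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Toy data (assumed): three patches with one-hot features,
# and a query embedding aligned with the second patch.
patches = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.0, 3.0, 0.0]
scores = relevance_scores(patches, query)
```

In this toy setup the patch whose features align with the query receives the largest score, which is the behavior a query-guided relevance map is meant to provide.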

Findings

01 Effective task-relevant information transmission in complex scenes.

02 Adaptive resolution transmission based on bandwidth constraints.

03 Improved communication efficiency over traditional methods.

Abstract

Semantic communication aims to transmit information most relevant to a task rather than raw data, offering significant gains in communication efficiency for applications such as telepresence, augmented reality, and remote sensing. Recent transformer-based approaches have used self-attention maps to identify informative regions within images, but they often struggle in complex scenes with multiple objects, where self-attention lacks explicit task guidance. To address this, we propose a novel Multi-Modal Semantic Communication framework that integrates text-based user queries to guide the information extraction process. Our proposed system employs a cross-modal attention mechanism that fuses visual features with language embeddings to produce soft relevance scores over the visual data. Based on these scores and the instantaneous channel bandwidth, we use an algorithm to transmit image…
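The abstract is cut off before it describes the transmission algorithm, so the following is only a hedged sketch of one way relevance scores and a bandwidth budget could jointly decide what to send. It assumes a simple greedy rule: patches are visited in order of decreasing score, and each gets the highest resolution level that still fits the remaining budget. The function `allocate_patches`, the cost tuple `level_costs`, and the abstract "bandwidth units" are hypothetical and are not taken from the paper.

```python
def allocate_patches(scores, budget, level_costs=(4, 2, 1)):
    """Greedy resolution assignment (illustrative assumption, not the paper's algorithm).

    scores:      relevance score per patch.
    budget:      total bandwidth available, in abstract units.
    level_costs: cost of each resolution level, highest resolution first.
    Returns {patch_index: cost_spent}; patches that do not fit are omitted.
    """
    plan = {}
    remaining = budget
    # Most relevant patches get first claim on the budget.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for idx in order:
        for cost in level_costs:
            if cost <= remaining:
                plan[idx] = cost
                remaining -= cost
                break  # take the highest level that fits, then move on
    return plan

# Toy example (assumed values): four patches, seven units of bandwidth.
scores = [0.05, 0.60, 0.25, 0.10]
plan = allocate_patches(scores, budget=7)
```

With these toy numbers the most relevant patch is sent at the costliest level, the next two at progressively cheaper levels, and the least relevant patch is dropped once the budget is exhausted. A greedy rule like this is easy to reason about but is not guaranteed to be optimal.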

Taxonomy

Topics: Multimodal Machine Learning Applications · Wireless Signal Modulation Classification · Advanced Wireless Communication Technologies