Object-Attribute-Relation Model Driven Adaptive Hierarchical Transmission for Multimodal Semantic Communication

Chenxing Li; Yiping Duan; Han Jiao; Xiaoming Tao; Weiyao Lin; Mingquan Lu

arXiv:2604.07859·eess.SP·April 10, 2026

TL;DR

This paper introduces an adaptive hierarchical transmission framework for multimodal semantic communication that saves over 90% of bandwidth at 1-3 kbps and cuts end-to-end latency by 89%, while maintaining scene understanding in deep fading channels.

Contribution

It proposes an Object-Attribute-Relation hierarchy that bypasses pixel reconstruction, enabling robust, decision-oriented multimodal transmission with adaptive resource allocation and cross-modal compensation.
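To make the idea of an Object-Attribute-Relation hierarchy concrete, here is a minimal sketch of how such a scene graph could be held in memory. It is an illustration only, not the authors' implementation: the class names (`SceneObject`, `Relation`, `OARGraph`), the field layout, and the example scene are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    # Hypothetical object node: an id, a class label, and free-form attributes.
    obj_id: int
    label: str
    attributes: dict = field(default_factory=dict)  # e.g. {"color": "red"}


@dataclass
class Relation:
    # Hypothetical directed edge between two object ids.
    subject_id: int
    predicate: str
    object_id: int


@dataclass
class OARGraph:
    objects: dict = field(default_factory=dict)    # obj_id -> SceneObject
    relations: list = field(default_factory=list)  # list of Relation

    def add_object(self, obj_id, label, **attributes):
        self.objects[obj_id] = SceneObject(obj_id, label, dict(attributes))

    def add_relation(self, subject_id, predicate, object_id):
        # A relation may only reference objects already in the graph.
        if subject_id not in self.objects or object_id not in self.objects:
            raise KeyError("relation refers to an unknown object")
        self.relations.append(Relation(subject_id, predicate, object_id))

    def triples(self):
        """Return (subject label, predicate, object label) triples."""
        return [
            (self.objects[r.subject_id].label, r.predicate,
             self.objects[r.object_id].label)
            for r in self.relations
        ]


# Made-up example scene, for illustration only.
graph = OARGraph()
graph.add_object(1, "pedestrian", motion="walking")
graph.add_object(2, "car", color="red", motion="stopped")
graph.add_relation(1, "in_front_of", 2)
print(graph.triples())  # [('pedestrian', 'in_front_of', 'car')]
```

A receiver holding this structure can answer decision-oriented queries (what is in front of the car?) without any pixel data, which is the property the contribution statement emphasizes.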

Findings

01. Achieves over 90% bandwidth savings at 1-3 kbps compared to state-of-the-art methods.

02. Eliminates cliff effects in deep fading channels, ensuring graceful degradation.

03. Reduces end-to-end latency by 89%.

Abstract

Traditional video coding (VVC, HEVC) prioritizes human visual perception, transmitting substantial texture redundancy that severely hinders machine decision-making under constrained bandwidths. In dynamic channels, this redundancy causes severe "cliff effects" and prohibitive latency. To address this, we propose a robust multimodal semantic communication framework based on an adaptive Object-Attribute-Relation (O-A-R) hierarchy. Bypassing pixel-level reconstruction entirely, our framework directly fuses visual, textual, and audio streams to construct a decision-oriented topological graph. A bandwidth-adaptive strategy dynamically allocates resources by semantic priority, while a cross-modal mechanism leverages text and audio priors to compensate for severe visual degradation. Experimental results demonstrate that under extreme low bandwidths (1-3 kbps), our method achieves over a 90%…
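The abstract does not spell out how resources are allocated by semantic priority. One simple way to picture it is a greedy selection under a bit budget, sketched below. Everything in it is an assumption made for illustration: the function name `allocate_by_priority`, the priority ordering (objects before attributes before relations), and the bit costs are hypothetical and are not taken from the paper.

```python
def allocate_by_priority(units, budget_bits):
    """Greedily pick semantic units in priority order under a bit budget.

    units: list of (name, priority, cost_bits) tuples, where a lower
           priority number means more important (an assumed convention).
    Returns the names selected for transmission, most important first.
    """
    selected = []
    remaining = budget_bits
    # sorted() is stable, so units of equal priority keep their input order.
    for name, priority, cost_bits in sorted(units, key=lambda u: u[1]):
        # A unit that does not fit is skipped; cheaper later units may still fit.
        if cost_bits <= remaining:
            selected.append(name)
            remaining -= cost_bits
    return selected


# Hypothetical units and bit costs, for illustration only.
units = [
    ("relation:pedestrian-in_front_of-car", 2, 48),
    ("object:pedestrian", 0, 32),
    ("attribute:car.color", 1, 16),
    ("object:car", 0, 32),
    ("attribute:pedestrian.motion", 1, 16),
]

print(allocate_by_priority(units, 64))   # objects only
print(allocate_by_priority(units, 100))  # objects and attributes
print(allocate_by_priority(units, 144))  # objects, attributes, and the relation
```

As the budget shrinks, this sketch drops the lowest-priority units first instead of failing outright, which is one way to obtain graceful degradation rather than a cliff effect.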
