Object-Attribute-Relation Model Driven Adaptive Hierarchical Transmission for Multimodal Semantic Communication
Chenxing Li, Yiping Duan, Han Jiao, Xiaoming Tao, Weiyao Lin, Mingquan Lu

TL;DR
This paper introduces an adaptive hierarchical transmission framework for multimodal semantic communication that reports over 90% bandwidth savings at 1-3 kbps and an 89% reduction in end-to-end latency while preserving scene understanding in deep-fading channels.
Contribution
It proposes an Object-Attribute-Relation hierarchy that bypasses pixel reconstruction, enabling robust, decision-oriented multimodal transmission with adaptive resource allocation and cross-modal compensation.
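To make the idea of priority-driven allocation concrete, here is a minimal Python sketch, not the authors' method. It assumes a simple greedy rule and a fixed priority order (objects first, then attributes, then relations); the source does not specify either. The names `SemanticUnit`, `LEVEL_PRIORITY`, `select_units`, and all bit costs are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Assumed priority order (lower value is sent first).
# The source does not state the actual ordering.
LEVEL_PRIORITY = {"object": 0, "attribute": 1, "relation": 2}


@dataclass(frozen=True)
class SemanticUnit:
    level: str      # "object", "attribute", or "relation"
    payload: str    # illustrative label, e.g. "car" or "car:red"
    cost_bits: int  # hypothetical encoded size in bits


def select_units(units, budget_bits):
    """Greedily keep units in priority order while they fit the bit budget."""
    chosen, used = [], 0
    # sorted() is stable, so units within one level keep their input order.
    for unit in sorted(units, key=lambda u: LEVEL_PRIORITY[u.level]):
        if used + unit.cost_bits <= budget_bits:
            chosen.append(unit)
            used += unit.cost_bits
    return chosen, used


scene = [
    SemanticUnit("relation", "car-left_of-truck", 48),
    SemanticUnit("object", "car", 32),
    SemanticUnit("attribute", "car:red", 24),
    SemanticUnit("object", "truck", 32),
]

# With a 100-bit budget, both objects and the attribute fit; the relation is dropped.
chosen, used = select_units(scene, budget_bits=100)
print([u.payload for u in chosen], used)
```

Under this assumed rule, shrinking the budget drops relations first, then attributes, so the receiver still learns which objects are present. This is one simple way to picture graceful degradation; the paper's actual allocation strategy may differ.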
Findings
Achieves over 90% bandwidth savings at 1-3 kbps compared to state-of-the-art methods.
Eliminates cliff effects in deep fading channels, ensuring graceful degradation.
Reduces end-to-end latency by 89%.
Abstract
Traditional video coding (VVC, HEVC) prioritizes human visual perception, transmitting substantial texture redundancy that severely hinders machine decision-making under constrained bandwidths. In dynamic channels, this redundancy causes severe "cliff effects" and prohibitive latency. To address this, we propose a robust multimodal semantic communication framework based on an adaptive Object-Attribute-Relation (O-A-R) hierarchy. Bypassing pixel-level reconstruction entirely, our framework directly fuses visual, textual, and audio streams to construct a decision-oriented topological graph. A bandwidth-adaptive strategy dynamically allocates resources by semantic priority, while a cross-modal mechanism leverages text and audio priors to compensate for severe visual degradation. Experimental results demonstrate that under extreme low bandwidths (1-3 kbps), our method achieves over a 90%…
