A Multi-Scale Spatial-Temporal Network for Wireless Video Transmission
Xinyi Zhou, Danlan Huang, Zhixin Qi, Liang Zhang, and Ting Jiang

TL;DR
This paper introduces a multi-scale vision Transformer-based DeepJSCC method for wireless video transmission. The method captures spatial-temporal features, enables content-adaptive coding, and outperforms traditional digital schemes in reconstruction quality.
Contribution
The paper presents a new video DeepJSCC (VDJSCC) approach that combines a multi-scale Transformer encoder-decoder with dynamic token selection for efficient, content-adaptive wireless video transmission.
Findings
Outperforms digital schemes in reconstruction quality
Reduces bandwidth compared to existing DeepJSCC methods
Enables content-adaptive variable-length coding
Abstract
Deep joint source-channel coding (DeepJSCC) has shown promise in wireless transmission of text, speech, and images within the realm of semantic communication. However, wireless video transmission presents greater challenges due to the difficulty of extracting and compactly representing both spatial and temporal features, as well as its significant bandwidth and computational resource requirements. In response, we propose a novel video DeepJSCC (VDJSCC) approach to enable end-to-end video transmission over a wireless channel. Our approach involves the design of a multi-scale vision Transformer encoder and decoder to effectively capture spatial-temporal representations over long-term frames. Additionally, we propose a dynamic token selection module to mask less semantically important tokens from spatial or temporal dimensions, allowing for content-adaptive variable-length video coding by…
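The dynamic token selection idea in the abstract can be illustrated with a toy sketch. This is not the authors' module, which is presumably learned end to end. Here the importance scores are supplied by hand, and the fixed `threshold`, the function names `select_tokens` and `restore_tokens`, and the zero-fill step at the receiver are all assumptions made for illustration. The sketch only shows how masking low-importance tokens yields a code whose length depends on the content.

```python
def select_tokens(tokens, scores, threshold=0.5):
    """Keep tokens whose importance score meets the threshold.

    Hypothetical helper: in a real system the scores would come from a
    learned predictor, not be passed in by hand.
    Returns the kept tokens and a boolean mask over the original positions.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    mask = [s >= threshold for s in scores]
    kept = [tok for tok, keep in zip(tokens, mask) if keep]
    return kept, mask


def restore_tokens(kept, mask, fill):
    """Rebuild a full-length token list, placing a copy of `fill` at masked positions.

    Assumes the receiver knows the mask; how the mask is conveyed is not
    specified in the abstract.
    """
    it = iter(kept)
    return [next(it) if keep else list(fill) for keep in mask]


# Four toy tokens (2-dimensional) for one clip.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]

# Assumed scores: a mostly static clip versus a clip with a lot of motion.
static_scores = [0.9, 0.1, 0.2, 0.05]
dynamic_scores = [0.9, 0.8, 0.7, 0.6]

static_kept, static_mask = select_tokens(tokens, static_scores)
dynamic_kept, dynamic_mask = select_tokens(tokens, dynamic_scores)

# The number of transmitted tokens varies with the content.
print(len(static_kept), len(dynamic_kept))  # prints: 1 4
```

In this toy setting the static clip is represented by one token and the dynamic clip by four, which is the sense in which the coding length adapts to content. A threshold is only one possible selection rule; a top-k rule with a content-dependent k would serve the same illustrative purpose.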
Taxonomy
Topics: Telecommunications and Broadcasting Technologies · Multimedia Communication and Technology · Video Coding and Compression Technologies
Methods: Attention Is All You Need · Adam · Residual Connection · Byte Pair Encoding · Linear Layer · Absolute Position Encodings · Vision Transformer · Multi-Head Attention · Dense Connections · Label Smoothing
