A Multi-Scale Spatial-Temporal Network for Wireless Video Transmission

Xinyi Zhou, Danlan Huang, Zhixin Qi, Liang Zhang, and Ting Jiang

arXiv:2411.09936 · eess.IV · November 18, 2024 · GLOBECOM

TL;DR

This paper introduces a multi-scale vision Transformer-based DeepJSCC method for wireless video transmission that captures spatial-temporal features, enables content-adaptive coding, and outperforms traditional digital schemes in reconstruction quality.

Contribution

The paper presents a new VDJSCC approach with a multi-scale Transformer encoder-decoder and dynamic token selection for efficient, adaptive wireless video transmission.

Findings

01. Outperforms digital schemes in reconstruction quality
02. Reduces bandwidth compared to existing DeepJSCC methods
03. Enables content-adaptive variable-length coding

Abstract

Deep joint source-channel coding (DeepJSCC) has shown promise in wireless transmission of text, speech, and images within the realm of semantic communication. However, wireless video transmission presents greater challenges due to the difficulty of extracting and compactly representing both spatial and temporal features, as well as its significant bandwidth and computational resource requirements. In response, we propose a novel video DeepJSCC (VDJSCC) approach to enable end-to-end video transmission over a wireless channel. Our approach involves the design of a multi-scale vision Transformer encoder and decoder to effectively capture spatial-temporal representations over long-term frames. Additionally, we propose a dynamic token selection module to mask less semantically important tokens from spatial or temporal dimensions, allowing for content-adaptive variable-length video coding by…
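
The dynamic token selection idea in the abstract (masking less important tokens so that the transmitted code length varies with content) can be illustrated with a minimal sketch. The abstract does not specify how importance is scored or how masked tokens are handled at the receiver, so everything below is an assumption made for illustration: the function names `select_tokens` and `restore_tokens`, the L2-norm importance score, the fixed `keep_ratio`, and the zero fill for dropped tokens are all hypothetical and are not taken from the paper.

```python
import math
import random


def select_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring tokens; return (kept_tokens, mask).

    tokens: list of feature vectors (lists of floats), one per token.
    scores: one importance score per token (hypothetical scoring rule).
    keep_ratio: fraction of tokens to keep, in (0, 1].
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    n = len(tokens)
    k = max(1, math.ceil(keep_ratio * n))
    # Rank token indices by descending score and keep the top k.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    # The mask preserves the original token order for the receiver.
    mask = [i in keep for i in range(n)]
    kept = [tok for tok, m in zip(tokens, mask) if m]
    return kept, mask


def restore_tokens(kept, mask, dim, fill_value=0.0):
    """Rebuild a full-length token list, filling dropped positions."""
    it = iter(kept)
    return [list(next(it)) if m else [fill_value] * dim for m in mask]


random.seed(0)
tokens = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
# Assumed importance score: L2 norm of each token (illustrative only).
scores = [math.sqrt(sum(x * x for x in t)) for t in tokens]

kept, mask = select_tokens(tokens, scores, keep_ratio=0.5)
restored = restore_tokens(kept, mask, dim=4)
print(len(kept), "of", len(tokens), "tokens kept")
```

In this sketch only the kept tokens would be sent over the channel, so the payload length scales with `keep_ratio`; a content-adaptive scheme would choose that ratio (or a score threshold) per video segment rather than fixing it.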

Taxonomy

Topics: Telecommunications and Broadcasting Technologies · Multimedia Communication and Technology · Video Coding and Compression Technologies

Methods: Attention Is All You Need · Adam · Residual Connection · Byte Pair Encoding · Linear Layer · Absolute Position Encodings · Vision Transformer · Multi-Head Attention · Dense Connections · Label Smoothing