Spatial-Temporal Transformer based Video Compression Framework

Yanbo Gao; Wenjia Huang; Shuai Li; Hui Yuan; Mao Ye; Siwei Ma

arXiv:2309.11913 · eess.IV · September 22, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a spatial-temporal transformer framework for video compression that improves motion estimation and residual coding, achieving a 13.5% BD-Rate saving over VTM.

Contribution

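The paper proposes a unified transformer-based framework with specialized modules for motion estimation, multi-reference prediction, and residual compression. It targets two problems in learned video compression: unstable motion estimation, and inter-prediction and residual coding modules that work independently of each other and so leave spatial-temporal redundancy unexploited.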

Findings

01. Achieves 13.5% BD-Rate saving over VTM
02. Demonstrates stable motion estimation with RDT
03. Enhances residual compression efficiency
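
BD-Rate (Bjøntegaard delta rate) summarizes the average bitrate difference between two codecs at equal quality. As a rough illustration of how a figure like the one above is typically derived, the sketch below uses the common cubic-fit formulation. The function name `bd_rate` and the rate/PSNR points are made up for illustration and are not taken from the paper, whose exact evaluation setup is not given here.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent; negative means the test codec saves bits."""
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of quality for each codec.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Integrate both fits over the quality interval the two curves share.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    anti_a = np.polyint(fit_a)
    anti_t = np.polyint(fit_t)
    int_a = np.polyval(anti_a, hi) - np.polyval(anti_a, lo)
    int_t = np.polyval(anti_t, hi) - np.polyval(anti_t, lo)

    # Average log-rate gap, converted back to a percentage.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0

# Hypothetical rate-distortion points (kbps, dB); not data from the paper.
anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]
psnr = [32.0, 35.0, 38.0, 41.0]
test_rate = [0.865 * r for r in anchor_rate]  # same quality at 13.5% fewer bits
print(round(bd_rate(anchor_rate, psnr, test_rate, psnr), 2))  # -13.5
```

With this sign convention, a 13.5% BD-Rate saving is reported by the function as -13.5.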

Abstract

Learned video compression (LVC) has witnessed remarkable advancements in recent years. Similar to traditional video coding, LVC inherits motion estimation/compensation, residual coding, and other modules, all of which are implemented with neural networks (NNs). However, within the framework of NNs and their training mechanism based on gradient backpropagation, most existing works struggle to consistently generate stable motion information, which takes the form of geometric features, from the input color features. Moreover, modules such as inter-prediction and residual coding are independent of each other, making it inefficient to fully reduce the spatial-temporal redundancy. To address the above problems, in this paper, we propose a novel Spatial-Temporal Transformer based Video Compression (STT-VC) framework. It contains a Relaxed Deformable Transformer (RDT) with Uformer…
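
The abstract notes that LVC keeps the motion estimation/compensation and residual coding structure of traditional codecs. The sketch below shows that structure in its simplest classical form: exhaustive block matching, block-wise motion compensation, and uniform quantization of the residual. It is not STT-VC or any learned codec. The function names, block size, search range, and quantization step are assumptions chosen for illustration, and frames are assumed to be 2-D float arrays whose sides are multiples of the block size.

```python
import numpy as np

def estimate_motion(ref, cur, block=8, search=2):
    """Exhaustive block matching: one integer (dy, dx) vector per block."""
    h, w = cur.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            target = cur[y:y + block, x:x + block]
            best = np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue  # candidate lies outside the reference frame
                    sad = np.abs(ref[yy:yy + block, xx:xx + block] - target).sum()
                    if sad < best:
                        best = sad
                        mvs[by, bx] = (dy, dx)
    return mvs

def compensate(ref, mvs, block=8):
    """Build the prediction by copying the matched reference block for each block."""
    pred = np.zeros_like(ref)
    for by in range(mvs.shape[0]):
        for bx in range(mvs.shape[1]):
            y, x = by * block, bx * block
            dy, dx = mvs[by, bx]
            pred[y:y + block, x:x + block] = ref[y + dy:y + dy + block,
                                                 x + dx:x + dx + block]
    return pred

def code_frame(ref, cur, block=8, search=2, step=4.0):
    """Predict cur from ref, quantize the residual, and reconstruct."""
    mvs = estimate_motion(ref, cur, block, search)
    pred = compensate(ref, mvs, block)
    residual = cur - pred
    symbols = np.round(residual / step)  # what an entropy coder would transmit
    recon = pred + symbols * step        # decoder-side reconstruction
    return mvs, symbols, recon
```

In a learned codec, each of these hand-written steps is replaced by a neural network, and entropy coding of the motion and residual symbols would follow quantization.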

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Advanced Data Compression Techniques

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Byte Pair Encoding · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Residual Connection · Adam