Accelerating Learned Video Compression via Low-Resolution Representation Learning
Zidian Qiu, Zongyao He, Zhi Jin

TL;DR
This paper introduces an efficiency-optimized learned video compression framework that significantly improves encoding and decoding speeds by utilizing low-resolution representations and joint training, achieving competitive compression ratios.
Contribution
The work presents a novel low-resolution representation learning approach that enhances speed and maintains high compression performance in learned video codecs.
Findings
Speeds up encoding by a factor of 3 and decoding by 7 compared to DCVC-HEM.
Decodes 1080p frames in under 100ms on RTX 2080Ti.
Achieves compression performance comparable to H.266 VTM low-decay P configuration.
Abstract
In recent years, the field of learned video compression has witnessed rapid advancement, exemplified by the latest neural video codecs DCVC-DC that has outperformed the upcoming next-generation codec ECM in terms of compression ratio. Despite this, learned video compression frameworks often exhibit low encoding and decoding speeds primarily due to their increased computational complexity and unnecessary high-resolution spatial operations, which hugely hinder their applications in reality. In this work, we introduce an efficiency-optimized framework for learned video compression that focuses on low-resolution representation learning, aiming to significantly enhance the encoding and decoding speeds. Firstly, we diminish the computational load by reducing the resolution of inter-frame propagated features obtained from reused features of decoded frames, including I-frames. We implement a…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
