Versatile Learned Video Compression
Runsen Feng, Zongyu Guo, Zhizheng Zhang, Zhibo Chen

TL;DR
This paper introduces VLVC, a versatile learned video compression framework that supports all prediction modes with a single unified model. It relies on multiple voxel flows and flow prediction, and it outperforms VVC/H.266 in MS-SSIM.
Contribution
The paper presents VLVC, a framework that supports multiple inter prediction modes with a single model. It uses multiple voxel flows for motion compensation and polynomial flow prediction to reduce the cost of transmitting those flows.
Findings
VLVC outperforms VVC/H.266 in MS-SSIM.
Supports various prediction modes with one model.
Reduces transmission cost via flow prediction.
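The idea behind polynomial flow prediction can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the motion of a pixel across previously decoded frames is fitted with a polynomial in time, that the polynomial is extrapolated to the frame being coded, and that only the difference between the actual and the predicted flow needs to be transmitted. The function names `extrapolate_flow` and `flow_residual` are hypothetical, and Lagrange interpolation is used here only as one simple way to fit a polynomial.

```python
def extrapolate_flow(times, values, t_next):
    """Evaluate the Lagrange polynomial through (times[i], values[i]) at t_next.

    `values` holds one flow component (e.g. horizontal displacement) observed
    at the distinct time stamps in `times`. With n samples the fitted
    polynomial has degree n - 1.
    """
    total = 0.0
    for i, (ti, vi) in enumerate(zip(times, values)):
        basis = 1.0
        for j, tj in enumerate(times):
            if j != i:
                basis *= (t_next - tj) / (ti - tj)
        total += vi * basis
    return total


def flow_residual(times, values, t_next, actual):
    """Return the part of `actual` that the polynomial prediction misses.

    Assumption for illustration: an encoder would transmit this residual
    instead of the full flow value.
    """
    return actual - extrapolate_flow(times, values, t_next)


if __name__ == "__main__":
    # Displacements following (t + 1)^2 at t = 0, 1, 2; predict t = 3.
    predicted = extrapolate_flow([0, 1, 2], [1.0, 4.0, 9.0], 3)
    print(predicted, flow_residual([0, 1, 2], [1.0, 4.0, 9.0], 3, 16.5))
```

When the motion is smooth, the residual is small compared with the flow itself, which is the intuition behind the reduced transmission cost.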
Abstract
Learned video compression methods have demonstrated great promise in catching up with traditional video codecs in their rate-distortion (R-D) performance. However, existing learned video compression schemes are limited by the binding of the prediction mode and the fixed network framework. They are unable to support various inter prediction modes and thus inapplicable for various scenarios. In this paper, to break this limitation, we propose a versatile learned video compression (VLVC) framework that uses one model to support all possible prediction modes. Specifically, to realize versatile compression, we first build a motion compensation module that applies multiple 3D motion vector fields (i.e., voxel flows) for weighted trilinear warping in spatial-temporal space. The voxel flows convey the information of temporal reference position that helps to decouple inter prediction modes away…
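The weighted trilinear warping mentioned in the abstract can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions, not the paper's motion compensation module: reference frames are stored as nested lists `frames[t][y][x]` of grayscale values, each voxel flow is a tuple `(dx, dy, t)` whose third component is an absolute, possibly fractional, position in the reference stack, out-of-range coordinates are clamped to the border, and the weights are supplied by the caller. The names `trilinear_sample` and `warp_pixel` are hypothetical.

```python
import math


def trilinear_sample(frames, x, y, t):
    """Sample the stack frames[t][y][x] at a fractional (x, y, t) position."""
    T, H, W = len(frames), len(frames[0]), len(frames[0][0])
    # Clamp to the valid range (border handling is an assumption of this sketch).
    x = min(max(x, 0.0), W - 1.0)
    y = min(max(y, 0.0), H - 1.0)
    t = min(max(t, 0.0), T - 1.0)
    x0, y0, t0 = int(math.floor(x)), int(math.floor(y)), int(math.floor(t))
    x1, y1, t1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1), min(t0 + 1, T - 1)
    fx, fy, ft = x - x0, y - y0, t - t0
    value = 0.0
    # Blend the 8 neighbouring voxels, weighting each by its proximity.
    for ti, wt in ((t0, 1.0 - ft), (t1, ft)):
        for yi, wy in ((y0, 1.0 - fy), (y1, fy)):
            for xi, wx in ((x0, 1.0 - fx), (x1, fx)):
                value += wt * wy * wx * frames[ti][yi][xi]
    return value


def warp_pixel(frames, x, y, flows, weights):
    """Predict pixel (x, y) as a weighted sum of trilinear samples.

    Each flow (dx, dy, t) displaces the pixel spatially by (dx, dy) and
    selects the temporal position t in the reference stack.
    """
    return sum(
        w * trilinear_sample(frames, x + dx, y + dy, t)
        for (dx, dy, t), w in zip(flows, weights)
    )


if __name__ == "__main__":
    past = [[0.0, 0.0], [0.0, 0.0]]
    future = [[10.0, 10.0], [10.0, 10.0]]
    # Two flows, one pointing at each reference frame, blended equally.
    print(warp_pixel([past, future], 0, 0, [(0, 0, 0.0), (0, 0, 1.0)], [0.5, 0.5]))
```

Because the temporal coordinate is part of the flow, the same sampling routine can draw on a past frame, a future frame, or a blend of both, which is how voxel flows help decouple the prediction mode from the network structure.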
Taxonomy
Topics: Video Coding and Compression Technologies · Advanced Vision and Imaging · Advanced Image Processing Techniques
