Towards Practical Real-Time Neural Video Compression

Zhaoyang Jia; Bin Li; Jiahao Li; Wenxuan Xie; Linfeng Qi; Houqiang Li; Yan Lu

arXiv:2502.20762·eess.IV·March 19, 2025

TL;DR

This paper presents a real-time neural video codec that combines high compression efficiency with fast coding speed. It does so by minimizing operational costs, such as memory I/O and the number of function calls, which makes neural video compression more practical.

Contribution

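The paper introduces efficiency-driven design improvements, including implicit temporal modeling in place of explicit motion modules and a single low-resolution latent representation in place of progressive downsampling, to significantly accelerate neural video coding without sacrificing compression quality.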

Findings

01. Achieves 125.2 fps encoding speed for 1080p video

02. Saves 21% bitrate compared to H.266/VTM

03. Maintains high compression quality with reduced operational costs
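
To put the first two findings in concrete terms, the short calculation below converts the reported figures into a per-frame time budget and a relative bitrate. It treats the 21% saving as a uniform reduction, which is a simplification, and the anchor bitrate of 1000 kbps is a hypothetical value chosen only for illustration; neither comes from the paper.

```python
# Figures reported on this page.
ENCODE_FPS = 125.2       # encoding speed for 1080p video
BITRATE_SAVING = 0.21    # saving relative to H.266/VTM


def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds."""
    return 1000.0 / fps


def bitrate_after_saving(anchor_kbps: float, saving: float) -> float:
    """Bitrate left after applying a fractional saving to the anchor bitrate."""
    return anchor_kbps * (1.0 - saving)


# Hypothetical anchor bitrate (assumption, for illustration only).
ANCHOR_KBPS = 1000.0

budget = frame_budget_ms(ENCODE_FPS)
codec_kbps = bitrate_after_saving(ANCHOR_KBPS, BITRATE_SAVING)

print(f"per-frame encoding budget: {budget:.2f} ms")
print(f"bitrate at equal quality: {codec_kbps:.0f} kbps vs {ANCHOR_KBPS:.0f} kbps")
```

At 125.2 fps the encoder has just under 8 ms per frame on average, which is why per-call overheads such as memory I/O can matter as much as raw arithmetic.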

Abstract

We introduce a practical real-time neural video codec (NVC) designed to deliver high compression ratio, low latency and broad versatility. In practice, the coding speed of NVCs depends on 1) computational costs, and 2) non-computational operational costs, such as memory I/O and the number of function calls. While most efficient NVCs prioritize reducing computational cost, we identify operational cost as the primary bottleneck to achieving higher coding speed. Leveraging this insight, we introduce a set of efficiency-driven design improvements focused on minimizing operational costs. Specifically, we employ implicit temporal modeling to eliminate complex explicit motion modules, and use single low-resolution latent representations rather than progressive downsampling. These innovations significantly accelerate NVC without sacrificing compression quality. Additionally, we implement model…
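
The contrast between a single low-resolution latent and progressive downsampling can be illustrated with simple shape bookkeeping. The sketch below is not the authors' architecture: it assumes a patch size of 8 and models every downsampling step as a plain space-to-depth rearrangement, with hypothetical helper names, purely to show how many intermediate feature maps each strategy materializes for a 1080p frame.

```python
def patchify_shape(c: int, h: int, w: int, patch: int) -> tuple:
    """Shape after folding each patch x patch block into the channel axis."""
    if h % patch or w % patch:
        raise ValueError("height and width must be divisible by the patch size")
    return (c * patch * patch, h // patch, w // patch)


def progressive_shapes(c: int, h: int, w: int, stages: int) -> list:
    """Shapes produced by repeated 2x downsampling, one entry per stage."""
    shapes = []
    for _ in range(stages):
        c, h, w = patchify_shape(c, h, w, 2)
        shapes.append((c, h, w))
    return shapes


def num_elements(shape: tuple) -> int:
    """Number of values in a (channels, height, width) tensor."""
    c, h, w = shape
    return c * h * w


# A 1080p RGB frame; patch size 8 is an assumption for illustration.
C, H, W = 3, 1080, 1920

single = patchify_shape(C, H, W, 8)          # one step to 1/8 resolution
stages = progressive_shapes(C, H, W, 3)      # three 2x steps to 1/8 resolution

# Intermediate tensors are everything produced before the final latent.
intermediates = stages[:-1]
extra_elements = sum(num_elements(s) for s in intermediates)

print("single-step latent shape:", single)
print("progressive stage shapes:", stages)
print("extra intermediate tensors:", len(intermediates))
print("extra intermediate elements:", extra_elements)
```

Under these assumptions both routes end at the same 135 x 240 grid, but the progressive route writes two extra intermediate tensors along the way. That extra memory traffic, together with the extra function calls that produce it, is the kind of overhead the abstract groups under operational cost.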

Code & Models

Repositories

microsoft/dcvc
PyTorch · Official

Taxonomy

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Sparse Evolutionary Training