DiffVC-RT: Towards Practical Real-Time Diffusion-based Perceptual Neural Video Compression
Wenzhuo Ma, Zhenzhong Chen

TL;DR
DiffVC-RT is a real-time diffusion-based neural video compression framework that cuts bitrate substantially at matched perceptual quality while keeping inference fast and output temporally consistent, addressing the main obstacles to practical deployment.
Contribution
The paper presents the first real-time diffusion-based perceptual neural video compression framework, combining an efficient model architecture, explicit and implicit consistency modeling, and an asynchronous, parallel decoding pipeline.
Findings
Achieves 80.1% bitrate savings over VTM-17.0, measured by LPIPS.
Operates at 206 fps encoding and 30 fps decoding for 720p videos.
Demonstrates practical real-time performance on an NVIDIA H800 GPU.
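To put the reported figures in concrete terms, the short sketch below converts frame rates into an average time budget per frame and a bitrate saving into a relative bitrate. The function names are hypothetical, and two assumptions are made: that the 80.1% figure is an average saving at matched LPIPS quality, and that the fps values describe throughput. With a pipelined decoder, the average time per frame is not necessarily the latency of any single frame.

```python
def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def relative_bitrate(savings_percent: float) -> float:
    """Fraction of the anchor codec's bitrate that remains after the stated saving."""
    if not 0.0 <= savings_percent <= 100.0:
        raise ValueError("savings_percent must be in [0, 100]")
    return 1.0 - savings_percent / 100.0


if __name__ == "__main__":
    # Figures taken from the findings listed above.
    print(f"encode budget: {frame_budget_ms(206):.2f} ms/frame")
    print(f"decode budget: {frame_budget_ms(30):.2f} ms/frame")
    print(f"bitrate relative to VTM-17.0: {relative_bitrate(80.1):.3f}")
```

Under these assumptions, decoding at 30 fps leaves roughly 33 ms per 720p frame on average, and the codec would need about one fifth of the anchor's bitrate for comparable LPIPS.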
Abstract
The practical deployment of diffusion-based Neural Video Compression (NVC) faces critical challenges, including severe information loss, prohibitive inference latency, and poor temporal consistency. To bridge this gap, we propose DiffVC-RT, the first framework designed to achieve real-time diffusion-based perceptual NVC. First, we introduce an Efficient and Informative Model Architecture. Through strategic module replacements and pruning, this architecture significantly reduces computational complexity while mitigating structural information loss. Second, to address generative flickering artifacts, we propose Explicit and Implicit Consistency Modeling. We enhance temporal consistency by explicitly incorporating a zero-cost Online Temporal Shift Module within the U-Net, complemented by hybrid implicit consistency constraints. Finally, we present an Asynchronous and Parallel Decoding…
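The abstract does not specify how the Online Temporal Shift Module works. As a rough illustration of the general idea behind causal temporal shifting (as in the online variant of the Temporal Shift Module), the sketch below replaces a fraction of the current frame's channels with the same channels cached from the previous frame. The class name, the `shift_fraction` parameter, and the zero fill on the first frame are assumptions made for illustration, not details from the paper. Plain Python lists stand in for feature tensors.

```python
from typing import List, Optional, Sequence

Channel = List[float]  # one flattened feature channel (stand-in for a tensor slice)


class OnlineTemporalShift:
    """Hypothetical causal temporal shift over per-frame feature channels."""

    def __init__(self, shift_fraction: float = 0.125) -> None:
        if not 0.0 < shift_fraction < 1.0:
            raise ValueError("shift_fraction must be in (0, 1)")
        self.shift_fraction = shift_fraction
        self.cache: Optional[List[Channel]] = None  # channels saved from the previous frame

    def reset(self) -> None:
        """Clear the cache, e.g. at the start of a new video or group of pictures."""
        self.cache = None

    def __call__(self, features: Sequence[Channel]) -> List[Channel]:
        n_shift = int(len(features) * self.shift_fraction)
        current_head = [list(ch) for ch in features[:n_shift]]
        if self.cache is None:
            # Assumption: the first frame has no past, so the shifted slots are zero-filled.
            shifted_head = [[0.0] * len(ch) for ch in current_head]
        else:
            shifted_head = self.cache
        # Save this frame's leading channels for the next call.
        self.cache = current_head
        # Past channels first, then the untouched remainder of the current frame.
        return shifted_head + [list(ch) for ch in features[n_shift:]]


if __name__ == "__main__":
    shift = OnlineTemporalShift(shift_fraction=0.5)
    frame_1 = [[1.0, 1.0], [2.0, 2.0]]
    frame_2 = [[3.0, 3.0], [4.0, 4.0]]
    print(shift(frame_1))
    print(shift(frame_2))
```

A shift of this kind adds no learned parameters and no multiply-accumulate operations, only a copy and a small cache, which is one plausible reading of "zero-cost" in the abstract. It also uses only past frames, so it fits frame-by-frame decoding.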
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image and Video Quality Assessment · Video Coding and Compression Technologies
