Generative Neural Video Compression via Video Diffusion Prior
Qi Mao, Hao Cheng, Tinghan Yang, Libiao Jin, Siwei Ma

TL;DR
GNVC-VD is a generative neural video compression framework that uses a video diffusion transformer for sequence-level refinement, improving perceptual quality and temporal coherence at extremely low bitrates.
Contribution
It is the first to unify spatio-temporal latent compression with sequence-level generative refinement using a video diffusion transformer in a single codec.
Findings
Outperforms traditional and learned codecs in perceptual quality.
Reduces flickering artifacts even below 0.01 bpp.
Maintains temporal coherence under extreme bitrate constraints.
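To give a sense of scale for the 0.01 bpp figure, the sketch below converts bits per pixel (total bits divided by frames × height × width) into a bitrate. The function name `bitrate_kbps` and the 1080p, 30 fps settings are illustrative assumptions, not values reported in the paper.

```python
def bitrate_kbps(bpp, width, height, fps):
    """Convert bits per pixel (averaged over all frames) to kilobits per second."""
    bits_per_frame = bpp * width * height
    return bits_per_frame * fps / 1000.0

# Illustrative settings only: 1080p at 30 fps is an assumption, not from the paper.
rate = bitrate_kbps(0.01, 1920, 1080, 30)
print(f"{rate:.1f} kbit/s")  # about 622.1 kbit/s
```

Under these assumed settings, 0.01 bpp leaves roughly 20,736 bits (about 2.5 KB) per frame.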
Abstract
We present GNVC-VD, the first DiT-based generative neural video compression framework built upon an advanced video generation foundation model, where spatio-temporal latent compression and sequence-level generative refinement are unified within a single codec. Existing perceptual codecs primarily rely on pre-trained image generative priors to restore high-frequency details, but their frame-wise nature lacks temporal modeling and inevitably leads to perceptual flickering. To address this, GNVC-VD introduces a unified flow-matching latent refinement module that leverages a video diffusion transformer to jointly enhance intra- and inter-frame latents through sequence-level denoising, ensuring consistent spatio-temporal details. Instead of denoising from pure Gaussian noise as in video generation, GNVC-VD initializes refinement from decoded spatio-temporal latents and learns a correction…
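The idea of starting refinement from decoded latents rather than from pure Gaussian noise can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `refine_latents` and `make_toy_velocity`, the time convention (t = 0 is noise, t = 1 is clean data), the start time, and the step count are all assumptions. The velocity function is a toy oracle that already knows the clean latents; it stands in for the learned video diffusion transformer, which has no such access.

```python
import numpy as np

def refine_latents(decoded, velocity_fn, t_start=0.7, num_steps=4, seed=0):
    """Euler-integrate a flow from a partially noised decoded latent to t = 1.

    decoded: array of shape (T, C, H, W), i.e. the whole clip's latents at once,
             so every step sees all frames jointly (sequence-level refinement).
    velocity_fn(x, t): hypothetical stand-in for the video diffusion transformer.
    Assumed convention: t = 0 is pure noise, t = 1 is clean data.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(decoded.shape)
    # Start partway along the noise-to-data path instead of at pure noise (t = 0).
    x = (1.0 - t_start) * noise + t_start * decoded
    ts = np.linspace(t_start, 1.0, num_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * velocity_fn(x, t0)  # one Euler step
    return x

def make_toy_velocity(target):
    """Toy oracle velocity pointing straight at a known target (illustration only)."""
    def velocity_fn(x, t):
        return (target - x) / (1.0 - t)
    return velocity_fn

# Synthetic data: 8 frames of 4-channel 16x16 latents, plus a degraded "decoded" copy.
clean = np.random.default_rng(1).standard_normal((8, 4, 16, 16))
decoded = clean + 0.3 * np.random.default_rng(2).standard_normal(clean.shape)

refined = refine_latents(decoded, make_toy_velocity(clean))
err_before = float(np.mean((decoded - clean) ** 2))
err_after = float(np.mean((refined - clean) ** 2))
print(err_before, err_after)
```

Because integration begins at `t_start` rather than 0, only the tail of the trajectory is traversed, which is why a handful of steps suffices in this toy setting. With a learned velocity field the result would depend on how well the model generalizes to compression distortions.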
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Data Compression Techniques · Advanced Image Processing Techniques
