Efficient Video Diffusion with Sparse Information Transmission for Video Compression
Mingde Zhou, Zheng Chen, Yulun Zhang

TL;DR
This paper introduces Diff-SIT, a novel video compression method that uses sparse encoding and diffusion models to achieve high perceptual quality and temporal consistency at ultra-low bitrates.
Contribution
The paper proposes a framework that combines sparse temporal encoding (STEM) with one-step diffusion-based reconstruction (ODFTE) for ultra-low bitrate video compression.
Findings
Achieves state-of-the-art perceptual quality at ultra-low bitrates.
Maintains strong temporal consistency across frames.
Reduces bitrate significantly compared to existing methods.
Abstract
Video compression aims to maximize reconstruction quality with minimal bitrates. Beyond standard distortion metrics, perceptual quality and temporal consistency are also critical. However, at ultra-low bitrates, traditional end-to-end compression models tend to produce blurry images of poor perceptual quality. Besides, existing generative compression methods often treat video frames independently and show limitations in time coherence and efficiency. To address these challenges, we propose the Efficient Video Diffusion with Sparse Information Transmission (Diff-SIT), which comprises the Sparse Temporal Encoding Module (STEM) and the One-Step Video Diffusion with Frame Type Embedder (ODFTE). The STEM sparsely encodes the original frame sequence into an information-rich intermediate sequence, achieving significant bitrate savings. Subsequently, the ODFTE processes this intermediate…
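The abstract does not say how STEM chooses or encodes frames, so the following is only a rough illustration of the general idea of sparse transmission with per-frame type labels, not the paper's method. It assumes a fixed stride, two hypothetical frame types (KEY and SKIPPED), and a placeholder for untransmitted positions; all names are invented for this sketch.

```python
from typing import Any, List, Optional, Sequence, Tuple

# Hypothetical frame-type ids; the paper's actual types are not given in the abstract.
KEY, SKIPPED = 0, 1


def sparse_select(frames: Sequence[Any], stride: int) -> Tuple[List[Any], List[int]]:
    """Keep every `stride`-th frame and label each original position with a type id."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    kept = [f for i, f in enumerate(frames) if i % stride == 0]
    types = [KEY if i % stride == 0 else SKIPPED for i in range(len(frames))]
    return kept, types


def build_intermediate(
    kept: Sequence[Any], types: Sequence[int], placeholder: Optional[Any] = None
) -> List[Any]:
    """Re-expand to the original length, filling untransmitted slots with a placeholder."""
    kept_iter = iter(kept)
    return [next(kept_iter) if t == KEY else placeholder for t in types]


# Integers stand in for frames here.
frames = list(range(7))
kept, types = sparse_select(frames, stride=3)
intermediate = build_intermediate(kept, types)
```

In a setup like this, only `kept` would need to be transmitted, while `types` tells a generative decoder which positions carry real content and which must be synthesized. A learned frame type embedding could then condition the model on that distinction.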
Taxonomy
Topics: Image and Video Quality Assessment · Video Coding and Compression Technologies · Advanced Data Compression Techniques
