A Causal Diffusion Model for Video Reconstruction from Ultra-Low-Bitrate Representations
Cem Eteke, Batuhan Tosun, Martin Piccolrovazzi, Alexander Griessel, Wolfgang Kellerer, Eckehard Steinbach

TL;DR
This paper introduces a causal video diffusion model that reconstructs videos from ultra-low-bitrate semantics and highly compressed frames, outperforming classical, neural, generative, and semantic baselines in fidelity, temporal consistency, and perceptual quality.
Contribution
The paper presents a causal video diffusion model that jointly models ultra-low-bitrate semantics and highly compressed frames, together with temporal-only distillation from a bidirectional teacher that enables parameter-efficient training and causal few-step inference.
Findings
Outperforms classical, neural, generative, and semantic baselines in ultra-low-bitrate video reconstruction.
Enables parameter-efficient training and causal few-step inference.
Outperforms the same baselines in qualitative and subjective evaluation, not only in quantitative metrics.
Abstract
We study video reconstruction from ultra-low-bitrate representations, where the primary challenge shifts from encoding to decoding. In this regime, reconstruction with classical and neural codecs introduces blur, while generative and semantic approaches often struggle to jointly preserve fidelity, temporal consistency, and perceptual quality. To address these limitations, we propose a causal video diffusion model that reconstructs videos from ultra-low-bitrate semantics and highly compressed frames by jointly modeling their complementary information. We further introduce temporal-only distillation from a bidirectional teacher to enable parameter-efficient training and causal few-step inference. Through extensive quantitative, qualitative, and subjective evaluation, we show that the proposed method outperforms classical, neural, generative, and semantic baselines in ultra-low-bitrate…
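The abstract contrasts a causal model with a bidirectional teacher but does not describe the architecture. As a generic illustration only, and not the authors' implementation, the sketch below shows how the two settings are commonly distinguished in temporal attention: under a causal mask each frame attends only to itself and earlier frames, while under a bidirectional mask it attends to all frames. The function names `temporal_mask` and `masked_attention_weights`, the frame count, and the uniform toy scores are all hypothetical.

```python
import math

def temporal_mask(num_frames, causal):
    """mask[i][j] is True when frame i may attend to frame j (hypothetical helper)."""
    return [[(j <= i) if causal else True for j in range(num_frames)]
            for i in range(num_frames)]

def masked_attention_weights(scores, mask):
    """Row-wise softmax over scores; masked-out positions receive zero weight."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        allowed = [s for s, m in zip(row_scores, row_mask) if m]
        peak = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if m else 0.0
                for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

num_frames = 4  # assumed toy clip length
scores = [[0.0] * num_frames for _ in range(num_frames)]  # uniform toy scores

causal_w = masked_attention_weights(scores, temporal_mask(num_frames, causal=True))
bidir_w = masked_attention_weights(scores, temporal_mask(num_frames, causal=False))

for name, w in (("causal", causal_w), ("bidirectional", bidir_w)):
    print(name)
    for row in w:
        print("  ", [round(x, 2) for x in row])
```

With uniform scores, the causal rows spread weight only over past and current frames, so the first frame attends entirely to itself, whereas every bidirectional row spreads weight evenly across the whole clip. This is why a causal model can decode frame by frame without waiting for future frames.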
