TL;DR
This paper introduces a diffusion probabilistic model for video generation that outperforms previous methods in perceptual quality and probabilistic forecasting, using an autoregressive, end-to-end approach inspired by neural video compression.
Contribution
It presents a novel autoregressive diffusion model for video generation that improves perceptual quality and probabilistic forecasting over prior approaches.
Findings
Significant improvements in perceptual quality on all four datasets, which cover natural and simulation-based videos.
Outperforms existing models in probabilistic frame forecasting.
A scalable version of the Continuous Ranked Probability Score (CRPS), applicable to video, indicates better probabilistic predictions.
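For readers unfamiliar with CRPS, the sketch below shows the standard sample-based (ensemble) estimator, CRPS ≈ E|X − y| − ½·E|X − X′|, applied per pixel and averaged over the frame. This is an illustration of the general metric only; the paper's scalable video variant is not specified in this summary and may differ. The function name crps_ensemble and the array layout are assumptions made for the example.

```python
import numpy as np

def crps_ensemble(samples, target):
    """Sample-based CRPS estimate, averaged over all pixels.

    samples: array of shape (n_samples, ...) holding forecasts drawn from the model.
    target:  array of shape (...) holding the observed frame.
    NOTE: generic ensemble estimator, not necessarily the paper's scalable variant.
    """
    samples = np.asarray(samples, dtype=float)
    target = np.asarray(target, dtype=float)
    # First term: mean absolute error between each sample and the observation.
    term1 = np.abs(samples - target).mean(axis=0)
    # Second term: mean absolute difference over all pairs of samples.
    pairwise = np.abs(samples[:, None] - samples[None, :])
    term2 = pairwise.mean(axis=(0, 1))
    # Per-pixel CRPS, then averaged into a single score (lower is better).
    return float((term1 - 0.5 * term2).mean())
```

With a single sample the second term vanishes and the score reduces to the mean absolute error, which is why CRPS is often described as a probabilistic generalization of MAE.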
Abstract
Denoising diffusion probabilistic models are a promising new class of generative models that mark a milestone in high-quality image generation. This paper showcases their ability to sequentially generate video, surpassing prior methods in perceptual and probabilistic forecasting metrics. We propose an autoregressive, end-to-end optimized video diffusion model inspired by recent advances in neural video compression. The model successively generates future frames by correcting a deterministic next-frame prediction using a stochastic residual generated by an inverse diffusion process. We compare this approach against five baselines on four datasets involving natural and simulation-based videos. We find significant improvements in terms of perceptual quality for all datasets. Furthermore, by introducing a scalable version of the Continuous Ranked Probability Score (CRPS) applicable to…
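The generation loop described in the abstract (a deterministic next-frame prediction corrected by a stochastic residual from a reverse diffusion process) can be sketched roughly as follows. This is a toy illustration of the control flow, not the authors' implementation: predict_next_frame, toy_eps_model, the linear noise schedule, and the step count are all hypothetical placeholders, and the reverse process uses the standard DDPM ancestral sampling update with an untrained stand-in for the noise predictor.

```python
import numpy as np

def predict_next_frame(context):
    # Hypothetical deterministic predictor: simply repeat the last frame.
    # In practice this would be a learned network.
    return context[-1]

def toy_eps_model(x_t, t, context):
    # Hypothetical noise predictor standing in for a trained denoising network.
    # It treats the current state as pure noise, for illustration only.
    return x_t

def sample_residual(shape, context, rng, num_steps=10):
    # Assumed linear beta schedule; the paper's schedule is not given here.
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # start the reverse process from Gaussian noise
    for t in reversed(range(num_steps)):
        eps = toy_eps_model(x, t, context)
        # Standard DDPM posterior mean for one reverse step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Fresh noise is added at every step except the last one.
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

def generate(context_frames, num_future, seed=0):
    rng = np.random.default_rng(seed)
    frames = list(context_frames)
    for _ in range(num_future):
        mu = predict_next_frame(frames)                   # deterministic prediction
        residual = sample_residual(mu.shape, frames, rng)  # stochastic correction
        frames.append(mu + residual)                      # feed back autoregressively
    return np.stack(frames[len(context_frames):])
```

Calling generate with two 4×4 context frames and num_future=3 returns an array of shape (3, 4, 4). Different seeds give different rollouts, which is what makes sample-based forecasting metrics such as CRPS applicable.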
Taxonomy
Methods: Diffusion
