Generating time-consistent dynamics with discriminator-guided image diffusion models
Philipp Hess, Maximilian Gelbrecht, Christof Schötz, Michael Aich, Yu Huang, Shangshang Yang, Niklas Boers

TL;DR
This paper introduces a discriminator-guided approach that enables pretrained image diffusion models to generate realistic, time-consistent spatiotemporal dynamics in videos, reducing the need for extensive training.
Contribution
It proposes a time-consistency discriminator that guides the sampling of a pretrained image diffusion model so that it generates time-consistent dynamics, without extending or fine-tuning the diffusion model itself.
Findings
Performs comparably to a VDM trained from scratch in terms of temporal consistency
Improves uncertainty calibration and reduces biases
Enables stable long-term climate simulations
Abstract
Realistic temporal dynamics are crucial for many video generation, processing and modelling applications, e.g. in computational fluid dynamics, weather prediction, or long-term climate simulations. Video diffusion models (VDMs) are the current state-of-the-art method for generating highly realistic dynamics. However, training VDMs from scratch can be challenging and requires large computational resources, limiting their wider application. Here, we propose a time-consistency discriminator that enables pretrained image diffusion models to generate realistic spatiotemporal dynamics. The discriminator guides the sampling inference process and does not require extensions or finetuning of the image diffusion model. We compare our approach against a VDM trained from scratch on an idealized turbulence simulation and a real-world global precipitation dataset. Our approach performs equally well…
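To make the idea of discriminator-guided sampling concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes the common discriminator-guidance recipe, in which the gradient of the discriminator's log-odds, log(d / (1 - d)), is added to the score of the image diffusion model during sampling; the abstract excerpt does not confirm this exact form. All names (`toy_score`, `toy_log_odds_grad`, `sample_next_frame`, `guidance_scale`) are hypothetical. The "pretrained model" is the closed-form score of a unit Gaussian, and the "discriminator" is a hand-written quadratic that prefers frames close to the previous one. In practice both would be neural networks and the gradient would come from automatic differentiation.

```python
import numpy as np


def toy_score(x, sigma):
    # Stand-in for a pretrained image diffusion model: the exact score of
    # data ~ N(0, I) perturbed with Gaussian noise of standard deviation sigma.
    return -x / (1.0 + sigma**2)


def toy_log_odds_grad(x, x_prev, weight=1.0):
    # Stand-in for a time-consistency discriminator (an assumption for this
    # sketch). Its log-odds are taken to be -weight * ||x - x_prev||^2 / 2,
    # so the gradient with respect to x is -weight * (x - x_prev).
    return -weight * (x - x_prev)


def guided_score(x, x_prev, sigma, guidance_scale=1.0):
    # Discriminator guidance: image-model score plus the scaled gradient of
    # the discriminator's log-odds.
    return toy_score(x, sigma) + guidance_scale * toy_log_odds_grad(x, x_prev)


def sample_next_frame(x_prev, sigmas, rng, guidance_scale=1.0):
    # Start from pure noise at the largest noise level.
    x = rng.standard_normal(x_prev.shape) * sigmas[0]
    # Euler integration of the probability-flow ODE dx/dsigma = -sigma * score,
    # stepping from high noise to low noise.
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        score = guided_score(x, x_prev, sigma, guidance_scale)
        x = x + (sigma_next - sigma) * (-sigma * score)
    return x


if __name__ == "__main__":
    x_prev = np.random.default_rng(0).standard_normal((8, 8))
    sigmas = np.geomspace(5.0, 0.01, 200)  # decreasing noise schedule
    guided = sample_next_frame(x_prev, sigmas, np.random.default_rng(1), 1.0)
    unguided = sample_next_frame(x_prev, sigmas, np.random.default_rng(1), 0.0)
    print("distance to previous frame, guided:  ", np.linalg.norm(guided - x_prev))
    print("distance to previous frame, unguided:", np.linalg.norm(unguided - x_prev))
```

With `guidance_scale=0.0` the sampler ignores the previous frame entirely, which is the behaviour of an unmodified image model. With guidance switched on, the sample is pulled toward the previous frame while the image model itself is left untouched, which is the property the abstract emphasizes.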
Taxonomy
Methods: Diffusion
