Markov Decision Process for Video Generation
Vladyslav Yushchenko, Nikita Araslanov, Stefan Roth

TL;DR
This paper reformulates video generation as a Markov Decision Process (MDP) to reduce temporal inconsistencies such as video freezing and looping and to overcome the fixed-length training limitation. It also proposes complementary metrics for quantifying temporal diversity and integrates the formulation into the existing MoCoGAN framework.
Contribution
The paper reformulates video generation as an MDP to enable long-term modeling and introduces new metrics for better temporal diversity assessment.
Findings
Improved video quality on Human Actions and UCF-101 datasets.
More memory-efficient model with better temporal consistency.
Effective integration with existing frameworks like MoCoGAN.
Abstract
We identify two pathological cases of temporal inconsistencies in video generation: video freezing and video looping. To better quantify the temporal diversity, we propose a class of complementary metrics that are effective, easy to implement, data agnostic, and interpretable. Further, we observe that current state-of-the-art models are trained on video samples of fixed length, thereby inhibiting long-term modeling. To address this, we reformulate the problem of video generation as a Markov Decision Process (MDP). The underlying idea is to represent motion as a stochastic process with an infinite forecast horizon to overcome the fixed length limitation and to mitigate the presence of temporal artifacts. We show that our formulation is easy to integrate into the state-of-the-art MoCoGAN framework. Our experiments on the Human Actions and UCF-101 datasets demonstrate that our MDP-based…
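To make the two failure modes named in the abstract concrete, here is a minimal Python sketch of how freezing and looping could be detected in a generated clip. It is an illustration only and not the metrics proposed in the paper, which this summary does not define. The function names (`mean_abs_diff`, `freeze_ratio`, `loop_period`), the threshold `eps`, and the representation of a frame as a flat list of pixel values are all assumptions made for this example.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized flat frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def freeze_ratio(frames, eps=1e-3):
    """Fraction of consecutive frame pairs that are (nearly) identical.

    Hypothetical freezing indicator: 1.0 means the clip never changes.
    """
    if len(frames) < 2:
        return 0.0
    frozen = sum(
        1
        for prev, cur in zip(frames, frames[1:])
        if mean_abs_diff(prev, cur) < eps
    )
    return frozen / (len(frames) - 1)


def loop_period(frames, eps=1e-3):
    """Smallest lag p (2 <= p <= len(frames) // 2) at which the whole clip
    repeats itself, or None if no such lag exists.

    Hypothetical looping indicator, not the paper's definition.
    """
    n = len(frames)
    for p in range(2, n // 2 + 1):
        if all(
            mean_abs_diff(frames[t], frames[t + p]) < eps
            for t in range(n - p)
        ):
            return p
    return None


# Toy clips with one pixel per frame (assumed data, for illustration only).
frozen_clip = [[0.5, 0.5]] * 6
looping_clip = [[0.0], [1.0], [2.0]] * 3
varied_clip = [[float(t * t)] for t in range(8)]

print(freeze_ratio(frozen_clip))   # every consecutive pair is identical
print(loop_period(looping_clip))   # the clip repeats every 3 frames
print(loop_period(varied_clip))    # no repetition found
```

Note that a fully frozen clip also repeats at every lag, so in this sketch the freezing check is best read first and the looping check only for clips that actually change over time.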
