Inference-based GAN Video Generation
Jingbo Yang, Adrian G. Bors

TL;DR
This paper introduces a novel memory-efficient VAE-GAN-based framework for generating long, coherent videos with hundreds or thousands of frames, overcoming previous limitations in temporal scaling and quality degradation.
Contribution
The paper extends existing VAE-GAN models with a Markov chain approach and a recall mechanism, producing long, continuous videos that preserve temporal dependencies.
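The chunk-by-chunk idea behind a Markov chain generator can be illustrated with a toy sketch. This is not the authors' implementation: the functions `encode_chunk`, `generate_chunk`, and `generate_long_video`, the parameter `recall_weight`, and the reading of "recall" as re-encoding the previously generated chunk are all assumptions made for illustration. Frames are plain lists of floats rather than images, and the encoder and generator are trivial stand-ins for trained networks.

```python
import random


def encode_chunk(frames):
    """Hypothetical stand-in for a variational encoder:
    summarise a chunk of frames as one latent vector (per-dimension mean)."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def generate_chunk(latent, chunk_len, rng):
    """Hypothetical stand-in for a generator:
    emit chunk_len frames as a small random walk starting from the latent."""
    frames = []
    current = list(latent)
    for _ in range(chunk_len):
        current = [x + rng.gauss(0.0, 0.1) for x in current]
        frames.append(current)
    return frames


def generate_long_video(num_chunks, chunk_len=16, dim=4, recall_weight=0.9, seed=0):
    """Generate num_chunks * chunk_len frames, one short chunk at a time.

    Markov property (assumed reading): the latent for the next chunk depends
    only on the chunk just generated, so only one latent is carried forward
    no matter how long the video grows.
    """
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    video = []
    for _ in range(num_chunks):
        chunk = generate_chunk(latent, chunk_len, rng)
        video.extend(chunk)
        # "Recall" (assumed reading): infer a latent from the chunk just generated,
        # then mix in fresh noise so the next chunk continues the motion.
        recalled = encode_chunk(chunk)
        noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        latent = [
            recall_weight * r + (1.0 - recall_weight) * n
            for r, n in zip(recalled, noise)
        ]
    return video


if __name__ == "__main__":
    video = generate_long_video(num_chunks=10, chunk_len=16)
    print(len(video), "frames of dimension", len(video[0]))
```

In this sketch the state carried between chunks has a fixed size, which is the property that would let sequence length grow without a matching growth in generator memory. How the paper actually defines its chain state and recall mechanism is not specified in this summary.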
Findings
Successfully generates long videos with hundreds of frames.
Maintains temporal continuity and realistic movement in generated videos.
Outperforms previous models in long sequence generation quality.
Abstract
Video generation has seen remarkable progress thanks to advancements in generative deep learning. However, generating long sequences remains a significant challenge. Generated videos should not only display coherent and continuous movement but also meaningful movement in successions of scenes. Models such as GANs, VAEs, and Diffusion Networks have been used for generating short video sequences, typically up to 16 frames. In this paper, we first propose a new type of video generator by enabling adversarial-based unconditional video generators with a variational encoder, akin to a VAE-GAN hybrid structure. The proposed model, as in other video deep learning-based processing frameworks, incorporates two processing branches, one for content and another for movement. However, existing models struggle with the temporal scaling of the generated videos. Classical approaches often result in…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Human Pose and Action Recognition
