StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN
Gereon Fox, Ayush Tewari, Mohamed Elgharib, Christian Theobalt

TL;DR
StyleVideoGAN introduces a video synthesis method that uses a pretrained StyleGAN for high-quality frame generation and a separate temporal model trained on latent codes, drastically reducing the training data and compute required.
Contribution
The paper presents a new approach combining pretrained StyleGAN with a temporal model trained on latent codes, enabling efficient and high-quality video synthesis with minimal training data.
Findings
Achieves high-quality video generation with only 10 minutes of training data.
Can generate videos of new subjects not seen during training.
Reduces training time to approximately 6 hours.
Abstract
Generative adversarial models (GANs) continue to produce advances in terms of the visual quality of still images, as well as the learning of temporal correlations. However, few works manage to combine these two interesting capabilities for the synthesis of video content: Most methods require an extensive training dataset to learn temporal correlations, while being rather limited in the resolution and visual quality of their output. We present a novel approach to the video synthesis problem that helps to greatly improve visual quality and drastically reduce the amount of training data and resources necessary for generating videos. Our formulation separates the spatial domain, in which individual frames are synthesized, from the temporal domain, in which motion is generated. For the spatial domain we use a pre-trained StyleGAN network, the latent space of which allows control over the…
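The separation described in the abstract can be illustrated with a toy sketch: a temporal model produces a sequence of latent codes, and a frozen image generator turns each code into a frame. Everything below is an assumption made for illustration, not the authors' architecture: `FrozenFrameGenerator` is a fixed random linear map standing in for a pretrained StyleGAN, `TemporalLatentModel` is a simple untrained recurrent update, and `LATENT_DIM`, `FRAME_PIXELS`, and `synthesize_video` are hypothetical names.

```python
import math
import random

LATENT_DIM = 8      # hypothetical toy size, not the real StyleGAN latent size
FRAME_PIXELS = 16   # hypothetical toy "image" flattened to 16 values


def random_matrix(rng, rows, cols, scale=1.0):
    """Build a rows x cols matrix of Gaussian samples."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(vector, matrix):
    """Multiply a row vector by a matrix."""
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(len(matrix[0]))
    ]


class FrozenFrameGenerator:
    """Stand-in for a pretrained image generator; its weights are never updated."""

    def __init__(self, rng):
        self.weights = random_matrix(rng, LATENT_DIM, FRAME_PIXELS)

    def __call__(self, latent):
        # Spatial domain: one latent code in, one frame out.
        return [math.tanh(value) for value in matvec(latent, self.weights)]


class TemporalLatentModel:
    """Toy recurrent model that emits one latent code per time step."""

    def __init__(self, rng):
        self.recurrent = random_matrix(rng, LATENT_DIM, LATENT_DIM, scale=0.1)

    def generate(self, start_latent, noise_steps):
        # Temporal domain: motion is a trajectory through latent space.
        latents = [list(start_latent)]
        for noise in noise_steps:
            previous = latents[-1]
            drive = matvec(previous, self.recurrent)
            latents.append(
                [p + 0.1 * math.tanh(d + n) for p, d, n in zip(previous, drive, noise)]
            )
        return latents


def synthesize_video(temporal_model, frame_generator, start_latent, noise_steps):
    """Generate latent codes first, then decode each one into a frame."""
    latents = temporal_model.generate(start_latent, noise_steps)
    return [frame_generator(latent) for latent in latents]


rng = random.Random(0)
frame_generator = FrozenFrameGenerator(rng)
temporal_model = TemporalLatentModel(rng)
start_latent = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
noise_steps = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(5)]

video = synthesize_video(temporal_model, frame_generator, start_latent, noise_steps)
print(len(video), len(video[0]))
```

In this sketch only the temporal model would need training, and it operates on short latent vectors rather than on pixels, which is one plausible reading of why the approach needs less data and compute.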
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Digital Media Forensic Detection
Methods: Dense Connections · Feedforward Network · R1 Regularization · Convolution · Adaptive Instance Normalization
