Your Image is Secretly the Last Frame of a Pseudo Video
Wenlong Chen, Wenlin Chen, Lapo Rastrelli, Yingzhen Li

TL;DR
This paper proposes improving image generative models by extending them to video models and training those on pseudo videos, which are built by applying data augmentation to each original image. The pseudo videos provide additional self-supervision, and the paper reports improved image quality on the CIFAR10 and CelebA datasets.
Contribution
It introduces a method to extend image models to video models using pseudo videos, and analyzes data augmentation strategies to improve generative performance.
Findings
Improved image quality with pseudo videos on CIFAR10.
Enhanced CelebA image generation results.
More expressive data augmentation benefits model training.
Abstract
Diffusion models, which can be viewed as a special case of hierarchical variational autoencoders (HVAEs), have shown profound success in generating photo-realistic images. In contrast, standard HVAEs often produce images of inferior quality compared to diffusion models. In this paper, we hypothesize that the success of diffusion models can be partly attributed to the additional self-supervision information for their intermediate latent states provided by corrupted images, which along with the original image form a pseudo video. Based on this hypothesis, we explore the possibility of improving other types of generative models with such pseudo videos. Specifically, we first extend a given image generative model to its video generative model counterpart, and then train the video generative model on pseudo videos constructed by applying data augmentation to the original images.…
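To make the idea of a pseudo video concrete, the sketch below builds one from a single image by stacking progressively less corrupted copies, so that the original image is the last frame. The choice of additive Gaussian noise, the linear noise schedule, and the names make_pseudo_video, num_frames, and max_noise_std are illustrative assumptions, not the augmentation strategy used in the paper.

```python
import numpy as np


def make_pseudo_video(image, num_frames=4, max_noise_std=0.5, seed=0):
    """Stack noisy copies of `image` so the clean image is the last frame.

    Hypothetical helper: Gaussian noise with a linear schedule is an
    assumption made for illustration, not the paper's augmentation.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    rng = np.random.default_rng(seed)
    image = np.asarray(image, dtype=np.float32)

    # Noise level shrinks from max_noise_std (first frame) to 0 (last frame).
    stds = np.linspace(max_noise_std, 0.0, num_frames)
    stds[-1] = 0.0  # guarantees a clean last frame even when num_frames == 1

    frames = []
    for std in stds:
        noise = rng.standard_normal(image.shape).astype(np.float32)
        frames.append(image + float(std) * noise)

    # Resulting shape: (num_frames, *image.shape)
    return np.stack(frames, axis=0)


if __name__ == "__main__":
    # Random stand-in for a 32x32 RGB image with values in [0, 1].
    img = np.random.default_rng(1).random((32, 32, 3)).astype(np.float32)
    video = make_pseudo_video(img, num_frames=4)
    print(video.shape)
    print(np.array_equal(video[-1], img))
```

Under these assumptions, a video generative model would be trained on the stacked frames, and a generated image would be read off as the last frame. Noisy frames are not clipped here, so their values may fall outside [0, 1].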
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Embodied and Extended Cognition · Face Recognition and Perception
Methods: Diffusion
