Jointly Trained Image and Video Generation using Residual Vectors
Yatin Dandi, Aniket Das, Soumye Singhal, Vinay P. Namboodiri, Piyush Rai

TL;DR
This paper introduces a joint training method for image and video generation models using residual vectors to encode temporal changes, improving sample quality and diversity.
Contribution
It presents a novel approach that jointly trains image and video generators with residual vectors, enabling better temporal modeling and information sharing across frames.
Findings
Enhanced sample quality and diversity in generated images and videos
Compatibility with pre-training on either videos or images, and with training on datasets that mix both
Effective exploitation of feature similarity across frames
Abstract
In this work, we propose a modeling technique for jointly training image and video generation models by simultaneously learning to map latent variables with a fixed prior onto real images and interpolate over images to generate videos. The proposed approach models the variations in representations using residual vectors encoding the change at each time step over a summary vector for the entire video. We utilize the technique to jointly train an image generation model with a fixed prior along with a video generation model lacking constraints such as disentanglement. The joint training enables the image generator to exploit temporal information while the video generation model learns to flexibly share information across frames. Moreover, experimental results verify our approach's compatibility with pre-training on videos or images and training on datasets containing a mixture of both. A…
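The abstract describes each frame's representation as a summary vector for the whole video plus a residual vector encoding the change at that time step. The sketch below illustrates only that composition, assuming the frame latent is additive (z_t = s + r_t), that s is drawn from the same fixed prior used for single images, and that every frame is decoded by the shared image generator. All names (`toy_image_generator`, `toy_residual`, `generate_video`, `generate_image`) are hypothetical placeholders rather than the authors' networks. In the actual method the residuals and the generator are learned, and the residual being zero at t = 0 is a choice made for this toy only.

```python
import math
import random
from typing import Callable, List

Vector = List[float]

def toy_image_generator(z: Vector) -> Vector:
    # Hypothetical stand-in for the shared image generator G(z):
    # a fixed elementwise nonlinearity instead of a trained network.
    return [math.tanh(x) for x in z]

def sample_prior(dim: int, rng: random.Random) -> Vector:
    # Fixed prior p(z) = N(0, I), used for single images and video summaries alike.
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def toy_residual(summary: Vector, t: int, scale: float = 0.1) -> Vector:
    # Hypothetical residual model: a deterministic drift that grows with t.
    # Placeholder for a learned module; it is zero at t = 0 by construction.
    return [scale * t * math.sin(x) for x in summary]

def generate_video(num_frames: int, dim: int, seed: int = 0,
                   generator: Callable[[Vector], Vector] = toy_image_generator) -> List[Vector]:
    rng = random.Random(seed)
    summary = sample_prior(dim, rng)  # one summary vector s for the whole video
    frames = []
    for t in range(num_frames):
        r_t = toy_residual(summary, t)                 # change at time step t
        z_t = [s + r for s, r in zip(summary, r_t)]    # z_t = s + r_t
        frames.append(generator(z_t))                  # same generator as for images
    return frames

def generate_image(dim: int, seed: int = 0) -> Vector:
    # A still image is the generator applied to a single prior sample.
    rng = random.Random(seed)
    return toy_image_generator(sample_prior(dim, rng))
```

With the same seed, `generate_video(4, 8, seed=0)[0]` equals `generate_image(8, seed=0)`, because the toy residual vanishes at the first step. This makes visible how, in this sketch, still images and video frames come from one generator over one latent space; later frames differ only through their residuals.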
