Conditional Image-to-Video Generation with Latent Flow Diffusion Models
Haomiao Ni, Changhao Shi, Kai Li, Sharon X. Huang, Martin Renqiang Min

TL;DR
This paper introduces latent flow diffusion models (LFDM) for conditional image-to-video generation, effectively synthesizing realistic spatial details and temporal dynamics by warping images in latent space based on generated optical flow sequences.
Contribution
The paper proposes latent flow diffusion models (LFDM), trained in two stages, for conditional image-to-video synthesis, and reports better efficiency and quality than prior methods.
Findings
LFDM outperforms previous methods on multiple datasets.
LFDM achieves better spatial detail and temporal coherence.
LFDM can be adapted to new domains via simple fine-tuning.
Abstract
Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video starting from an image (e.g., a person's face) and a condition (e.g., an action class label like smile). The key challenge of the cI2V task lies in the simultaneous generation of realistic spatial appearance and temporal dynamics corresponding to the given image and condition. In this paper, we propose an approach for cI2V using novel latent flow diffusion models (LFDM) that synthesize an optical flow sequence in the latent space based on the given condition to warp the given image. Compared to previous direct-synthesis-based works, our proposed LFDM can better synthesize spatial details and temporal motion by fully utilizing the spatial content of the given image and warping it in the latent space according to the generated temporally-coherent flow. The training of LFDM consists of two separate…
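The core operation described in the abstract, warping the latent of the given image with a sequence of flow fields, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names `warp_latent` and `warp_sequence`, the `(C, H, W)` latent layout, the pixel-unit flow convention, and the border clamping are all hypothetical choices made for the example. In the paper the flow sequence is generated by the diffusion model from the condition; here it is simply passed in as an array.

```python
import numpy as np


def warp_latent(latent, flow):
    """Backward-warp a latent map (C, H, W) with a flow field (2, H, W).

    Assumed convention: flow[0] holds horizontal (x) offsets and flow[1]
    vertical (y) offsets, in latent pixels. Output location (y, x) samples
    the input at (y + flow[1, y, x], x + flow[0, y, x]) bilinearly.
    """
    c, h, w = latent.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling coordinates, clamped to the valid range (border padding).
    sx = np.clip(xs + flow[0], 0, w - 1)
    sy = np.clip(ys + flow[1], 0, h - 1)
    # Integer corners surrounding each sampling point.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Fractional parts act as bilinear interpolation weights.
    wx = sx - x0
    wy = sy - y0
    top = latent[:, y0, x0] * (1 - wx) + latent[:, y0, x1] * wx
    bottom = latent[:, y1, x0] * (1 - wx) + latent[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def warp_sequence(latent, flows):
    """Warp one image latent with each flow in a (T, 2, H, W) sequence."""
    return np.stack([warp_latent(latent, f) for f in flows])
```

Given a latent of shape `(C, H, W)` and `T` flow fields, `warp_sequence` returns an array of shape `(T, C, H, W)`, one warped latent per output frame. Because every frame is produced by resampling the same image latent, the spatial content of the input image is reused rather than synthesized from scratch, which is the advantage over direct synthesis that the abstract describes.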
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
Methods: Diffusion
