Conditional Image-to-Video Generation with Latent Flow Diffusion Models

Haomiao Ni; Changhao Shi; Kai Li; Sharon X. Huang; Martin Renqiang Min

arXiv:2303.13744 · cs.CV · March 27, 2023 · 1 citation


TL;DR

This paper introduces latent flow diffusion models (LFDM) for conditional image-to-video generation, effectively synthesizing realistic spatial details and temporal dynamics by warping images in latent space based on generated optical flow sequences.

Contribution

The paper proposes a novel LFDM approach with a two-stage training process, improving efficiency and quality in conditional image-to-video synthesis compared to prior methods.

Findings

01 LFDM outperforms previous methods on multiple datasets.

02 LFDM achieves better spatial detail and temporal coherence.

03 LFDM can be adapted to new domains via simple fine-tuning.

Abstract

Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video starting from an image (e.g., a person's face) and a condition (e.g., an action class label like smile). The key challenge of the cI2V task lies in the simultaneous generation of realistic spatial appearance and temporal dynamics corresponding to the given image and condition. In this paper, we propose an approach for cI2V using novel latent flow diffusion models (LFDM) that synthesize an optical flow sequence in the latent space based on the given condition to warp the given image. Compared to previous direct-synthesis-based works, our proposed LFDM can better synthesize spatial details and temporal motion by fully utilizing the spatial content of the given image and warping it in the latent space according to the generated temporally-coherent flow. The training of LFDM consists of two separate…
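
The warping idea in the abstract can be illustrated with a small sketch: a sequence of flow fields is applied to one fixed latent feature map to produce one warped latent per frame. This is not the authors' implementation (the official repository is in PyTorch). It is a pure-Python illustration under several assumptions: a single-channel latent stored as a list of rows, flow given as per-position `(dx, dy)` offsets, backward warping with bilinear interpolation, and border clamping. The names `bilinear_sample`, `warp_latent`, and `warp_sequence` are hypothetical.

```python
import math


def bilinear_sample(feat, x, y):
    """Sample a 2D grid feat[row][col] at real-valued (x, y).

    Coordinates outside the grid are clamped to the border (an assumption
    made for this sketch).
    """
    h, w = len(feat), len(feat[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - ax) * feat[y0][x0] + ax * feat[y0][x1]
    bottom = (1 - ax) * feat[y1][x0] + ax * feat[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp_latent(latent, flow):
    """Backward-warp a latent map: out[y][x] = latent at (x + dx, y + dy)."""
    h, w = len(latent), len(latent[0])
    return [
        [
            bilinear_sample(latent, x + flow[y][x][0], y + flow[y][x][1])
            for x in range(w)
        ]
        for y in range(h)
    ]


def warp_sequence(latent, flows):
    """Apply each flow field to the same source latent, one frame per flow."""
    return [warp_latent(latent, flow) for flow in flows]


if __name__ == "__main__":
    latent = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    # Hypothetical flows: no motion, then a half-pixel horizontal shift.
    still = [[(0.0, 0.0)] * 3 for _ in range(3)]
    half = [[(0.5, 0.0)] * 3 for _ in range(3)]
    for frame in warp_sequence(latent, [still, half]):
        print(frame)
```

Every frame is warped from the same source latent rather than from the previous frame, which matches the abstract's description of warping the given image according to a generated flow sequence. In this sketch the flows are supplied by hand; in LFDM they would come from the diffusion model.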


Code & Models

Repositories

nihaomiao/cvpr23_lfdm (PyTorch, official)


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging

Methods: Diffusion