LaMD: Latent Motion Diffusion for Image-Conditional Video Generation

Yaosi Hu; Zhenzhong Chen; Chong Luo

arXiv:2304.11603 · cs.CV · April 21, 2025 · 5 cites

Open Access

TL;DR

LaMD introduces a latent motion diffusion framework that simplifies video generation by focusing on motion modeling, resulting in high-quality, controllable videos with faster sampling across diverse datasets.

Contribution

The paper presents a novel latent motion diffusion approach combining a motion autoencoder and diffusion generator to improve efficiency and expressiveness in video generation.

Findings

01 High-quality video generation on multiple benchmarks

02 Significant reduction in sampling time

03 Effective modeling of diverse and controllable motions

Abstract

The video generation field has witnessed rapid improvements with the introduction of recent diffusion models. While these models have successfully enhanced appearance quality, they still face challenges in generating coherent and natural movements while efficiently sampling videos. In this paper, we propose to condense video generation into a problem of motion generation, to improve the expressiveness of motion and make video generation more manageable. This can be achieved by breaking down the video generation process into latent motion generation and video reconstruction. Specifically, we present a latent motion diffusion (LaMD) framework, which consists of a motion-decomposed video autoencoder and a diffusion-based motion generator, to implement this idea. Through careful design, the motion-decomposed video autoencoder can compress patterns in movement into a concise latent motion…
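The two-stage split described in the abstract (latent motion generation, then video reconstruction) can be sketched as a toy pipeline. This is only an illustrative sketch under stated assumptions, not the authors' implementation: `toy_noise_predictor` and `toy_decoder` are hypothetical stand-ins for trained networks, the linear noise schedule and the standard DDPM ancestral sampling loop are common defaults that the abstract does not specify, and all names and sizes are made up for the example.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (a common default, assumed here)."""
    betas = [
        beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        for i in range(num_steps)
    ]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def toy_noise_predictor(z_t, t, image_feat):
    """Hypothetical stand-in for a trained, image-conditioned denoiser."""
    return [0.1 * z + 0.01 * image_feat for z in z_t]


def sample_latent_motion(image_feat, latent_dim, num_steps, rng):
    """Stage 1: draw a latent motion vector by DDPM ancestral sampling."""
    betas, alphas, alpha_bars = make_schedule(num_steps)
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(z, t, image_feat)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise at the last step
        z = [
            (z_i - coef * e_i) / math.sqrt(alphas[t]) + sigma * rng.gauss(0.0, 1.0)
            for z_i, e_i in zip(z, eps)
        ]
    return z


def toy_decoder(first_frame, motion_latent, num_frames):
    """Stage 2: hypothetical decoder combining the image with the motion latent.

    Frame 0 reproduces the input image; later frames drift by a
    motion-dependent brightness offset, purely for illustration.
    """
    shift = sum(motion_latent) / len(motion_latent)
    return [
        [[pixel + k * shift for pixel in row] for row in first_frame]
        for k in range(num_frames)
    ]


def generate_video(first_frame, latent_dim=8, num_frames=4, num_steps=50, seed=0):
    """Image-conditional generation: sample motion, then reconstruct frames."""
    rng = random.Random(seed)
    pixels = [p for row in first_frame for p in row]
    image_feat = sum(pixels) / len(pixels)  # crude scalar "appearance feature"
    motion = sample_latent_motion(image_feat, latent_dim, num_steps, rng)
    return toy_decoder(first_frame, motion, num_frames), motion
```

In this sketch the diffusion loop only ever touches the small latent vector, while the given image passes straight to the decoder; that separation is the part of the design the abstract emphasizes, and everything else here is placeholder.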

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Human Pose and Action Recognition

Methods: Diffusion