LaMD: Latent Motion Diffusion for Image-Conditional Video Generation
Yaosi Hu, Zhenzhong Chen, Chong Luo

TL;DR
LaMD introduces a latent motion diffusion framework that reframes image-conditional video generation as motion generation, producing high-quality, controllable videos with faster sampling across diverse datasets.
Contribution
The paper presents a latent motion diffusion approach that pairs a motion-decomposed video autoencoder with a diffusion-based motion generator to improve both the efficiency and the expressiveness of video generation.
Findings
High-quality video generation on multiple benchmarks
Significant reduction in sampling time
Effective modeling of diverse and controllable motions
Abstract
The video generation field has witnessed rapid improvements with the introduction of recent diffusion models. While these models have successfully enhanced appearance quality, they still face challenges in generating coherent and natural movements while efficiently sampling videos. In this paper, we propose to condense video generation into a problem of motion generation, to improve the expressiveness of motion and make video generation more manageable. This can be achieved by breaking down the video generation process into latent motion generation and video reconstruction. Specifically, we present a latent motion diffusion (LaMD) framework, which consists of a motion-decomposed video autoencoder and a diffusion-based motion generator, to implement this idea. Through careful design, the motion-decomposed video autoencoder can compress patterns in movement into a concise latent motion…
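The abstract splits generation into two stages: an autoencoder compresses a clip's motion into a concise latent, and a diffusion model generates that latent, which is then decoded back into frames together with the input image. The sketch below is a toy, pure-Python illustration of that data flow under loud assumptions. `encode_motion`, `decode_video` and `make_oracle_denoiser` are hypothetical stand-ins (a mean frame difference, a linear per-frame drift, and an analytically exact noise predictor for a one-clip training set), not the LaMD networks; only the sampling loop follows the standard DDPM ancestral update.

```python
import math
import random

# Hypothetical toy components, NOT the LaMD architecture:
# a "frame" is a list of floats, a "video" is a list of frames,
# and the motion latent is one per-frame displacement vector.

def encode_motion(video):
    """Toy motion encoder: average frame-to-frame difference."""
    steps = len(video) - 1
    return [sum(video[k + 1][i] - video[k][i] for k in range(steps)) / steps
            for i in range(len(video[0]))]

def decode_video(image, motion, num_frames):
    """Toy decoder: first image plus k times the motion latent for frame k."""
    return [[p + k * m for p, m in zip(image, motion)] for k in range(num_frames)]

def make_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule with cumulative products alpha_bar_t."""
    betas = [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars

def make_oracle_denoiser(target, alpha_bars):
    """Hypothetical stand-in for a trained network: the exact noise
    (x_t - sqrt(abar_t) * target) / sqrt(1 - abar_t), optimal when the
    training set holds the single motion latent `target`."""
    def predict_noise(x, t, cond):
        # A real generator would use `cond` (the input image); this toy ignores it.
        a = alpha_bars[t]
        return [(xi - math.sqrt(a) * mi) / math.sqrt(1.0 - a)
                for xi, mi in zip(x, target)]
    return predict_noise

def sample_motion(predict_noise, dim, cond, schedule, rng):
    """DDPM ancestral sampling of a motion latent given a condition."""
    betas, alphas, alpha_bars = schedule
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise
    for t in reversed(range(len(betas))):
        eps = predict_noise(x, t, cond)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # sigma_t^2 = beta_t, one of the standard DDPM variance choices
            x = [m + math.sqrt(betas[t]) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x

rng = random.Random(0)
image = [0.0, 1.0, 2.0]                                     # toy first frame
clip = decode_video(image, [0.5, -0.25, 1.0], num_frames=4)  # toy training clip
target = encode_motion(clip)                                # stage 1: compress motion
schedule = make_schedule()
denoiser = make_oracle_denoiser(target, schedule[2])
motion = sample_motion(denoiser, len(target), image, schedule, rng)  # generate latent
video = decode_video(image, motion, num_frames=4)           # reconstruct frames
```

Because the oracle denoiser is exact, the final reverse step recovers the encoded latent regardless of the noise drawn, so `video` reproduces `clip`; a learned denoiser would instead sample new motions. Note that the diffusion loop only ever touches a small motion vector rather than full frames, which illustrates one plausible source of the faster sampling the summary mentions, though LaMD's actual speedups depend on its real architecture.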
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Human Pose and Action Recognition
Methods: Diffusion
