Controllable Longer Image Animation with Diffusion Models

Qiang Wang; Minghua Liu; Junjun Hu; Fan Jiang; Mu Xu

arXiv:2405.17306·cs.CV·May 29, 2024

Open Access

TL;DR

This paper presents a diffusion-based method for controllable, long-duration image animation that allows precise control over motion and maintains content consistency in videos of more than 100 frames, beyond the roughly 30-frame limit of current pretrained video generation models.

Contribution

It introduces a new long-duration video generation approach with noise rescheduling, enabling controllable and coherent animations from static images.

Findings

01 Outperforms 10 baseline methods in experiments

02 Enables creation of videos over 100 frames with content consistency

03 Provides precise control over motion direction and speed

Abstract

Generating realistic animated videos from static images is an important area of research in computer vision. Methods based on physical simulation and motion prediction have achieved notable advances, but they are often limited to specific object textures and motion trajectories and fail to handle highly complex environments and physical dynamics. In this paper, we introduce an open-domain controllable image animation method using motion priors with video diffusion models. Our method achieves precise control over the direction and speed of motion in the movable region by extracting motion field information from videos and learning moving trajectories and strengths. Current pretrained video generation models are typically limited to very short videos, usually fewer than 30 frames. In contrast, we propose an efficient long-duration video generation method based on noise…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion