UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation
Xiang Wang, Shiwei Zhang, Changxin Gao, Jiayu Wang, Xiaoqiang Zhou, Yingya Zhang, Luxin Yan, Nong Sang

TL;DR
UniAnimate is a unified video diffusion framework for efficient, long-term human image animation. It reduces model complexity, improves temporal coherence, and can generate highly consistent minute-long videos.
Contribution
The paper proposes a unified video diffusion model and a new temporal modeling architecture that together improve the efficiency, temporal coherence, and achievable length of human video animation.
Findings
Achieves superior results compared to state-of-the-art methods.
Can generate highly consistent one-minute videos.
Reduces model complexity and optimization burden.
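To put the gap between the typical 24-frame clips cited in the abstract and the one-minute videos listed above in perspective, the arithmetic below counts frames and fixed-length generation windows. The frame rate (30 fps), the optional `overlap` between consecutive windows, and the helper names are illustrative assumptions, not values or code reported on this page.

```python
import math


def frames_needed(seconds: float, fps: float) -> int:
    """Total number of frames in a clip of the given duration."""
    return math.ceil(seconds * fps)


def windows_needed(total_frames: int, window: int = 24, overlap: int = 0) -> int:
    """Number of fixed-length windows needed to cover total_frames.

    `overlap` is a hypothetical parameter: how many frames consecutive
    windows share (e.g. to carry content from one window to the next).
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    if total_frames <= window:
        return 1
    stride = window - overlap  # new frames contributed by each extra window
    return 1 + math.ceil((total_frames - window) / stride)


# Assumed 30 fps for a one-minute video.
total = frames_needed(60, 30)
print(total, windows_needed(total), windows_needed(total, overlap=1))
# prints: 1800 75 79
```

Under these assumptions a one-minute video is 75 times longer than a single 24-frame clip, so any per-window drift in identity or appearance compounds many times over, which is why temporal consistency across windows matters as much as quality within one.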
Abstract
Recent diffusion-based human image animation techniques have demonstrated impressive success in synthesizing videos that faithfully follow a given reference identity and a sequence of desired movement poses. Despite this, there are still two limitations: i) an extra reference model is required to align the identity image with the main video branch, which significantly increases the optimization burden and model parameters; ii) the generated video is usually short in time (e.g., 24 frames), hampering practical applications. To address these shortcomings, we present the UniAnimate framework to enable efficient and long-term human video generation. First, to reduce the optimization difficulty and ensure temporal coherence, we map the reference image along with the posture guidance and noise video into a common feature space by incorporating a unified video diffusion model. Second, we propose…
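The first idea in the abstract, mapping the reference image, the pose guidance, and the noise video into one feature space so that a single video diffusion model handles all of them, can be sketched at the tensor-shape level. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the channel-wise concatenation of pose features, the placement of the reference as an extra frame on the time axis, and the parameter-free attention are all hypothetical choices, and random arrays stand in for VAE latents and pose-encoder outputs.

```python
import numpy as np


def build_unified_input(ref_latent, ref_pose, noise_video, target_poses):
    """Hypothetical sketch: treat the reference image as frame 0 of the video.

    All arrays are (channels, frames, height, width). Pose features are
    concatenated on the channel axis; the reference frame is concatenated
    on the temporal axis, so one denoiser sees everything jointly.
    """
    ref_frame = np.concatenate([ref_latent, ref_pose], axis=0)         # (2C, 1, H, W)
    video_frames = np.concatenate([noise_video, target_poses], axis=0)  # (2C, F, H, W)
    return np.concatenate([ref_frame, video_frames], axis=1)           # (2C, F+1, H, W)


def temporal_self_attention(x):
    """Parameter-free scaled dot-product attention over the frame axis.

    Each spatial location attends across all F+1 frames, including the
    reference frame at index 0. Returns the output and attention weights.
    """
    c, f, h, w = x.shape
    tokens = x.transpose(2, 3, 1, 0).reshape(h * w, f, c)      # (HW, F+1, 2C)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(c)   # (HW, F+1, F+1)
    scores -= scores.max(axis=-1, keepdims=True)               # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)             # softmax over frames
    out = weights @ tokens                                     # (HW, F+1, 2C)
    return out.reshape(h, w, f, c).transpose(3, 2, 0, 1), weights


rng = np.random.default_rng(0)
C, F, H, W = 4, 8, 16, 16
ref_latent = rng.standard_normal((C, 1, H, W))    # stand-in: reference image latent
ref_pose = rng.standard_normal((C, 1, H, W))      # stand-in: encoded reference pose
noise_video = rng.standard_normal((C, F, H, W))   # Gaussian noise for frames to generate
target_poses = rng.standard_normal((C, F, H, W))  # stand-in: encoded driving poses

x = build_unified_input(ref_latent, ref_pose, noise_video, target_poses)
y, attn = temporal_self_attention(x)
print(x.shape, y.shape, attn.shape)
# prints: (8, 9, 16, 16) (8, 9, 16, 16) (256, 9, 9)
```

The point of the sketch is the design choice the abstract describes: because the reference lives in the same tensor as the frames being denoised, identity information can reach every frame through the model's ordinary temporal layers, with no separate reference network to align.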
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Computer Graphics and Visualization Techniques
Methods: Attention Is All You Need · Softmax · ALIGN · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Diffusion · Adam · Residual Connection
