UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation

Xiang Wang; Shiwei Zhang; Changxin Gao; Jiayu Wang; Xiaoqiang Zhou; Yingya Zhang; Luxin Yan; Nong Sang

arXiv:2406.01188·cs.CV·June 4, 2024·1 cites

TL;DR

UniAnimate introduces a unified video diffusion framework for efficient, long-term human image animation. It reduces model complexity, improves temporal coherence, and can generate highly consistent one-minute videos.

Contribution

The paper proposes a unified diffusion model and a novel temporal architecture to improve efficiency, coherence, and length of human video animation.

Findings

1. Achieves superior results compared to state-of-the-art methods.
2. Can generate highly consistent one-minute videos.
3. Reduces model complexity and optimization burden.

Abstract

Recent diffusion-based human image animation techniques have demonstrated impressive success in synthesizing videos that faithfully follow a given reference identity and a sequence of desired movement poses. Despite this, there are still two limitations: i) an extra reference model is required to align the identity image with the main video branch, which significantly increases the optimization burden and model parameters; ii) the generated video is usually short in time (e.g., 24 frames), hampering practical applications. To address these shortcomings, we present a UniAnimate framework to enable efficient and long-term human video generation. First, to reduce the optimization difficulty and ensure temporal coherence, we map the reference image along with the posture guidance and noise video into a common feature space by incorporating a unified video diffusion model. Second, we propose…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Computer Graphics and Visualization Techniques

Methods: Attention Is All You Need · Softmax · ALIGN · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Diffusion · Adam · Residual Connection