VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation
Qilin Wang, Zhengkai Jiang, Chengming Xu, Jiangning Zhang, Yabiao Wang, Xinyi Zhang, Yun Cao, Weijian Cao, Chengjie Wang, Yanwei Fu

TL;DR
VividPose is an end-to-end video diffusion framework that produces realistic, identity-preserving human animations with high temporal stability, handling diverse poses and shapes effectively.
Contribution
It introduces an innovative end-to-end pipeline with identity-aware appearance and geometry-aware pose controllers for improved human image animation.
Findings
Achieves state-of-the-art performance on UBCFashion and TikTok benchmarks.
Maintains high identity fidelity across diverse poses and shapes.
Demonstrates superior generalization on in-the-wild datasets.
Abstract
Human image animation involves generating a video from a static image by following a specified pose sequence. Current approaches typically adopt a multi-stage pipeline that separately learns appearance and motion, which often leads to appearance degradation and temporal inconsistencies. To address these issues, we propose VividPose, an innovative end-to-end pipeline based on Stable Video Diffusion (SVD) that ensures superior temporal stability. To enhance the retention of human identity, we propose an identity-aware appearance controller that integrates additional facial information without compromising other appearance details such as clothing texture and background. This approach ensures that the generated videos maintain high fidelity to the identity of the human subject, preserving key facial features across various poses. To accommodate diverse human body shapes and hand movements, we…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Human Motion and Animation
Methods: Diffusion
