StableAnimator++: Overcoming Pose Misalignment and Face Distortion for Human Image Animation

Shuyuan Tu; Zhen Xing; Xintong Han; Zhi-Qi Cheng; Qi Dai; Chong Luo; Zuxuan Wu; Yu-Gang Jiang

arXiv:2507.15064·cs.CV·July 22, 2025

TL;DR

StableAnimator++ is a video diffusion framework for human image animation that preserves identity and improves pose alignment by combining learnable similarity transformations, face encoding, and Hamilton-Jacobi-Bellman (HJB) based face optimization.

Contribution

The paper introduces an ID-preserving video diffusion approach that combines learnable pose alignment with face optimization to address pose misalignment and face distortion.

Findings

01 Enhanced identity preservation in generated videos.

02 Effective pose alignment via learnable similarity transformations.

03 Improved facial fidelity through HJB-based face optimization.

Abstract

Current diffusion models for human image animation often struggle to maintain identity (ID) consistency, especially when the reference image and driving video differ significantly in body size or position. We introduce StableAnimator++, the first ID-preserving video diffusion framework with learnable pose alignment, capable of generating high-quality videos conditioned on a reference image and a pose sequence without any post-processing. Building upon a video diffusion model, StableAnimator++ contains carefully designed modules for both training and inference that target identity consistency. In particular, StableAnimator++ first uses learnable layers to predict the similarity transformation matrices between the reference image and the driven poses by injecting guidance from Singular Value Decomposition (SVD). These matrices align the driven poses with the reference image, mitigating…
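To make the notion of a similarity transformation between pose keypoints concrete, the sketch below estimates scale, rotation, and translation between two 2D keypoint sets using the classical closed-form SVD solution (Umeyama's method). It is not the paper's learnable module, which predicts the matrices with trained layers and only uses SVD as guidance. The function names `estimate_similarity` and `apply_similarity` and the toy keypoints are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_similarity(src, dst):
    """Closed-form least-squares similarity transform (Umeyama's method).

    Finds scale s, rotation R (2x2), and translation t such that
    dst is approximately s * R @ src + t for matched 2D keypoints.
    Illustrative only; not the learned predictor described in the abstract.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)

    # Center both point sets on their means.
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d

    # Cross-covariance between target and source, then its SVD.
    cov = xd.T @ xs / n
    U, S, Vt = np.linalg.svd(cov)

    # Flip the last singular direction if needed so R is a proper rotation.
    d = np.ones(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        d[-1] = -1.0
    R = U @ np.diag(d) @ Vt

    # Scale is the weighted singular-value sum over the source variance.
    var_s = (xs ** 2).sum() / n
    scale = (S * d).sum() / var_s
    t = mu_d - scale * R @ mu_s
    return scale, R, t


def apply_similarity(points, scale, R, t):
    """Map (N, 2) keypoints through the similarity transform."""
    return scale * np.asarray(points, dtype=float) @ R.T + t


# Toy example: "driven" keypoints are a scaled, rotated, shifted copy
# of the "reference" keypoints (hypothetical values).
reference = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle)],
                   [np.sin(angle), np.cos(angle)]])
driven = 1.5 * reference @ R_true.T + np.array([2.0, -1.0])

# Estimate the transform that maps driven poses back onto the reference.
s, R, t = estimate_similarity(driven, reference)
aligned = apply_similarity(driven, s, R, t)
```

In this toy setting `aligned` coincides with `reference` up to floating-point error. With real pose detections the fit is only a least-squares approximation, which is one reason a learned predictor may be preferable when body proportions differ.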
