AlignHuman: Improving Motion and Fidelity via Timestep-Segment Preference Optimization for Audio-Driven Human Animation

Chao Liang; Jianwen Jiang; Wang Liao; Jiaqi Yang; Zerong Zheng; Weihong Zeng; Han Liang

arXiv:2506.11144 · cs.CV · June 16, 2025

TL;DR

AlignHuman is a post-training framework for audio-driven human animation that splits the denoising timesteps into segments and applies preference optimization to each. It improves motion naturalness and visual fidelity while cutting inference from 100 to 30 NFEs.

Contribution

The paper proposes timestep-segment preference optimization (TPO) with two specialized LoRAs, enabling joint improvement of motion and fidelity in diffusion-based human animation.

Findings

01 Achieves a 3.3× inference speedup with minimal quality loss

02 Improves on baseline performance in human animation tasks

03 Reduces inference cost from 100 to 30 NFEs (number of function evaluations)
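
The 3.3× speedup and the drop from 100 to 30 NFEs appear to be the same result stated two ways. The short check below shows the arithmetic, assuming inference cost scales linearly with the number of function evaluations. The function name `nfe_speedup` is illustrative and does not come from the paper.

```python
def nfe_speedup(baseline_nfes: int, reduced_nfes: int) -> float:
    """Ratio of baseline to reduced NFEs.

    Assumption: each function evaluation costs the same, so the
    ratio of NFE counts approximates the wall-clock speedup.
    """
    if baseline_nfes <= 0 or reduced_nfes <= 0:
        raise ValueError("NFE counts must be positive")
    return baseline_nfes / reduced_nfes


# Figures taken from the findings: 100 NFEs reduced to 30 NFEs.
speedup = nfe_speedup(100, 30)
print(f"{speedup:.1f}x")  # prints "3.3x"
```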

Abstract

Recent advancements in human video generation and animation tasks, driven by diffusion models, have achieved significant progress. However, expressive and realistic human animation remains challenging due to the trade-off between motion naturalness and visual fidelity. To address this, we propose AlignHuman, a framework that combines Preference Optimization as a post-training technique with a divide-and-conquer training strategy to jointly optimize these competing objectives. Our key insight stems from an analysis of the denoising process across timesteps: (1) early denoising timesteps primarily control motion dynamics, while (2) fidelity and human structure can be effectively managed by later timesteps, even if early steps are skipped. Building on this observation, we propose timestep-segment preference optimization (TPO) and introduce two specialized LoRAs as expert alignment…
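
The timestep-segment idea can be pictured as a router that selects one of two expert adapters according to where a denoising step falls in the schedule. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the names `SegmentRouter`, `motion_lora`, and `fidelity_lora`, the normalized noise level in [0, 1], and the 0.5 boundary are all hypothetical, and the abstract as quoted here does not say how the segments are delimited or how the LoRAs are applied at inference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentRouter:
    """Maps a normalized noise level to an expert adapter name.

    Assumption: noise_level runs from 1.0 (pure noise, first denoising
    step) down to 0.0 (clean sample, last step). The boundary value is
    a placeholder, not a number reported by the paper.
    """
    boundary: float = 0.5

    def expert_for(self, noise_level: float) -> str:
        if not 0.0 <= noise_level <= 1.0:
            raise ValueError("noise_level must lie in [0, 1]")
        # Early (high-noise) steps mainly shape motion dynamics;
        # later (low-noise) steps mainly shape fidelity and structure.
        if noise_level > self.boundary:
            return "motion_lora"
        return "fidelity_lora"


def expert_schedule(num_steps: int, router: SegmentRouter) -> list[str]:
    """Lists the expert chosen at each step of an evenly spaced schedule."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    levels = [1.0 - i / (num_steps - 1) for i in range(num_steps)]
    return [router.expert_for(level) for level in levels]


if __name__ == "__main__":
    for step, name in enumerate(expert_schedule(5, SegmentRouter())):
        print(step, name)
```

In a real pipeline the returned name would select which adapter weights are active for that denoising step. Keeping the routing separate from the model makes the segment boundary easy to vary when studying the motion versus fidelity trade-off.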

Taxonomy

Topics: Human Motion and Animation · Music Technology and Sound Studies · Video Analysis and Summarization