FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation

MengChao Wang; Qiang Wang; Fan Jiang; Mu Xu

arXiv:2508.11255·cs.CV·August 18, 2025

TL;DR

FantasyTalking2 aligns audio-driven portrait animation with multidimensional human preferences, improving motion naturalness, lip-sync accuracy, and visual quality through a dedicated reward model (Talking-Critic) and a timestep-layer adaptive preference optimization framework (TLPO).

Contribution

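The paper presents three components: Talking-Critic, a multimodal reward model that scores how well generated videos satisfy human preferences; Talking-NSQ, a large-scale multidimensional preference dataset of 410K preference pairs curated with that model; and TLPO, a timestep-layer adaptive preference optimization framework for fine-grained, multidimensional alignment of portrait animation.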

Findings

01

Talking-Critic outperforms existing reward models in preference alignment.

02

TLPO significantly improves lip-sync accuracy, motion naturalness, and visual quality.

03

Experiments show that the method outperforms baseline models in both qualitative and quantitative evaluations.

Abstract

Recent advances in audio-driven portrait animation have demonstrated impressive capabilities. However, existing methods struggle to align with fine-grained human preferences across multiple dimensions, such as motion naturalness, lip-sync accuracy, and visual quality. This is due to the difficulty of optimizing among competing preference objectives, which often conflict with one another, and the scarcity of large-scale, high-quality datasets with multidimensional preference annotations. To address these challenges, we first introduce Talking-Critic, a multimodal reward model that learns human-aligned reward functions to quantify how well generated videos satisfy multidimensional expectations. Leveraging this model, we curate Talking-NSQ, a large-scale multidimensional human preference dataset containing 410K preference pairs. Finally, we propose Timestep-Layer adaptive multi-expert Preference…
