AUHead: Realistic Emotional Talking Head Generation via Action Units Control

Jiayi Lyu; Leigang Qu; Wenjing Zhang; Hanyu Jiang; Kai Liu; Zhenglin Zhou; Xiaobo Xia; Jian Xue; Tat-Seng Chua

arXiv:2602.09534·cs.CV·May 12, 2026

TL;DR

AUHead is a two-stage method that enables fine-grained emotional control in talking-head video generation by disentangling and manipulating Action Units, resulting in more realistic and expressive virtual avatars.

Contribution

The paper introduces a novel two-stage approach combining large audio-language models and a controllable diffusion model for emotion-aware talking-head synthesis.

Findings

01. Achieves superior emotional realism and lip synchronization on benchmark datasets.

02. Effectively disentangles and controls Action Units for nuanced emotional expression.

03. Outperforms existing methods in visual coherence and identity preservation.

Abstract

Realistic talking-head video generation is critical for virtual avatars, film production, and interactive systems. Current methods struggle with nuanced emotional expressions due to the lack of fine-grained emotion control. To address this issue, we introduce a novel two-stage method (AUHead) to disentangle fine-grained emotion control, i.e., Action Units (AUs), from audio and achieve controllable generation. In the first stage, we explore the AU generation abilities of large audio-language models (ALMs) through spatial-temporal AU tokenization and an "emotion-then-AU" chain-of-thought mechanism. This stage aims to disentangle AUs from raw speech, effectively capturing subtle emotional cues. In the second stage, we propose an AU-driven controllable diffusion model that synthesizes realistic talking-head videos conditioned on AU sequences. Specifically, we first map the AU sequences into the…
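The abstract does not spell out how the spatial-temporal AU tokenization works, so the following is only a minimal sketch of one way per-frame AU intensities could be turned into discrete tokens that a language model can emit. Everything in it is an assumption made for illustration: the AU subset in `AU_IDS`, the bin count `NUM_BINS`, the token strings, and the function names `tokenize_frame` and `tokenize_sequence`. It is not the authors' implementation.

```python
from typing import List, Sequence

# Hypothetical subset of FACS Action Units; the AU set used in the paper is not stated here.
AU_IDS = [1, 2, 4, 6, 12, 15]
# Assumed number of intensity levels per AU (level 0 means inactive).
NUM_BINS = 4


def tokenize_frame(intensities: Sequence[float]) -> List[str]:
    """Quantize per-AU intensities in [0, 1] into discrete tokens for one frame."""
    tokens = []
    for au, value in zip(AU_IDS, intensities):
        clipped = min(max(value, 0.0), 1.0)
        level = min(int(clipped * NUM_BINS), NUM_BINS - 1)
        if level > 0:  # drop inactive AUs to keep the sequence short
            tokens.append(f"<AU{au}_{level}>")
    return tokens


def tokenize_sequence(frames: Sequence[Sequence[float]]) -> List[str]:
    """Prefix each frame's AU tokens with a time marker (the temporal axis)."""
    out: List[str] = []
    for t, frame in enumerate(frames):
        out.append(f"<t{t}>")
        out.extend(tokenize_frame(frame))
    return out


if __name__ == "__main__":
    # Two frames: a mild AU12 activation, then a mild AU1 activation.
    print(tokenize_sequence([[0, 0, 0, 0, 0.5, 0], [0.6, 0, 0, 0, 0, 0]]))
```

In this sketch the "spatial" axis is which AU is active within a frame and the "temporal" axis is the frame marker; a real system would likely choose bins and vocabulary to match its AU detector and model tokenizer.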

Code & Models

Repositories

laura990501/AUHead_ICLR (GitHub)

Videos

AUHead: Realistic Emotional Talking Head Generation via Action Units Control (SlidesLive)