X-Actor: Emotional and Expressive Long-Range Portrait Acting from Audio

Chenxu Zhang; Zenan Li; Hongyi Xu; You Xie; Xiaochen Zhao; Tianpei Gu; Guoxian Song; Xin Chen; Chao Liang; Jianwen Jiang; Linjie Luo

arXiv:2508.02944 · cs.CV · August 6, 2025

TL;DR

X-Actor is a novel framework that generates emotionally expressive, long-form talking head videos from a single image and audio, capturing nuanced emotions with high fidelity and coherence over extended durations.

Contribution

The paper introduces a two-stage diffusion-based pipeline that models long-range facial motion from audio, enabling actor-quality, emotionally rich portrait animations from a single reference image.

Findings

1. Achieves state-of-the-art results in long-range, audio-driven portrait acting.

2. Produces cinematic-style performances with nuanced emotional expression.

3. Operates effectively without error accumulation over extended sequences.

Abstract

We present X-Actor, a novel audio-driven portrait animation framework that generates lifelike, emotionally expressive talking head videos from a single reference image and an input audio clip. Unlike prior methods that emphasize lip synchronization and short-range visual fidelity in constrained speaking scenarios, X-Actor enables actor-quality, long-form portrait performance capturing nuanced, dynamically evolving emotions that flow coherently with the rhythm and content of speech. Central to our approach is a two-stage decoupled generation pipeline: an audio-conditioned autoregressive diffusion model that predicts expressive yet identity-agnostic facial motion latent tokens within a long temporal context window, followed by a diffusion-based video synthesis module that translates these motions into high-fidelity video animations. By operating in a compact facial motion latent space…
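The two-stage decoupling described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every function name, parameter, and the scalar "latent" representation are hypothetical, and the denoiser is a trivial stand-in for a learned diffusion model. It only shows the control flow: stage 1 generates motion latents chunk by chunk, conditioning each chunk on audio and on a window of previously generated motion, and stage 2 combines those identity-agnostic latents with the reference image.

```python
import random


def denoise_chunk(audio_chunk, context, steps=4, seed=0):
    """Hypothetical stand-in for an audio-conditioned diffusion denoiser.

    Starts from Gaussian noise (one scalar latent per audio frame) and
    iteratively pulls it toward a target built from audio and context.
    """
    rng = random.Random(seed)
    latents = [rng.gauss(0.0, 1.0) for _ in audio_chunk]
    ctx_mean = sum(context) / len(context) if context else 0.0
    for _ in range(steps):
        # Toy "denoising" step: move halfway toward the conditioned target.
        latents = [
            0.5 * z + 0.5 * (0.8 * a + 0.2 * ctx_mean)
            for z, a in zip(latents, audio_chunk)
        ]
    return latents


def generate_motion(audio_features, chunk_size=4, context_size=8):
    """Stage 1 (assumed shape): autoregressive, chunk-wise motion generation."""
    motion = []
    for start in range(0, len(audio_features), chunk_size):
        chunk = audio_features[start:start + chunk_size]
        context = motion[-context_size:]  # window of past motion latents
        motion.extend(denoise_chunk(chunk, context, seed=start))
    return motion


def render_video(reference_image, motion):
    """Stage 2 placeholder: pair each motion latent with the identity image.

    A real system would run a video synthesis model here.
    """
    return [(reference_image, m) for m in motion]


audio = [0.1 * i for i in range(10)]   # made-up per-frame audio features
motion = generate_motion(audio)
frames = render_video("reference.png", motion)
print(len(motion), len(frames))
```

The point of the split is that the motion stage never sees the reference image, so in this sketch identity enters only in `render_video`.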
