Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation
Jingyi Xu, Hieu Le, Zhixin Shu, Yang Wang, Yi-Hsuan Tsai, and Dimitris Samaras

TL;DR
This paper introduces a framework for generating emotionally expressive talking-head videos that models how emotion intensity fluctuates during speech, rather than producing a static emotional output for the whole utterance.
Contribution
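The paper proposes a continuous emotion latent space, in which latent orientation encodes emotion type and latent norm encodes intensity, together with an audio-to-intensity predictor trained with pseudo-labels. Together these enable precise control over intensity levels and modeling of intensity dynamics.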
Findings
Effective in capturing emotion intensity fluctuations
Enhances realism and expressiveness of generated talking-heads
Validated through extensive experiments
Abstract
Human emotional expression is inherently dynamic, complex, and fluid, characterized by smooth transitions in intensity throughout verbal communication. However, the modeling of such intensity fluctuations has been largely overlooked by previous audio-driven talking-head generation methods, which often results in static emotional outputs. In this paper, we explore how emotion intensity fluctuates during speech, proposing a method for capturing and generating these subtle shifts for talking-head generation. Specifically, we develop a talking-head framework that is capable of generating a variety of emotions with precise control over intensity levels. This is achieved by learning a continuous emotion latent space, where emotion types are encoded within latent orientations and emotion intensity is reflected in latent norms. In addition, to capture the dynamic intensity fluctuations, we…
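The orientation/norm decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 3-D direction vector, the function names `compose_latent` and `decompose_latent`, and the per-frame intensity values are all hypothetical, chosen only to show how a latent's direction could carry the emotion type while its norm carries the intensity.

```python
import math

def compose_latent(direction, intensity):
    """Scale a unit-normalized emotion direction by a non-negative intensity."""
    norm = math.sqrt(sum(x * x for x in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [intensity * x / norm for x in direction]

def decompose_latent(latent):
    """Split a latent into (unit direction, norm); the norm plays the role of intensity."""
    norm = math.sqrt(sum(x * x for x in latent))
    if norm == 0.0:
        # A zero latent has no defined direction; treat it as neutral.
        return [0.0] * len(latent), 0.0
    return [x / norm for x in latent], norm

# Hypothetical direction standing in for one emotion type (illustrative only).
happy_direction = [3.0, 0.0, 4.0]

# Hypothetical per-frame intensities, standing in for the output of an
# audio-to-intensity predictor.
frame_intensities = [0.2, 0.5, 0.9, 0.6]

# One latent per frame: same orientation (emotion type), varying norm (intensity).
frame_latents = [compose_latent(happy_direction, s) for s in frame_intensities]
recovered = [decompose_latent(z) for z in frame_latents]
```

Under these assumptions, every frame shares the same orientation, and only the norm changes from frame to frame, which is one way to read "frame-wise emotion intensity".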
Taxonomy
Topics: Music and Audio Processing · Speech and dialogue systems · Music Technology and Sound Studies
