EditEmoTalk: Controllable Speech-Driven 3D Facial Animation with Continuous Expression Editing

Diqiong Jiang; Kai Zhu; Dan Song; Jian Chang; Chenglizhao Chen; Zhenyu Wu

arXiv:2601.10000 · cs.MM · January 16, 2026

TL;DR

EditEmoTalk is a speech-driven 3D facial animation framework that supports continuous, fine-grained emotional control, improving the expressiveness and realism of the synthesized facial motion.

Contribution

EditEmoTalk introduces a boundary-aware semantic embedding and an emotional consistency loss, which together enable smooth, controllable, and faithful emotional expression in speech-driven facial animation.

Findings

01 Achieves superior controllability and expressiveness

02 Maintains accurate lip synchronization

03 Demonstrates strong generalization in experiments

Abstract

Speech-driven 3D facial animation aims to generate realistic and expressive facial motions directly from audio. While recent methods achieve high-quality lip synchronization, they often rely on discrete emotion categories, limiting continuous and fine-grained emotional control. We present EditEmoTalk, a controllable speech-driven 3D facial animation framework with continuous emotion editing. The key idea is a boundary-aware semantic embedding that learns the normal directions of inter-emotion decision boundaries, enabling a continuous expression manifold for smooth emotion manipulation. Moreover, we introduce an emotional consistency loss that enforces semantic alignment between the generated motion dynamics and the target emotion embedding through a mapping network, ensuring faithful emotional expression. Extensive experiments demonstrate that EditEmoTalk achieves superior…
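
The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the formulation, so the sketch assumes a linear boundary whose normal is the difference of class means, and it stands in a plain cosine distance for the emotional consistency loss (the mapping network is omitted). The function names `boundary_normal`, `edit_emotion`, and `consistency_loss` are hypothetical.

```python
import numpy as np


def boundary_normal(z_a: np.ndarray, z_b: np.ndarray) -> np.ndarray:
    """Unit normal of an assumed linear boundary between two emotion clusters.

    z_a, z_b: arrays of shape (n_samples, dim) holding embeddings of two
    emotion classes. Assumption: the normal is the difference of class means.
    """
    direction = z_b.mean(axis=0) - z_a.mean(axis=0)
    return direction / np.linalg.norm(direction)


def edit_emotion(z: np.ndarray, normal: np.ndarray, alpha: float) -> np.ndarray:
    """Move an embedding along the boundary normal by a continuous amount alpha."""
    return z + alpha * normal


def consistency_loss(motion_emb: np.ndarray, target_emb: np.ndarray) -> float:
    """Mean cosine distance between motion-derived and target emotion embeddings.

    Assumption: motion_emb is what a mapping network would produce from the
    generated motion; here it is simply passed in.
    """
    dot = np.sum(motion_emb * target_emb, axis=-1)
    norms = np.linalg.norm(motion_emb, axis=-1) * np.linalg.norm(target_emb, axis=-1)
    return float(np.mean(1.0 - dot / norms))
```

Under these assumptions, `alpha` acts as a continuous intensity knob: because the normal has unit length, `edit_emotion(z, n, alpha)` shifts the projection of `z` onto `n` by exactly `alpha`, and the loss is zero when the motion-derived embedding points in the same direction as the target.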

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis