SEDTalker: Emotion-Aware 3D Facial Animation Using Frame-Level Speech Emotion Diarization

Farzaneh Jafari; Stefano Berretti; Anup Basu

arXiv:2604.13335·cs.CV·April 16, 2026

TL;DR

SEDTalker is a framework that uses frame-level speech emotion diarization to give fine-grained, continuous emotional control over speech-driven 3D facial animation.

Contribution

SEDTalker predicts temporally dense emotion categories and intensities directly from speech and uses them to condition the animation model, improving expressiveness and temporal coherence in 3D facial animation.

Findings

01 Strong frame-level emotion recognition performance
02 Low geometric and temporal reconstruction errors
03 Smooth emotion transitions in generated animations

Abstract

We introduce SEDTalker, an emotion-aware framework for speech-driven 3D facial animation that leverages frame-level speech emotion diarization to achieve fine-grained expressive control. Unlike prior approaches that rely on utterance-level or manually specified emotion labels, our method predicts temporally dense emotion categories and intensities directly from speech, enabling continuous modulation of facial expressions over time. The diarized emotion signals are encoded as learned embeddings and used to condition a speech-driven 3D animation model based on a hybrid Transformer-Mamba architecture. This design allows effective disentanglement of linguistic content and emotional style while preserving identity and temporal coherence. We evaluate our approach on a large-scale multi-corpus dataset for speech emotion diarization and on the EmoVOCA dataset for emotional 3D facial animation.…
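To make the conditioning idea in the abstract concrete, the following is a minimal sketch of how per-frame emotion labels and intensities might modulate per-frame speech features. It is not the authors' implementation. The label set `EMOTIONS`, the dimension `EMBED_DIM`, the helper names, and the additive, intensity-scaled fusion are all assumptions made for illustration. The embedding table here is random, whereas the abstract describes learned embeddings, and the Transformer-Mamba animation model itself is not shown.

```python
import random

# Hypothetical label set and feature size; the abstract does not list them.
EMOTIONS = ["neutral", "happy", "sad", "angry"]
EMBED_DIM = 4


def make_embedding_table(num_classes, dim, seed=0):
    """Random stand-in for a learned emotion embedding table."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(num_classes)]


def condition_frames(speech_feats, emotion_ids, intensities, table):
    """Add an intensity-scaled emotion embedding to each speech frame.

    Additive fusion is an assumption; the paper may combine the
    signals differently.
    """
    if not (len(speech_feats) == len(emotion_ids) == len(intensities)):
        raise ValueError("all inputs must have one entry per frame")
    conditioned = []
    for feat, emo, inten in zip(speech_feats, emotion_ids, intensities):
        emb = table[emo]  # embedding for this frame's emotion class
        conditioned.append([f + inten * e for f, e in zip(feat, emb)])
    return conditioned


# Three frames moving from neutral to fully "happy".
table = make_embedding_table(len(EMOTIONS), EMBED_DIM)
speech = [[0.0] * EMBED_DIM for _ in range(3)]
ids = [0, 1, 1]
intens = [0.0, 0.5, 1.0]
out = condition_frames(speech, ids, intens, table)
```

Because the emotion label and intensity can change at every frame, the conditioning signal varies over time within one utterance, which is the contrast the abstract draws with utterance-level labels.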
