Controllable Expressive 3D Facial Animation via Diffusion in a Unified Multimodal Space

Kangwei Liu; Junwu Liu; Xiaowei Yi; Jinlin Guo; Yun Cao

arXiv:2506.10007 · cs.MM · June 13, 2025

TL;DR

This paper introduces a diffusion-based framework for controllable expressive 3D facial animation that effectively integrates multiple control signals and enhances motion diversity, resulting in more natural and emotionally expressive animations.

Contribution

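The paper proposes a FLAME-centered multimodal emotion binding strategy, which aligns text, audio, and emotion labels through contrastive learning, and an attention-based latent diffusion model, together aimed at improving controllability and motion diversity in 3D facial animation.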

Findings

01. Outperforms existing methods on most metrics

02. Achieves 21.6% improvement in emotion similarity

03. Maintains natural facial dynamics

Abstract

Audio-driven emotional 3D facial animation encounters two significant challenges: (1) reliance on single-modal control signals (videos, text, or emotion labels) without leveraging their complementary strengths for comprehensive emotion manipulation, and (2) deterministic regression-based mapping that constrains the stochastic nature of emotional expressions and non-verbal behaviors, limiting the expressiveness of synthesized animations. To address these challenges, we present a diffusion-based framework for controllable expressive 3D facial animation. Our approach introduces two key innovations: (1) a FLAME-centered multimodal emotion binding strategy that aligns diverse modalities (text, audio, and emotion labels) through contrastive learning, enabling flexible emotion control from multiple signal sources, and (2) an attention-based latent diffusion model with content-aware attention…
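The abstract says the binding strategy aligns modalities through contrastive learning but does not state the loss. As a rough illustration only, the sketch below uses a symmetric InfoNCE objective, a common choice for this kind of alignment and an assumption here, not the authors' confirmed formulation. The names `info_nce`, `flame_emb`, `paired_text`, and `unpaired_text`, the temperature of 0.07, and the random toy embeddings are all hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def info_nce(anchor, other, temperature=0.07):
    """Symmetric InfoNCE: matched pairs (same index) are pulled together,
    every other pair in the batch is treated as a negative."""
    a = [normalize(v) for v in anchor]
    b = [normalize(v) for v in other]
    logits = [[sum(x * y for x, y in zip(u, w)) / temperature for w in b]
              for u in a]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


def rand_batch(n=8, d=16):
    """Toy stand-in for a batch of encoder outputs (assumed shapes)."""
    return [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]


random.seed(0)
flame_emb = rand_batch()  # hypothetical FLAME-side emotion embeddings
# Text embeddings that nearly coincide with their FLAME partners.
paired_text = [[x + 0.01 * random.gauss(0, 1) for x in v] for v in flame_emb]
# Text embeddings with no relation to the FLAME batch.
unpaired_text = rand_batch()

loss_paired = info_nce(flame_emb, paired_text)
loss_unpaired = info_nce(flame_emb, unpaired_text)
print(loss_paired < loss_unpaired)
```

In this toy setup the loss is lower when each text embedding sits close to its FLAME counterpart, which is the behavior a contrastive binding objective rewards. A real system would compute the embeddings with trained encoders and backpropagate through the loss.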

Taxonomy

Topics: Face recognition and analysis · Emotion and Mood Recognition · Social Robot Interaction and HRI

Methods: Softmax · Attention Is All You Need · Diffusion · Latent Diffusion Model