MMHead: Towards Fine-grained Multi-modal 3D Facial Animation
Sijing Wu, Yunhao Li, Yichao Yan, Huiyu Duan, Ziwei Liu, Guangtao Zhai

TL;DR
This paper introduces MMHead, a large-scale multi-modal 3D facial animation dataset with hierarchical text annotations, and proposes MM2Face, a VQ-VAE-based method for text-driven 3D facial motion generation.
Contribution
The paper constructs the first large-scale multi-modal 3D facial animation dataset with hierarchical annotations and establishes benchmarks for text-driven 3D facial animation tasks.
Findings
The MMHead dataset contains 49 hours of 3D facial motion sequences and speech audio, paired with hierarchical text annotations.
MM2Face achieves competitive results on new benchmarks.
The dataset and benchmarks promote progress in multi-modal 3D facial animation.
Abstract
3D facial animation has attracted considerable attention due to its extensive applications in the multimedia field. Audio-driven 3D facial animation has been widely explored with promising results. However, multi-modal 3D facial animation, especially text-guided 3D facial animation, is rarely explored due to the lack of multi-modal 3D facial animation datasets. To fill this gap, we first construct a large-scale multi-modal 3D facial animation dataset, MMHead, which consists of 49 hours of 3D facial motion sequences, speech audio, and rich hierarchical text annotations. Each text annotation contains abstract action and emotion descriptions, descriptions of fine-grained facial and head movements (i.e., expression and head pose), and three possible scenarios that may cause such emotion. Concretely, we integrate five public 2D portrait video datasets, and propose an automatic pipeline to 1)…
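The annotation hierarchy described in the abstract can be pictured as a small record type. The sketch below is illustrative only: the class name, field names, the `to_prompt` helper, and the example values are all hypothetical and are not taken from the MMHead release, whose actual schema may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HierarchicalAnnotation:
    """One text annotation, following the levels named in the abstract.

    All field names are hypothetical; the dataset's real schema may differ.
    """
    action: str             # abstract action description
    emotion: str            # abstract emotion description
    expression_detail: str  # fine-grained facial movement description
    head_pose_detail: str   # fine-grained head movement description
    scenarios: List[str] = field(default_factory=list)  # possible causes of the emotion

    def __post_init__(self) -> None:
        # The abstract states that each annotation carries three scenarios.
        if len(self.scenarios) != 3:
            raise ValueError("expected exactly three scenarios")

    def to_prompt(self) -> str:
        """Join the abstract and fine-grained levels into one text prompt."""
        return (
            f"{self.action}, {self.emotion}. "
            f"{self.expression_detail} {self.head_pose_detail}"
        )


# Invented placeholder values, not real dataset entries.
example = HierarchicalAnnotation(
    action="talking",
    emotion="happy",
    expression_detail="The corners of the mouth rise.",
    head_pose_detail="The head nods slightly.",
    scenarios=["placeholder scenario 1", "placeholder scenario 2", "placeholder scenario 3"],
)
print(example.to_prompt())
```

Keeping the abstract and fine-grained levels as separate fields would let a text-driven model be conditioned on either level alone or on both together; whether the authors structure their conditioning this way is not stated in the abstract.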
Taxonomy
Topics: Face recognition and analysis · Human Motion and Animation
Methods: Softmax · Attention Is All You Need
