MMHead: Towards Fine-grained Multi-modal 3D Facial Animation

Sijing Wu; Yunhao Li; Yichao Yan; Huiyu Duan; Ziwei Liu; Guangtao Zhai

arXiv:2410.07757·cs.CV·October 11, 2024

TL;DR

This paper introduces MMHead, a large-scale multi-modal 3D facial animation dataset with hierarchical text annotations, and proposes MM2Face, a VQ-VAE-based method for text-driven 3D facial motion generation.
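As background on the "VQ-VAE-based" part, the core of any VQ-VAE is a quantization step that snaps each encoded frame to its nearest codebook entry. The sketch below illustrates that generic step only; it is not MM2Face's implementation, and the function name `quantize`, the 2-D feature size, and the toy codebook are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def quantize(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each encoded frame to its nearest codebook vector (squared L2)."""
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # Squared Euclidean distance from this frame to every codebook entry.
        distances = [
            sum((f - c) ** 2 for f, c in zip(frame, code)) for code in codebook
        ]
        best = min(range(len(codebook)), key=distances.__getitem__)
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


# Toy example: two hypothetical motion codes, two encoded frames.
toy_codebook = [[0.0, 0.0], [1.0, 1.0]]
toy_frames = [[0.1, -0.1], [0.9, 1.2]]
toy_indices, toy_quantized = quantize(toy_frames, toy_codebook)
```

In a VQ-VAE the resulting index sequence is what a downstream generator predicts, and the decoder maps the looked-up vectors back to motion.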

Contribution

The paper constructs the first large-scale multi-modal 3D facial animation dataset with hierarchical annotations and establishes benchmarks for text-driven 3D facial animation tasks.

Findings

01. The MMHead dataset contains 49 hours of 3D facial motion sequences, speech audio, and hierarchical text annotations.

02. MM2Face achieves competitive results on the newly established benchmarks.

03. The dataset and benchmarks promote progress in multi-modal 3D facial animation.

Abstract

3D facial animation has attracted considerable attention due to its extensive applications in the multimedia field. Audio-driven 3D facial animation has been widely explored with promising results. However, multi-modal 3D facial animation, especially text-guided 3D facial animation, is rarely explored due to the lack of a multi-modal 3D facial animation dataset. To fill this gap, we first construct a large-scale multi-modal 3D facial animation dataset, MMHead, which consists of 49 hours of 3D facial motion sequences, speech audio, and rich hierarchical text annotations. Each text annotation contains abstract action and emotion descriptions, fine-grained descriptions of facial and head movements (i.e., expression and head pose), and three possible scenarios that may cause such an emotion. Concretely, we integrate five public 2D portrait video datasets, and propose an automatic pipeline to 1)…
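The hierarchical annotation described in the abstract (abstract action and emotion, fine-grained expression and head pose, and three candidate scenarios) can be pictured as a small record type. The sketch below is a hypothetical layout, not the dataset's actual file format: the class name, field names, and the `to_prompt` helper are assumptions, and the example strings are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HierarchicalAnnotation:
    """Hypothetical container for one MMHead-style text annotation."""

    # Abstract level: coarse action and emotion descriptions.
    action: str
    emotion: str
    # Fine-grained level: facial expression and head pose descriptions.
    expression_detail: str
    head_pose_detail: str
    # Three possible scenarios that may cause the emotion.
    scenarios: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.scenarios) != 3:
            raise ValueError("expected exactly three scenarios")

    def to_prompt(self, fine_grained: bool = True) -> str:
        """Join the chosen annotation levels into a single text prompt."""
        parts = [self.action, self.emotion]
        if fine_grained:
            parts += [self.expression_detail, self.head_pose_detail]
        return " ".join(parts)


# Invented example values, for illustration only.
example = HierarchicalAnnotation(
    action="The person is talking.",
    emotion="They look surprised.",
    expression_detail="The eyebrows rise and the mouth opens wide.",
    head_pose_detail="The head tilts slightly back.",
    scenarios=["scenario A", "scenario B", "scenario C"],
)
coarse_prompt = example.to_prompt(fine_grained=False)
```

Separating the levels this way lets a text-driven model be conditioned on either the coarse description alone or the coarse and fine-grained descriptions together.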

Taxonomy

Topics: Face recognition and analysis · Human Motion and Animation

Methods: Softmax · Attention Is All You Need