Express4D: Expressive, Friendly, and Extensible 4D Facial Motion Generation Benchmark

Yaron Aloni; Rotem Shalev-Arkushin; Yonatan Shafir; Guy Tevet; Ohad Fried; Amit Haim Bermano

arXiv:2508.12438·cs.GR·August 19, 2025

TL;DR

Express4D introduces a new, easily collectible dataset with nuanced 4D facial motion sequences and semantic annotations, enabling improved text-driven facial expression generation for applications like animation and virtual avatars.

Contribution

The paper presents a novel dataset of expressive 4D facial motions with semantic labels, collected using commodity equipment and LLM-generated instructions, facilitating fine-grained control and benchmarking.

Findings

01 Baseline models demonstrate effective text-to-expression learning.

02 The dataset captures many-to-many modality mappings.

03 Models show promising results in nuanced facial motion generation.

Abstract

Dynamic facial expression generation from natural language is a crucial task in Computer Graphics, with applications in Animation, Virtual Avatars, and Human-Computer Interaction. However, current generative models suffer from datasets that are either speech-driven or limited to coarse emotion labels, lacking the nuanced, expressive descriptions needed for fine-grained control, and were captured using elaborate and expensive equipment. We hence present a new dataset of facial motion sequences featuring nuanced performances and semantic annotation. The data is easily collected using commodity equipment and LLM-generated natural language instructions, in the popular ARKit blendshape format. This provides riggable motion, rich with expressive performances and labels. We accordingly train two baseline models, and evaluate their performance for future benchmarking. Using our Express4D…
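
As a rough illustration of what a text-labeled sequence in ARKit blendshape format might look like in code, the sketch below pairs one natural language description with a list of per-frame coefficient vectors. It assumes 52 coefficients per frame, each in [0.0, 1.0], which is the standard ARKit face blendshape set; the actual Express4D layout may include additional channels. The names FacialMotionSample and neutral_sample are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List

# ARKit exposes 52 face blendshape coefficients, each in the range [0.0, 1.0].
# Assumption: one frame is exactly these 52 values; the real dataset may differ.
NUM_ARKIT_BLENDSHAPES = 52


@dataclass
class FacialMotionSample:
    """Hypothetical container (not the dataset's actual schema)."""

    text: str                  # natural language description of the performance
    frames: List[List[float]]  # num_frames rows of NUM_ARKIT_BLENDSHAPES weights

    def validate(self) -> None:
        """Raise ValueError if a frame has the wrong width or out-of-range weights."""
        for index, frame in enumerate(self.frames):
            if len(frame) != NUM_ARKIT_BLENDSHAPES:
                raise ValueError(
                    f"frame {index}: expected {NUM_ARKIT_BLENDSHAPES} weights, "
                    f"got {len(frame)}"
                )
            if any(not 0.0 <= weight <= 1.0 for weight in frame):
                raise ValueError(f"frame {index}: weight outside [0.0, 1.0]")


def neutral_sample(text: str, num_frames: int) -> FacialMotionSample:
    """Build a placeholder sequence with every coefficient at 0.0 (a neutral face)."""
    frames = [[0.0] * NUM_ARKIT_BLENDSHAPES for _ in range(num_frames)]
    return FacialMotionSample(text=text, frames=frames)
```

Calling neutral_sample("a slow, surprised smile", 3).validate() passes silently, while a frame with the wrong number of weights raises ValueError. Keeping the text and the motion in one record reflects the paired annotation the abstract describes, but it is only one possible layout.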

Taxonomy

Topics: Face recognition and analysis