Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description
Mahshid Dehghani, Amirahmad Shafiee, Ali Shafiei, Neda Fallah, Farahmand Alizadeh, Mohammad Mehdi Gholinejad, Hamid Behroozi, Jafar Habibi, Ehsaneddin Asgari

TL;DR
Emo3D presents a comprehensive dataset and evaluation metric for 3D facial expression generation from emotion descriptions, enabling better assessment and synthesis of emotional expressions in virtual applications.
Contribution
The paper introduces Emo3D, a large-scale dataset with diverse emotion annotations and a novel evaluation metric for 3D facial expression synthesis from text.
Findings
The Emo3D metric outperforms traditional MSE metrics in evaluating emotion conveyance.
Language and vision-language models can be effectively fine-tuned using the dataset.
The new metric better captures emotional accuracy in 3D facial expressions.
Abstract
Existing 3D facial emotion modeling has been constrained by limited emotion classes and insufficient datasets. This paper introduces "Emo3D", an extensive "Text-Image-Expression dataset" spanning a wide spectrum of human emotions, each paired with images and 3D blendshapes. Leveraging Large Language Models (LLMs), we generate a diverse array of textual descriptions, facilitating the capture of a broad spectrum of emotional expressions. Using this unique dataset, we conduct a comprehensive evaluation of fine-tuned language-based models and vision-language models like Contrastive Language-Image Pretraining (CLIP) for 3D facial expression synthesis. We also introduce a new evaluation metric for this task to more directly measure the conveyed emotion. Our new evaluation metric, Emo3D, demonstrates its superiority over Mean Squared Error (MSE) metrics in assessing visual-text alignment…
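For context on the MSE baseline that the abstract compares against, here is a minimal sketch of mean squared error between two expressions. It assumes each expression is a fixed-length list of blendshape weights; the function name `blendshape_mse` and the example vectors are hypothetical and not taken from the paper. It does not implement the Emo3D metric, which this summary does not define.

```python
def blendshape_mse(predicted, target):
    """Mean squared error between two blendshape weight vectors.

    Assumption: both inputs are equal-length sequences of floats,
    one weight per blendshape (hypothetical layout, not from the paper).
    """
    if len(predicted) != len(target):
        raise ValueError("blendshape vectors must have the same length")
    if len(predicted) == 0:
        raise ValueError("blendshape vectors must not be empty")
    # Average the squared per-blendshape differences.
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return sum(squared_errors) / len(squared_errors)


# Illustrative vectors only: four made-up blendshape weights.
neutral = [0.0, 0.0, 0.0, 0.0]
smile = [0.5, 0.5, 0.0, 0.0]
print(blendshape_mse(smile, neutral))
```

MSE weights every blendshape coefficient equally, so it measures numeric distance between weight vectors and not perceived emotion, which is the gap the abstract says the new metric is meant to address.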
Taxonomy
Topics: Emotion and Mood Recognition
