MAUGen: A Unified Diffusion Approach for Multi-Identity Facial Expression and AU Label Generation
Xiangdong Li, Ye Lou, Ao Gao, Wei Zhang, Siyang Song

TL;DR
MAUGen is a diffusion-based framework that jointly generates diverse, photorealistic facial images and detailed AU labels from text prompts, addressing data scarcity in AU recognition.
Contribution
It introduces a multi-modal diffusion approach, together with a new dataset, that jointly synthesizes realistic faces and their AU labels conditioned on text.
Findings
Outperforms existing methods in image and AU label synthesis
Creates a large-scale, diverse synthetic facial dataset with annotations
Demonstrates improved AU recognition performance
Abstract
The lack of large-scale, demographically diverse face images with precise Action Unit (AU) occurrence and intensity annotations has long been recognized as a fundamental bottleneck in developing generalizable AU recognition systems. In this paper, we propose MAUGen, a diffusion-based multi-modal framework that jointly generates a large collection of photorealistic facial expressions and anatomically consistent AU labels, including both occurrence and intensity, conditioned on a single descriptive text prompt. MAUGen comprises two key modules: (1) a Multi-modal Representation Learning (MRL) module that captures the relationships among the paired textual description, facial identity, expression image, and AU activations within a unified latent space; and (2) a Diffusion-based Image label Generator (DIG) that decodes the joint representation into aligned facial image-label pairs across…
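The abstract does not say how MRL and DIG represent or denoise AU labels. The sketch below illustrates one hypothetical way a single diffusion state could carry an image and its AU label together. It makes several assumptions:

* Per-AU intensities are on a 0 to 5 scale, where 0 means absent.
* Occurrence is derived from intensity, so the two labels cannot disagree.
* The intensities are mapped into [-1, 1] and concatenated with an image latent.
* The standard DDPM forward noising step is applied to the joint vector, so a denoiser would have to recover image and label together.

The names `AULabel`, `from_vector`, `forward_diffuse` and `joint_noisy_sample`, the intensity scale, and the concatenation scheme are all illustrative assumptions, not MAUGen's actual design.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class AULabel:
    """Hypothetical AU label: AU number -> intensity on an assumed 0-5 scale (0 = absent)."""
    intensities: dict[int, int]

    def occurrence(self) -> dict[int, int]:
        # Occurrence is derived from intensity so the two labels stay consistent.
        return {au: int(v > 0) for au, v in self.intensities.items()}

    def to_vector(self, au_ids: list[int]) -> list[float]:
        # Map intensity 0..5 to [-1, 1]; AUs missing from the dict are treated as absent.
        return [self.intensities.get(au, 0) / 2.5 - 1.0 for au in au_ids]


def from_vector(vec: list[float], au_ids: list[int]) -> AULabel:
    # Invert to_vector, rounding and clamping each value to an integer intensity in [0, 5].
    return AULabel({au: min(5, max(0, round((x + 1.0) * 2.5))) for au, x in zip(au_ids, vec)})


def forward_diffuse(x0: list[float], alpha_bar_t: float, rng: random.Random):
    # DDPM forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    if not 0.0 < alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must lie in (0, 1]")
    s, n = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    return [s * x + n * e for x, e in zip(x0, eps)], eps


def joint_noisy_sample(image_latent, label, au_ids, alpha_bar_t, rng):
    # Assumption: the image latent and the AU vector share one diffusion state.
    # Noising them together is what would tie a generated image to its label.
    x0 = list(image_latent) + label.to_vector(au_ids)
    xt, eps = forward_diffuse(x0, alpha_bar_t, rng)
    k = len(image_latent)
    return xt[:k], xt[k:], eps
```

For example, `AULabel({1: 3, 12: 5}).occurrence()` returns `{1: 1, 12: 1}`. With `alpha_bar_t = 1.0` no noise is added, so decoding the AU slice with `from_vector` recovers the original intensities.

Keeping labels continuous in the same state as the image is only one option. A system could also decode labels with a separate head from a shared representation, and the abstract does not say which approach MAUGen uses.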
Taxonomy
Topics: Face recognition and analysis · Emotion and Mood Recognition · Face Recognition and Perception
