MAUGen: A Unified Diffusion Approach for Multi-Identity Facial Expression and AU Label Generation

Xiangdong Li; Ye Lou; Ao Gao; Wei Zhang; Siyang Song

arXiv:2602.00583 · cs.CV · February 3, 2026

TL;DR

MAUGen is a diffusion-based framework that jointly generates diverse, photorealistic facial images and detailed AU labels from text prompts, addressing data scarcity in AU recognition.

Contribution

MAUGen introduces a multi-modal diffusion approach and a new synthetic dataset, enabling realistic face and AU label synthesis conditioned on text.

Findings

01 Outperforms existing methods in image and AU label synthesis

02 Creates a large-scale, diverse synthetic facial dataset with annotations

03 Demonstrates improved AU recognition performance

Abstract

The lack of large-scale, demographically diverse face images with precise Action Unit (AU) occurrence and intensity annotations has long been recognized as a fundamental bottleneck in developing generalizable AU recognition systems. In this paper, we propose MAUGen, a diffusion-based multi-modal framework that jointly generates a large collection of photorealistic facial expressions and anatomically consistent AU labels, including both occurrence and intensity, conditioned on a single descriptive text prompt. Our MAUGen involves two key modules: (1) a Multi-modal Representation Learning (MRL) module that captures the relationships among the paired textual description, facial identity, expression image, and AU activations within a unified latent space; and (2) a Diffusion-based Image label Generator (DIG) that decodes the joint representation into aligned facial image-label pairs across…
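To make the two kinds of AU annotation mentioned in the abstract concrete, the following is a minimal Python sketch of a label that carries both occurrence and intensity. It is not taken from the paper: the `AULabel` class, the `make_au_label` helper, the 0 to 5 ordinal intensity scale, and the rule that an AU occurs whenever its intensity is above zero are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed ordinal scale 0..5 (0 = absent); the abstract does not specify one.
MAX_INTENSITY = 5


@dataclass(frozen=True)
class AULabel:
    """Hypothetical container pairing AU occurrence with AU intensity."""
    occurrence: Dict[int, bool]  # AU number -> whether the AU is active
    intensity: Dict[int, int]    # AU number -> ordinal intensity level

    def active_aus(self) -> List[int]:
        """Return the active AU numbers in ascending order."""
        return sorted(au for au, on in self.occurrence.items() if on)


def make_au_label(intensity: Dict[int, int]) -> AULabel:
    """Build a label whose occurrence flags agree with its intensities."""
    for au, level in intensity.items():
        if not 0 <= level <= MAX_INTENSITY:
            raise ValueError(f"AU{au}: intensity {level} outside 0..{MAX_INTENSITY}")
    # Assumption: an AU counts as occurring whenever its intensity is non-zero.
    occurrence = {au: level > 0 for au, level in intensity.items()}
    return AULabel(occurrence=occurrence, intensity=dict(intensity))


# Example: cheek raiser (AU6) and lip corner puller (AU12) active,
# brow lowerer (AU4) absent.
label = make_au_label({4: 0, 6: 3, 12: 4})
print(label.active_aus())
```

Deriving occurrence from intensity keeps the two annotations consistent by construction, which is one simple way to read "anatomically consistent" labels; the paper's actual label format may differ.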

Taxonomy

Topics: Face recognition and analysis · Emotion and Mood Recognition · Face Recognition and Perception