On Applicability of Synthetic Datasets for Facial Expression Recognition

Ali Azmoudeh; Erdi Sarıtaş; Ömer Yıldırım; Hazım Kemal Ekenel

arXiv:2605.17483 · cs.CV · May 19, 2026

TL;DR

This paper explores synthetic data strategies like pseudo-labeling, diffusion models, and GANs to improve facial expression recognition while addressing data imbalance and privacy concerns.

Contribution

It introduces and evaluates three novel synthetic dataset construction methods for privacy-preserving FER, demonstrating their effectiveness in mitigating dataset limitations.

Findings

1. Synthetic data can effectively substitute or complement real datasets.

2. The proposed strategies improve generalization in facial expression recognition.

3. Cross-dataset evaluations reveal trade-offs and benefits of each synthetic approach.

Abstract

Facial Expression Recognition (FER) faces two core challenges. The first is class imbalance in public datasets, which skews the learning process and weakens generalization. The second is privacy and data-collection constraints, which limit the sharing of facial images and restrict the creation of large, balanced datasets. To address these issues, we examine three complementary strategies for constructing privacy-preserving FER datasets in the standard setting of seven discrete facial expression classes. Our strategies are: (i) pseudo-labeling large unlabeled face collections with a teacher model under a confidence-thresholding scheme, (ii) prompt-driven synthesis using diffusion models conditioned on demographic attributes, and (iii) task-aware GAN-based expression editing that modifies facial expression while preserving identity and realism. For training and evaluation, we employed…
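As an illustration of strategy (i), the following is a minimal sketch of pseudo-labeling with a confidence threshold. It is not the authors' implementation: the class order in `CLASSES`, the threshold value of 0.9, the function names, and the example logits are all assumptions made for this sketch. A real pipeline would obtain the logits from a trained teacher network.

```python
import math

# Seven discrete expression classes; this particular order is an assumption.
CLASSES = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]


def softmax(logits):
    """Convert raw teacher logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(teacher_logits, threshold=0.9):
    """Keep only samples whose top teacher probability reaches the threshold.

    teacher_logits: dict mapping a sample id to its list of 7 logits.
    Returns a dict mapping each kept sample id to (label, confidence).
    """
    kept = {}
    for sample_id, logits in teacher_logits.items():
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            kept[sample_id] = (CLASSES[best], probs[best])
    return kept


# Hypothetical teacher logits for three unlabeled face images.
logits = {
    "img_001": [0.1, 0.0, 0.2, 6.0, 0.1, 0.3, 0.5],  # one dominant class
    "img_002": [1.0, 0.9, 1.1, 1.0, 1.2, 0.8, 1.0],  # ambiguous prediction
    "img_003": [0.0, 0.1, 0.0, 0.2, 0.1, 0.0, 7.0],  # one dominant class
}
selected = pseudo_label(logits, threshold=0.9)
```

In this sketch, the ambiguous sample is discarded while the two confident samples receive pseudo-labels. Raising the threshold trades dataset size for label quality, and it may also worsen class imbalance if the teacher is less confident on rare classes.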

Code & Models

Repositories

AliAZ98/SyntFER (GitHub)
