Investigation of Accuracy and Bias in Face Recognition Trained with Synthetic Data
Pavel Korshunov, Ketan Kotwal, Christophe Ecabert, Vidit Vidit, Amir Mohammadi, and Sebastien Marcel

TL;DR
This paper evaluates how synthetic data impacts face recognition accuracy and bias, showing that balanced synthetic datasets can help mitigate bias but still lag in generalization compared to real data.
Contribution
It introduces a balanced synthetic face dataset and systematically assesses its effects on face recognition performance and bias mitigation.
Findings
Synthetic data lags behind real data in generalization on benchmarks.
Demographically balanced synthetic datasets can reduce bias.
Augmentation quality and quantity influence accuracy and fairness.
Abstract
Synthetic data has emerged as a promising alternative for training face recognition (FR) models, offering advantages in scalability, privacy compliance, and potential for bias mitigation. However, critical questions remain about whether both high accuracy and fairness can be achieved with synthetic data. In this work, we evaluate the impact of synthetic data on the bias and performance of FR systems. We generate a balanced face dataset, FairFaceGen, using two state-of-the-art text-to-image generators, Flux.1-dev and Stable Diffusion v3.5 (SD35), and combine them with several identity augmentation methods, including Arc2Face and four IP-Adapters. By maintaining an equal identity count across synthetic and real datasets, we ensure fair comparisons when evaluating FR performance on standard (LFW, AgeDB-30, etc.) and challenging IJB-B/C benchmarks, and FR bias on the Racial Faces in-the-Wild (RFW) dataset.…
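The abstract does not say which bias metric is used, so the following is only a minimal sketch of one common way to summarize demographic bias on a benchmark such as RFW: compute verification accuracy per demographic group, then report the spread across groups. The function names, the record layout, and the choice of standard deviation and max-min gap as summary statistics are all assumptions made for illustration, not the authors' protocol.

```python
from statistics import pstdev


def group_accuracies(records):
    """Per-group verification accuracy.

    records: iterable of (group, predicted_same, actually_same) tuples,
    one per face pair. This layout is a hypothetical convenience format.
    """
    correct = {}
    total = {}
    for group, predicted_same, actually_same in records:
        total[group] = total.get(group, 0) + 1
        if predicted_same == actually_same:
            correct[group] = correct.get(group, 0) + 1
    return {g: correct.get(g, 0) / total[g] for g in total}


def bias_summary(acc_by_group):
    """Summarize spread across groups (assumed metrics: std and max-min gap)."""
    values = list(acc_by_group.values())
    return {
        "mean": sum(values) / len(values),
        "std": pstdev(values),  # population std over the groups
        "gap": max(values) - min(values),
    }


if __name__ == "__main__":
    # Toy pairs with made-up decisions; group labels are illustrative.
    pairs = [
        ("African", True, True),
        ("African", True, False),
        ("Caucasian", True, True),
        ("Caucasian", False, False),
    ]
    acc = group_accuracies(pairs)
    print(acc)
    print(bias_summary(acc))
```

A lower spread at comparable mean accuracy would indicate a less biased model under this assumed definition. Comparing models only makes sense when they are trained with the same number of identities, which is the control the abstract describes.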
Taxonomy
Topics: Face recognition and analysis · Biometric Identification and Security · Domain Adaptation and Few-Shot Learning
