The Impact of Balancing Real and Synthetic Data on Accuracy and Fairness in Face Recognition
Andrea Atzori, Pietro Cosseddu, Gianni Fenu, Mirko Marras

TL;DR
This study examines how combining demographically balanced authentic and synthetic data affects face recognition accuracy and fairness, highlighting diffusion-based synthetic data's benefits for accuracy but limited impact on fairness.
Contribution
It demonstrates that diffusion-based synthetic data is effective at improving face recognition accuracy and evaluates its limited influence on fairness when combined with authentic data.
Findings
Diffusion-based synthetic data improves recognition accuracy.
Balanced data has minimal impact on fairness.
Combining synthetic and authentic data can enhance model performance.
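
To make the idea of a demographically balanced mix of authentic and synthetic data concrete, here is a minimal Python sketch. It is not the authors' pipeline: the function name `balanced_mix`, the `(identity_id, group)` input format, the group labels, and the `synth_ratio` parameter are all assumptions made for illustration. It selects the same number of identities for every demographic group, drawing a fixed share from a synthetic pool and the rest from an authentic pool.

```python
import random
from collections import defaultdict


def balanced_mix(authentic, synthetic, per_group, synth_ratio, seed=0):
    """Select `per_group` identities for each demographic group.

    A `synth_ratio` share comes from the synthetic pool and the rest from
    the authentic pool. Each pool is a list of (identity_id, group) pairs;
    this input format is hypothetical, chosen only for the sketch.
    """
    rng = random.Random(seed)
    n_synth = round(per_group * synth_ratio)
    n_auth = per_group - n_synth

    def by_group(pool):
        # Index identities by their demographic group label.
        groups = defaultdict(list)
        for identity, group in pool:
            groups[group].append(identity)
        return groups

    auth, synth = by_group(authentic), by_group(synthetic)
    selected = {}
    for group in sorted(set(auth) | set(synth)):
        # Refuse to build an unbalanced set rather than silently under-sample.
        if len(auth[group]) < n_auth or len(synth[group]) < n_synth:
            raise ValueError(f"not enough identities for group {group!r}")
        selected[group] = (rng.sample(auth[group], n_auth)
                           + rng.sample(synth[group], n_synth))
    return selected


# Toy pools with two made-up groups, "A" and "B".
authentic = [(f"auth-{g}-{i}", g) for g in ("A", "B") for i in range(5)]
synthetic = [(f"syn-{g}-{i}", g) for g in ("A", "B") for i in range(5)]
mix = balanced_mix(authentic, synthetic, per_group=4, synth_ratio=0.5)
```

Raising an error when a group lacks enough identities keeps the resulting training set balanced by construction; a real setup would also need to decide how many images to keep per identity, which this sketch leaves out.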
Abstract
In recent years, advances in deep face recognition have fueled an increasing demand for large and diverse datasets. Nevertheless, the authentic data acquired to create those datasets is typically sourced from the web, which, in many cases, can lead to significant privacy issues due to the lack of explicit user consent. Furthermore, obtaining a large, demographically balanced dataset is even more difficult because of the natural imbalance in the distribution of images from different demographic groups. In this paper, we investigate the impact of demographically balanced authentic and synthetic data, both individually and in combination, on the accuracy and fairness of face recognition models. Initially, several generative methods were used to balance the demographic representations of the corresponding synthetic datasets. Then a state-of-the-art face encoder was trained and…
Taxonomy
Topics: Face recognition and analysis
