Unveiling Synthetic Faces: How Synthetic Datasets Can Expose Real Identities

Hatef Otroshi Shahreza; Sébastien Marcel

arXiv:2410.24015·cs.CV·November 1, 2024

TL;DR

This paper reveals that existing synthetic face datasets leak information from real training data, exposing privacy risks and highlighting the need for more responsible data generation methods.

Contribution

It introduces the first systematic membership inference attack on synthetic face datasets, demonstrating privacy leaks from training data.

Findings

01 All six studied synthetic datasets leak information about the real data used to train their generators.

02 Synthetic face datasets can therefore expose the real identities they were meant to protect.

03 The study highlights the need for privacy-aware synthetic data generation.

Abstract

Synthetic data generation is gaining increasing popularity in different computer vision applications. Existing state-of-the-art face recognition models are trained using large-scale face datasets, which are crawled from the Internet and raise privacy and ethical concerns. To address such concerns, several works have proposed generating synthetic face datasets to train face recognition models. However, these methods depend on generative models, which are trained on real face images. In this work, we design a simple yet effective membership inference attack to systematically study if any of the existing synthetic face recognition datasets leak any information from the real data used to train the generator model. We provide an extensive study on 6 state-of-the-art synthetic face recognition datasets, and show that in all these synthetic datasets, several samples from the original real…
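The truncated abstract does not describe how the attack works, so the following is only a minimal sketch of one plausible form of membership inference on face data: compare embeddings of synthetic and real images and flag the most similar pairs for inspection. The embedding vectors, the function names `cosine_similarity` and `rank_candidate_leaks`, and the use of cosine similarity are all assumptions made for illustration, not the authors' confirmed method.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidate_leaks(
    synthetic: List[Vector], real: List[Vector], top_k: int = 3
) -> List[Tuple[float, int, int]]:
    """Hypothetical helper: score every (synthetic, real) pair and return the
    top_k most similar as (similarity, synthetic_index, real_index)."""
    scored = [
        (cosine_similarity(s, r), i, j)
        for i, s in enumerate(synthetic)
        for j, r in enumerate(real)
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings"; a real study would take these from a
# face recognition model, which is assumed here and not shown.
synthetic_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
real_embeddings = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

top_pairs = rank_candidate_leaks(synthetic_embeddings, real_embeddings, top_k=2)
for similarity, syn_idx, real_idx in top_pairs:
    print(f"synthetic[{syn_idx}] vs real[{real_idx}]: similarity={similarity:.3f}")
```

In this kind of setup, the highest-scoring pairs are only candidates: a high similarity suggests that a synthetic image may reproduce a real training identity, and the pair would still need to be inspected before calling it a leak.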

Taxonomy

Topics: Face recognition and analysis