Empirical Privacy Evaluations of Generative and Predictive Machine Learning Models -- A review and challenges for practice
Flavio Hafner, Chang Sun

TL;DR
This paper reviews empirical methods for evaluating privacy risks in generative and predictive machine learning models, highlights practical challenges and the limitations of current approaches, and proposes directions for future research.
Contribution
It provides a comprehensive overview of empirical privacy evaluation techniques, discusses their limitations in large-scale scenarios, and suggests future research directions for more realistic threat models.
Findings
Methods that verify the correct operation of the training algorithm are effective for large datasets.
Current evaluation methods often assume unrealistic adversaries.
There is a trade-off between evaluation feasibility and threat model realism.
Abstract
Synthetic data generators, when trained using privacy-preserving techniques like differential privacy, promise to produce synthetic data with formal privacy guarantees, facilitating the sharing of sensitive data. However, it is crucial to empirically assess the privacy risks associated with the generated synthetic data before deploying generative technologies. This paper outlines the key concepts and assumptions underlying empirical privacy evaluation in machine learning-based generative and predictive models. Then, this paper explores the practical challenges for privacy evaluations of generative models for use cases with millions of training records, such as data from statistical agencies and healthcare providers. Our findings indicate that methods designed to verify the correct operation of the training algorithm are effective for large datasets, but they often assume an adversary…
Taxonomy
Topics: Privacy-Preserving Technologies in Data
