Flexible and scalable privacy assessment for very large datasets, with an application to official governmental microdata
Mário S. Alvim, Natasha Fernandes, Annabelle McIver, Carroll Morgan, Gabriel H. Nunes

TL;DR
This paper introduces a flexible, scalable, and explainable privacy assessment method based on Quantitative Information Flow, demonstrated on Brazil's extensive educational census data to evaluate privacy risks over time.
Contribution
It presents a novel privacy analysis framework that is adaptable, computationally feasible for large datasets, and understandable to non-experts, applied to real-world government data.
Findings
Effective privacy quantification for large datasets
Method is computationally tractable for datasets with millions of records
Results are accessible to policymakers and the public
Abstract
We present a systematic refactoring of the conventional treatment of privacy analyses, basing it on mathematical concepts from the framework of Quantitative Information Flow (QIF). The approach we suggest brings three principal advantages: it is flexible, allowing for precise quantification and comparison of privacy risks for attacks both known and novel; it can be computationally tractable for very large, longitudinal datasets; and its results are explainable both to politicians and to the general public. We apply our approach to a very large case study: the Educational Censuses of Brazil, curated by the governmental agency INEP, which comprise over 90 attributes of approximately 50 million individuals released longitudinally every year since 2007. These datasets have only very recently (2018-2021) attracted legislation to regulate their privacy -- while at the same time continuing to…
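To make the idea of quantifying re-identification risk concrete, here is a minimal sketch of one standard QIF-style measure, posterior Bayes vulnerability, for a hypothetical adversary who knows a target's quasi-identifier values and tries to pick out the target's record. This is an illustration under stated assumptions, not necessarily the measure or attack model the authors use: the prior is assumed uniform over records, the release is assumed deterministic, and the attribute names (`birth_year`, `city`) and toy records are invented for the example rather than taken from the INEP data.

```python
from collections import Counter
from fractions import Fraction


def _group_sizes(records, quasi_identifiers):
    # Count how many records share each combination of quasi-identifier values.
    return Counter(tuple(r[a] for a in quasi_identifiers) for r in records)


def reidentification_vulnerability(records, quasi_identifiers):
    """Probability that one best guess re-identifies a uniformly chosen target.

    Assumes a uniform prior and a deterministic release. Within a group of
    k indistinguishable records the guess succeeds with probability 1/k;
    averaging over all records gives (number of groups) / (number of records).
    """
    if not records:
        raise ValueError("records must be non-empty")
    groups = _group_sizes(records, quasi_identifiers)
    return Fraction(len(groups), len(records))


def unique_fraction(records, quasi_identifiers):
    """Fraction of records that are alone in their quasi-identifier group."""
    if not records:
        raise ValueError("records must be non-empty")
    groups = _group_sizes(records, quasi_identifiers)
    singletons = sum(1 for size in groups.values() if size == 1)
    return Fraction(singletons, len(records))


# Hypothetical toy microdata (not real census records).
records = [
    {"birth_year": 2001, "city": "A"},
    {"birth_year": 2001, "city": "A"},
    {"birth_year": 2001, "city": "B"},
    {"birth_year": 2002, "city": "A"},
    {"birth_year": 2002, "city": "A"},
]

# Before seeing any attribute, a blind guess succeeds with probability 1/n.
prior = Fraction(1, len(records))
by_year = reidentification_vulnerability(records, ["birth_year"])
by_year_city = reidentification_vulnerability(records, ["birth_year", "city"])

print(prior, by_year, by_year_city)
print(unique_fraction(records, ["birth_year", "city"]))
```

The computation is a single grouping pass over the records, which hints at why measures of this kind can remain feasible at scale. The ratio of posterior to prior vulnerability gives one way to compare how much different sets of attributes help an adversary.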
Taxonomy
Topics: Privacy, Security, and Data Protection · Privacy-Preserving Technologies in Data · Data Quality and Management
