The appeal of the gamma family distribution to protect the confidentiality of contingency tables
James Jackson, Robin Mitra, Brian Francis, Iain Dove

TL;DR
This paper proposes the discretized gamma family (GAF) distribution for generating synthetic contingency tables, arguing that it balances privacy and utility better than the Poisson and Poisson-mixture distributions commonly used for this purpose.
Contribution
It proposes using the discretized gamma family distribution for privacy-preserving data synthesis, allowing adaptive noise addition based on cell count size.
Findings
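The gamma family distribution provides a better privacy-utility balance than Poisson-based alternatives.
Less noise is applied to larger cell counts, which carry lower disclosure risk.
The method is demonstrated on administrative data similar to the English School Census (ESC).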
Abstract
Administrative databases, such as the English School Census (ESC), are rich sources of information that are potentially useful for researchers. For such data sources to be made available, however, strict guarantees of privacy would be required. To achieve this, synthetic data methods can be used. Such methods, when protecting the confidentiality of tabular data (contingency tables), often utilise the Poisson or Poisson-mixture distributions, such as the negative binomial (NBI). These distributions, however, are either equidispersed (in the case of the Poisson) or overdispersed (e.g. in the case of the NBI), which results in excessive noise being applied to large low-risk counts. This paper proposes the use of the (discretized) gamma family (GAF) distribution, which allows noise to be applied in a more bespoke fashion. Specifically, it allows less noise to be applied as cell counts…
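The dispersion argument in the abstract can be illustrated with a minimal sketch. It rests on assumptions the summary above does not state: that the GAF distribution is parameterised by a mean `mu` with variance `sigma**2 * mu**nu` (the form used in the gamlss family of distributions), that discretization is done by taking the floor of a continuous gamma draw, and that `sigma = 1.0`, `nu = 0.5` are reasonable illustrative values. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def gaf_variance(mu, sigma, nu):
    # Assumed GAF variance function: Var = sigma^2 * mu^nu.
    # With nu < 1 the variance grows more slowly than the mean,
    # unlike the Poisson, where Var = mu.
    return sigma**2 * mu**nu


def sample_discretized_gaf(mu, sigma, nu, size, rng):
    # Match a gamma distribution to the target mean and variance:
    # mean = shape * scale, variance = shape * scale^2.
    var = gaf_variance(mu, sigma, nu)
    shape = mu**2 / var
    scale = var / mu
    draws = rng.gamma(shape, scale, size=size)
    # Assumed discretization: floor to a non-negative integer count.
    return np.floor(draws).astype(int)


rng = np.random.default_rng(0)
for mu in (10, 100, 1000):
    poisson_counts = rng.poisson(mu, size=100_000)
    gaf_counts = sample_discretized_gaf(mu, sigma=1.0, nu=0.5, size=100_000, rng=rng)
    # Compare the spread of synthetic counts around the original count mu.
    print(mu, round(poisson_counts.std(), 2), round(gaf_counts.std(), 2))
```

Under these assumed parameter values, the standard deviation of the GAF draws grows far more slowly with `mu` than that of the Poisson draws, which is the behaviour the abstract describes for large, low-risk counts. Other choices of `sigma` and `nu` would shift where the two distributions apply comparable noise.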
Taxonomy
Topics: Census and Population Estimation · Probability and Risk Models · Data-Driven Disease Surveillance
