Statistical modelling under differential privacy constraints: A case study in fine-scale geographical analysis with Australian Bureau of Statistics TableBuilder data
Ewan Cameron

TL;DR
This paper presents a Bayesian, likelihood-based method for reconstructing fine-scale geographical data from the Australian Census. The published counts are perturbed for privacy by noise injection and small-count suppression, and the method enables more accurate spatial analysis of them.
Contribution
It introduces a Bayesian approach that explicitly models the privacy-preserving perturbation algorithm, improving data reconstruction from differentially private census outputs.
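The key idea is to write down the perturbation as a likelihood, p(published count | true count), and combine it with a prior. The sketch below does this for a single cell. The noise distribution (uniform on -2..2), the suppression threshold (3), and the Poisson prior are all hypothetical placeholders. The summary does not give the ABS's actual parameters or the paper's prior.

```python
import math

# Hypothetical perturbation parameters (assumptions, not the ABS's published values)
NOISE_VALUES = (-2, -1, 0, 1, 2)  # noise added uniformly to every nonzero true count
SUPPRESS_BELOW = 3                # perturbed counts below this are published as zero


def likelihood(y: int, n: int) -> float:
    """P(published count = y | true count = n) under the hypothetical perturbation."""
    if n == 0:
        # Zero cells receive no noise, so they are always published as zero.
        return 1.0 if y == 0 else 0.0
    p = 1.0 / len(NOISE_VALUES)
    if y == 0:
        # Suppressed: every noise draw that leaves the perturbed count below the threshold.
        return sum(p for e in NOISE_VALUES if n + e < SUPPRESS_BELOW)
    if y < SUPPRESS_BELOW:
        return 0.0  # values in 1..SUPPRESS_BELOW-1 can never be published
    return p if (y - n) in NOISE_VALUES else 0.0


def posterior(y: int, rate: float, n_max: int = 200) -> list[float]:
    """Posterior over the true count n, given published y and an assumed Poisson(rate) prior."""
    weights = [
        # Poisson prior pmf (computed in log space) times the perturbation likelihood
        math.exp(-rate + n * math.log(rate) - math.lgamma(n + 1)) * likelihood(y, n)
        for n in range(n_max + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `posterior(0, rate=2.0)` places all of its mass on true counts 0 to 4. Under these assumed parameters, a true count of 5 or more can never be suppressed to zero. A suppressed zero is therefore informative rather than missing.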
Findings
The Bayesian method effectively reconstructs the original data from noisy, suppressed counts.
Demonstrates utility on real and simulated datasets for spatial analysis.
Improves the accuracy of small-area geographical data analysis.
Abstract
Guided by the principles of differential privacy protection, the Australian Bureau of Statistics modifies the data summaries from the Australian Census provided through TableBuilder to researchers at approved institutions. This modification algorithm injects a small degree of artificial noise into every nonzero cell count and then suppresses very small cell counts to zero. Researchers working with small-area TableBuilder outputs that have a high suppression fraction have proposed various algorithmic solutions for reconciling these with less suppressed outputs from larger enclosing areas. Here we propose that a Bayesian, likelihood-based statistical approach, in which the perturbation algorithm itself is explicitly represented, is well suited to analyses of such randomly perturbed data. Using both real (TableBuilder) and mock datasets representing dwelling…
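The abstract notes that small-area outputs have a high suppression fraction, while outputs for larger enclosing areas are less suppressed. The mock simulation below illustrates why, using the same two-step perturbation (noise on nonzero cells, then suppression of small values). The noise values, threshold, Poisson rate, and number of areas are hypothetical choices. The enclosing-area count is perturbed independently of its components, which is also an assumption of this sketch.

```python
import math
import random

# Hypothetical perturbation parameters (assumptions for illustration only)
NOISE_VALUES = (-2, -1, 0, 1, 2)
SUPPRESS_BELOW = 3


def perturb(n: int, rng: random.Random) -> int:
    """Add noise to a nonzero count, then suppress small results to zero."""
    if n == 0:
        return 0
    y = n + rng.choice(NOISE_VALUES)
    return y if y >= SUPPRESS_BELOW else 0


def poisson(rate: float, rng: random.Random) -> int:
    """Knuth's multiplication method for Poisson draws; adequate for small rates."""
    limit, k, prod = math.exp(-rate), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(42)
true_small = [poisson(2.0, rng) for _ in range(100)]  # mock small-area counts
published_small = [perturb(n, rng) for n in true_small]
published_total = perturb(sum(true_small), rng)  # enclosing area, perturbed separately

# Fraction of genuinely nonzero small areas that end up published as zero
nonzero = [i for i, n in enumerate(true_small) if n > 0]
suppressed_fraction = sum(published_small[i] == 0 for i in nonzero) / len(nonzero)
print(f"suppressed among nonzero small areas: {suppressed_fraction:.2f}")
print(f"true total {sum(true_small)}, published total {published_total}")
```

With a mean of about two per small area, a large share of nonzero cells is suppressed. The enclosing-area total, at around a couple of hundred, is only shifted by the noise. This gap is what the reconciliation methods mentioned above try to bridge, and what an explicit likelihood can handle directly.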
Taxonomy
Topics: Data-Driven Disease Surveillance · Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference
