Bayesian nonparametric disclosure risk estimation via mixed effects log-linear models
Cinzia Carota, Maurizio Filippone, Roberto Leombruni, Silvia Polettini

TL;DR
This paper introduces a Bayesian nonparametric approach using Dirichlet process random effects to improve disclosure risk estimation in sparse contingency tables, reducing model complexity while maintaining accuracy.
Contribution
It proposes a novel mixed effects log-linear model with Dirichlet process random effects, enhancing risk estimation reliability and computational efficiency.
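A generic form of a log-linear model with Dirichlet process random effects is sketched below. The notation is assumed here and may differ from the paper's. F_k is the population frequency of cell k in the table of key variables, x_k holds the fixed-effect design for that cell (for example, main effects only), θ_k is a cell-specific random effect, and α and G_0 are the concentration parameter and base measure of the Dirichlet process.

```latex
\begin{aligned}
F_k &\sim \mathrm{Poisson}(\lambda_k), \qquad k = 1, \dots, K,\\
\log \lambda_k &= \mathbf{x}_k^{\top}\boldsymbol{\beta} + \theta_k,\\
\theta_k \mid G &\overset{\text{iid}}{\sim} G, \qquad G \sim \mathrm{DP}(\alpha, G_0).
\end{aligned}
```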
Findings
Including Dirichlet process random effects considerably reduces the number of fixed effects (interaction terms) the log-linear model needs.
Models with only main effects perform comparably to complex interaction models.
The Bayesian approach yields credible intervals for the risk estimates and accounts for their uncertainty.
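The clustering behaviour of Dirichlet process random effects can be illustrated with a truncated stick-breaking draw. This is a minimal sketch under stated assumptions, not the authors' sampler: the normal base measure, the truncation level of 50, α = 1, the 4 x 5 x 6 table and the helper name `draw_dp_random_effects` are all hypothetical. Because a draw from a DP is discrete, many cells share the same effect value, so the random effects group cells into clusters. One intuition for why they can stand in for interaction terms is that these clusters absorb cell-level structure that a main-effects-only model misses.

```python
import numpy as np

def draw_dp_random_effects(n_cells, alpha, base_sd, truncation=50, rng=None):
    """Draw one random effect per cell from a truncated DP(alpha, N(0, base_sd**2)).

    Stick-breaking (assumed construction): v_h ~ Beta(1, alpha),
    w_h = v_h * prod_{l<h} (1 - v_l), atoms theta*_h ~ N(0, base_sd**2);
    each cell picks atom h with probability w_h.
    """
    rng = np.random.default_rng(rng)
    v = rng.beta(1.0, alpha, size=truncation)
    v[-1] = 1.0  # close the stick so the truncated weights sum to one
    weights = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    atoms = rng.normal(0.0, base_sd, size=truncation)
    labels = rng.choice(truncation, size=n_cells, p=weights)
    return atoms[labels], labels

# Hypothetical 4 x 5 x 6 table of key variables (120 cells), main effects only.
rng = np.random.default_rng(0)
shape = (4, 5, 6)
main_effects = [rng.normal(0.0, 0.5, size=d) for d in shape]
idx = np.meshgrid(*[np.arange(d) for d in shape], indexing="ij")
log_lambda = 1.0 + sum(m[i] for m, i in zip(main_effects, idx))

# Add DP random effects: cells sharing a label share the same shift in log-rate.
theta, labels = draw_dp_random_effects(log_lambda.size, alpha=1.0, base_sd=1.0, rng=rng)
lam = np.exp(log_lambda + theta.reshape(shape))
print("distinct random-effect values:", np.unique(labels).size)
```

With a small α, typically only a handful of distinct values are used across the 120 cells, which is the clustering property described above.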
Abstract
Statistical agencies and other institutions collect data under the promise to protect the confidentiality of respondents. When releasing microdata samples, the risk that records can be identified must be assessed. To this aim, a widely adopted approach is to isolate categorical variables key to the identification and analyze multi-way contingency tables of such variables. Common disclosure risk measures focus on sample unique cells in these tables and adopt parametric log-linear models as the standard statistical tools for the problem. Such models often have to deal with large and extremely sparse tables that pose a number of challenges to risk estimation. This paper proposes to overcome these problems by studying nonparametric alternatives based on Dirichlet process random effects. The main finding is that the inclusion of such random effects allows us to reduce considerably the number…
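The setting described in the abstract can be sketched in a few lines. The first helper finds the sample-unique cells of the contingency table formed by the key variables; the second evaluates two global risk measures that are common in the log-linear disclosure risk literature (the expected number of sample uniques that are also population unique, and the expected number of correct matches among sample uniques) under the usual assumptions F_k ~ Poisson(λ_k) and Bernoulli sampling with fraction π. Whether this paper uses exactly these measures is an assumption here. In practice λ_k would come from a fitted log-linear model; the function names, records, rates and sampling fraction below are hypothetical.

```python
from collections import Counter

import numpy as np

def sample_unique_cells(records):
    """Cross-classify records by their key variables; return cells observed exactly once."""
    counts = Counter(records)  # cell -> sample frequency f_k
    return [cell for cell, f in counts.items() if f == 1]

def poisson_risk_measures(lam, pi):
    """Global risk over sample-unique cells, assuming F_k ~ Poisson(lam_k), f_k | F_k ~ Bin(F_k, pi).

    By Poisson thinning, F_k - f_k ~ Poisson(mu_k) with mu_k = lam_k * (1 - pi),
    independent of f_k, so for cells with f_k = 1:
      tau1 = sum_k P(F_k = 1 | f_k = 1) = sum_k exp(-mu_k)
      tau2 = sum_k E[1 / F_k | f_k = 1] = sum_k (1 - exp(-mu_k)) / mu_k
    """
    mu = np.asarray(lam, dtype=float) * (1.0 - pi)
    tau1 = float(np.exp(-mu).sum())
    tau2 = float((-np.expm1(-mu) / mu).sum())  # -expm1(-mu) == 1 - exp(-mu), stable for small mu
    return tau1, tau2

# Hypothetical microdata: each record is a tuple of key variables (sex, age band, region).
records = [("F", "30-39", "North"), ("M", "30-39", "North"),
           ("F", "30-39", "North"), ("M", "40-49", "South")]
uniques = sample_unique_cells(records)
print(uniques)  # [('M', '30-39', 'North'), ('M', '40-49', 'South')]

# Hypothetical fitted population rates for those two cells and a 5% sampling fraction.
tau1, tau2 = poisson_risk_measures([10.0, 40.0], pi=0.05)
print(f"tau1 = {tau1:.3g}, tau2 = {tau2:.3g}")
```

Since P(F_k = 1 | f_k = 1) never exceeds E[1/F_k | f_k = 1], tau1 is always at most tau2, and both are bounded by the number of sample uniques. In sparse tables the estimates are driven by how well λ_k is estimated in near-empty cells, which is where the modelling choices discussed above matter.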
