Privacy Protection for Youth Risk Behavior Using Bayesian Data Synthesis: A Case Study to the YRBS
Yixiao Cao, Jingchen Hu

TL;DR
This paper demonstrates how Bayesian data synthesis, specifically with the DPMPM synthesizer, can protect the privacy of youth risk behavior survey respondents while maintaining data utility, which facilitates the dissemination of sensitive data.
Contribution
It introduces a Bayesian synthetic data approach for youth risk behavior data, balancing privacy protection with data utility, demonstrated through a YRBS case study.
Findings
Synthetic data significantly reduces disclosure risks compared to the confidential YRBS sample.
High utility of synthetic data maintained.
Effective privacy protection demonstrated.
Abstract
The large number of publicly available survey datasets of wide variety, albeit useful, raises respondent-level privacy concerns. The synthetic data approach to data privacy and confidentiality has been shown to be useful in terms of privacy protection and utility preservation. This paper aims at illustrating how synthetic data can facilitate the dissemination of highly sensitive information about youth risk behavior by presenting a case study of synthetic data for a sample of the Youth Risk Behavior Survey (YRBS). Given the categorical nature of almost all variables in YRBS, the Dirichlet Process mixture of products of multinomials (DPMPM) synthesizer is adopted to partially synthesize the YRBS sample. Detailed evaluations of utility and disclosure risks demonstrate that the generated synthetic data are able to significantly reduce the disclosure risks compared to the confidential YRBS sample…
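To make the structure of a DPMPM concrete, here is a minimal sketch of its generative side only: truncated stick-breaking mixture weights, one category-probability vector per mixture component and variable, and records drawn by first picking a component and then sampling each variable independently within it. This is not the paper's synthesizer. The function names, the truncation level, the concentration parameter, and the category counts are all hypothetical, and the parameters are drawn from priors rather than estimated from confidential data as a real synthesizer would require.

```python
import numpy as np

def stick_breaking(alpha, K, rng):
    """Truncated stick-breaking weights for a Dirichlet process with K components."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncation: the last stick takes all remaining mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def draw_records(n, weights, probs, rng):
    """Draw n categorical records from a mixture of products of multinomials.

    probs[j] has shape (K, d_j): per-component category probabilities
    for variable j, which has d_j categories.
    """
    z = rng.choice(len(weights), size=n, p=weights)  # latent component per record
    records = np.empty((n, len(probs)), dtype=int)
    for j, p_j in enumerate(probs):
        for i in range(n):
            # Variables are independent given the component z[i].
            records[i, j] = rng.choice(p_j.shape[1], p=p_j[z[i]])
    return records, z

rng = np.random.default_rng(0)
K = 10                 # hypothetical truncation level
levels = [2, 3, 4]     # hypothetical number of categories per variable
weights = stick_breaking(alpha=1.0, K=K, rng=rng)
# Assumption: flat Dirichlet priors stand in for fitted posterior draws.
probs = [rng.dirichlet(np.ones(d), size=K) for d in levels]
records, z = draw_records(500, weights, probs, rng)
```

In a partial synthesis such as the one the abstract describes, only selected variables would be replaced with draws like these, conditional on parameters estimated from the confidential sample; the sketch above omits that estimation step entirely.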
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Bayesian Methods and Mixture Models
