Data Privacy Protection and Utility Preservation through Bayesian Data Synthesis: A Case Study on Airbnb Listings
Shijie Guo, Jingchen Hu

TL;DR
This paper presents a Bayesian data synthesis approach for protecting privacy in Airbnb data, using specialized models for sensitive variables and evaluating utility and disclosure risks under various intruder knowledge scenarios.
Contribution
It introduces a zero-inflated truncated Poisson model for synthesizing the number of available days and a sequential approach for price, enhancing privacy while maintaining data utility.
Findings
Synthetic data effectively balances privacy and utility.
Uncertainty in intruder knowledge significantly impacts disclosure risk assessments.
The proposed models handle zero-inflation and truncation in sensitive variables.
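A minimal sketch of how a zero-inflated truncated Poisson variable can be simulated, to make the zero-inflation and truncation ideas concrete. It is not the authors' model: the paper fits a regression with covariates, while this sketch uses fixed parameters. The mixture form (a point mass at zero plus a Poisson restricted to 1..upper), the upper bound of 365 days, and the names `pi_zero`, `lam`, `truncated_poisson_pmf`, and `sample_zitp` are all assumptions made for illustration.

```python
import math
import random


def truncated_poisson_pmf(lam, lower, upper):
    """Return {k: P(Y = k)} for a Poisson(lam) restricted to lower..upper."""
    support = range(lower, upper + 1)
    # Work in log space so large counts do not overflow the factorial.
    log_w = [k * math.log(lam) - lam - math.lgamma(k + 1) for k in support]
    shift = max(log_w)
    weights = [math.exp(v - shift) for v in log_w]
    total = sum(weights)
    # Renormalise so the probabilities on the truncated support sum to one.
    return {k: w / total for k, w in zip(support, weights)}


def sample_zitp(pi_zero, lam, upper=365, rng=random):
    """Draw 0 with probability pi_zero, else a Poisson(lam) truncated to 1..upper.

    Assumed parametrisation for illustration only.
    """
    if rng.random() < pi_zero:
        return 0
    pmf = truncated_poisson_pmf(lam, 1, upper)
    values = list(pmf)
    return rng.choices(values, weights=[pmf[k] for k in values], k=1)[0]


# Hypothetical parameter values, not estimates from the paper.
rng = random.Random(0)
draws = [sample_zitp(0.4, 120.0, rng=rng) for _ in range(2000)]
zero_share = sum(d == 0 for d in draws) / len(draws)
```

In a regression setting, `pi_zero` and `lam` would each depend on listing-level covariates, and synthetic values would be drawn from the posterior predictive distribution rather than from fixed parameters.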
Abstract
When releasing record-level data containing sensitive information to the public, the data disseminator is responsible for protecting the privacy of every record in the dataset while simultaneously preserving important features of the data for users' analyses. These goals can be achieved by data synthesis, where confidential data are replaced with synthetic data that are simulated based on statistical models estimated on the confidential data. In this paper, we present a data synthesis case study, where synthetic values of price and the number of available days in a sample of the New York Airbnb Open Data are created for privacy protection. One sensitive variable, the number of available days of an Airbnb listing, has a large number of zero-valued records and is also truncated at both ends. We propose a zero-inflated truncated Poisson regression model for its synthesis. We utilize a…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data-Driven Disease Surveillance · Data Quality and Management
