Distribution-Preserving k-Anonymity
Dennis Wei, Karthikeyan Natesan Ramamurthy, Kush R. Varshney

TL;DR
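This paper introduces a distribution-preserving k-anonymity framework that protects individuals' privacy while preserving the probability distribution of the quasi-identifier variables. This keeps the data useful for workloads that depend on that distribution, and the approach is demonstrated through experiments on real-world healthcare data.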
Contribution
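The framework combines distribution-preserving quantization with k-member clustering. It is specialized into two variants, which use intra-cluster dithering and Gaussian dithering of cluster centers, respectively, to preserve the data distribution while maintaining k-anonymity.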
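k-member clustering partitions the records into groups of at least k, so that every record in a group can later be released with the same quasi-identifier value. Below is a minimal sketch of a simplified greedy heuristic. It assumes numeric quasi-identifiers, Euclidean distance, and centroid-based cluster growth, any of which may differ from the clustering used in the paper. The function name `greedy_k_member_clustering` is hypothetical.

```python
import numpy as np


def greedy_k_member_clustering(X, k, seed=0):
    """Partition the rows of X into clusters of at least k records.

    Simplified greedy heuristic (an assumption, not the paper's algorithm):
    each cluster is seeded with the unassigned record farthest from the
    previous seed, then grown by repeatedly adding the unassigned record
    closest to the cluster's current centroid. Returns one label per row.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < k:
        raise ValueError("need at least k records")
    rng = np.random.default_rng(seed)
    labels = np.full(n, -1)
    remaining = list(range(n))
    last = int(rng.integers(n))  # random starting point for the first seed
    cluster_id = 0
    while len(remaining) >= k:
        rem = np.array(remaining)
        # Seed a new cluster with the unassigned record farthest from the last seed.
        far = int(rem[np.argmax(np.linalg.norm(X[rem] - X[last], axis=1))])
        members = [far]
        remaining.remove(far)
        while len(members) < k:
            rem = np.array(remaining)
            centroid = X[members].mean(axis=0)
            # Add the unassigned record nearest to the current centroid.
            nearest = int(rem[np.argmin(np.linalg.norm(X[rem] - centroid, axis=1))])
            members.append(nearest)
            remaining.remove(nearest)
        labels[members] = cluster_id
        cluster_id += 1
        last = far
    # Fewer than k records are left over: attach each to the nearest cluster centroid.
    centroids = np.array([X[labels == c].mean(axis=0) for c in range(cluster_id)])
    for i in remaining:
        labels[i] = int(np.argmin(np.linalg.norm(centroids - X[i], axis=1)))
    return labels


X = np.random.default_rng(1).normal(size=(20, 2))
labels = greedy_k_member_clustering(X, k=3)
print(np.bincount(labels))  # six clusters, each with at least 3 records
```

Leftover records are merged into existing clusters rather than forming an undersized group. This is what keeps every cluster at size k or larger.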
Findings
Outperforms standard k-anonymization in utility for healthcare applications
Maintains k-anonymity while preserving data distribution
Effective in covariate shift and transfer learning scenarios
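The k-anonymity condition in these findings can be written as a direct check: every combination of quasi-identifier values in the released table must occur in at least k records. Below is a minimal sketch with a hypothetical helper `is_k_anonymous` and made-up illustrative rows.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records (the k-anonymity condition)."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


# Illustrative released table; "diagnosis" is the sensitive attribute.
released = [
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "021**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "021**", "diagnosis": "diabetes"},
]
print(is_k_anonymous(released, ["age", "zip"], k=2))  # True
print(is_k_anonymous(released, ["age", "zip"], k=3))  # False
```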
Abstract
Preserving the privacy of individuals by protecting their sensitive attributes is an important consideration during microdata release. However, it is equally important to preserve the quality or utility of the data for at least some targeted workloads. We propose a novel framework for privacy preservation based on the k-anonymity model that is ideally suited for workloads that require preserving the probability distribution of the quasi-identifier variables in the data. Our framework combines the principles of distribution-preserving quantization and k-member clustering, and we specialize it to two variants that respectively use intra-cluster and Gaussian dithering of cluster centers to achieve distribution preservation. We perform theoretical analysis of the proposed schemes in terms of distribution preservation, and describe their utility in workloads such as covariate shift and…
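The abstract names two variants: intra-cluster and Gaussian dithering of cluster centers. The sketch below is one plausible reading of those names, not the authors' algorithms. Given cluster labels (for example, from k-member clustering), every record in a cluster receives the same randomized representative. That representative is either a uniformly chosen member of the cluster (`"intra"`) or a draw from a Gaussian with the cluster's mean and covariance (`"gaussian"`). The function `dither_cluster_centers` and its parameters are hypothetical.

```python
import numpy as np


def dither_cluster_centers(X, labels, method="intra", seed=0):
    """Give every record in a cluster one shared, randomized representative.

    Illustrative sketch; the variant definitions are assumptions:
      method="intra":    representative = a uniformly chosen cluster member.
      method="gaussian": representative = a draw from N(cluster mean, cluster covariance).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    out = np.empty_like(X)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if method == "intra":
            rep = X[rng.choice(idx)]
        elif method == "gaussian":
            mean = X[idx].mean(axis=0)
            # bias=True: population covariance within the cluster.
            cov = np.cov(X[idx], rowvar=False, bias=True)
            rep = rng.multivariate_normal(mean, np.atleast_2d(cov))
        else:
            raise ValueError(f"unknown method: {method}")
        # All members share one value, so the cluster's size bounds its anonymity set.
        out[idx] = rep
    return out


X = np.arange(12, dtype=float).reshape(6, 2)
labels = np.array([0, 0, 0, 1, 1, 1])
print(dither_cluster_centers(X, labels, method="intra"))
print(dither_cluster_centers(X, labels, method="gaussian"))
```

Because the members of a cluster share one value, each released quasi-identifier value is held by at least as many records as the smallest cluster. Averaged over the dithering randomness, the intra variant in this sketch reproduces the original empirical distribution exactly. The Gaussian variant yields a mixture whose overall mean and covariance match those of the data, by the law of total covariance. These are properties of this sketch; the paper's own analysis may differ.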
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Ethics in Clinical Research · Data-Driven Disease Surveillance
