Distribution-Preserving k-Anonymity

Dennis Wei; Karthikeyan Natesan Ramamurthy; Kush R. Varshney

arXiv:1711.01514 · stat.ML · November 7, 2017

TL;DR

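This paper introduces a distribution-preserving k-anonymity framework that provides k-anonymity while preserving the probability distribution of the quasi-identifier variables, so the anonymized data remain useful for distribution-sensitive workloads. The approach is demonstrated in experiments on real-world healthcare data.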

Contribution

It presents an anonymization approach that combines distribution-preserving quantization with k-member clustering, aiming to retain more data utility than standard k-anonymization while still guaranteeing k-anonymity.
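The clustering half of this approach can be illustrated with a simple greedy k-member heuristic. This is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes numeric quasi-identifiers, Euclidean distance, and the hypothetical helper names `centroid` and `greedy_k_member`. Each cluster is seeded at the remaining record farthest from the previous seed and grown with the records nearest its centroid until it holds k members; any fewer-than-k leftovers join the cluster with the nearest centroid, so every cluster ends up with at least k records.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def centroid(pts: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(pts)
    return [sum(p[d] for p in pts) / n for d in range(len(pts[0]))]


def greedy_k_member(points: List[Point], k: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical greedy k-member clustering; every returned cluster has >= k indices."""
    if k < 1 or len(points) < k:
        raise ValueError("need k >= 1 and at least k points")
    rng = random.Random(seed)
    remaining = set(range(len(points)))
    clusters: List[List[int]] = []
    seed_idx = rng.choice(sorted(remaining))
    while len(remaining) >= k:
        # Seed the next cluster at the remaining record farthest from the previous seed.
        prev = seed_idx
        seed_idx = max(remaining, key=lambda i: math.dist(points[i], points[prev]))
        cluster = [seed_idx]
        remaining.remove(seed_idx)
        # Grow the cluster with the remaining record nearest its current centroid.
        while len(cluster) < k:
            c = centroid([points[i] for i in cluster])
            nxt = min(remaining, key=lambda i: math.dist(points[i], c))
            cluster.append(nxt)
            remaining.remove(nxt)
        clusters.append(cluster)
    # Fewer than k records are left: attach each to the cluster with the nearest centroid.
    for i in sorted(remaining):
        best = min(clusters, key=lambda cl: math.dist(points[i], centroid([points[j] for j in cl])))
        best.append(i)
    return clusters


# Toy quasi-identifier data (hypothetical): two tight groups plus one outlier.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0), (5.0, 5.2), (9.0, 0.0)]
clusters = greedy_k_member(pts, k=3)
print(clusters)
```

Replacing every record's quasi-identifiers with its cluster centroid would give a conventional k-anonymous release, since each released value is then shared by at least k records. That centroid-only release collapses each cluster to a point and so distorts the quasi-identifier distribution, which is the problem the distribution-preserving variants are meant to address.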

Findings

1. Outperforms standard k-anonymization in utility for healthcare applications
2. Maintains k-anonymity while preserving data distribution
3. Effective in covariate shift and transfer learning scenarios

Abstract

Preserving the privacy of individuals by protecting their sensitive attributes is an important consideration during microdata release. However, it is equally important to preserve the quality or utility of the data for at least some targeted workloads. We propose a novel framework for privacy preservation based on the k-anonymity model that is ideally suited for workloads that require preserving the probability distribution of the quasi-identifier variables in the data. Our framework combines the principles of distribution-preserving quantization and k-member clustering, and we specialize it to two variants that respectively use intra-cluster and Gaussian dithering of cluster centers to achieve distribution preservation. We perform theoretical analysis of the proposed schemes in terms of distribution preservation, and describe their utility in workloads such as covariate shift and…
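The abstract names two ways to achieve distribution preservation: intra-cluster dithering and Gaussian dithering of cluster centers. The sketch below shows one plausible reading of each, applied to clusters that are already formed. The exact definitions are assumptions, not taken from the paper, and the paper's privacy analysis is not reproduced. In `intra_cluster_dither` (hypothetical name), each record is released as a member of its own cluster drawn uniformly at random. In `gaussian_dither` (hypothetical name), each record is released as its cluster centroid plus zero-mean Gaussian noise whose per-coordinate variance equals the within-cluster variance. A diagonal covariance is used here only to stay within the Python standard library.

```python
import random
import statistics
from typing import List, Sequence

Point = Sequence[float]


def centroid(pts: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(pts)
    return [sum(p[d] for p in pts) / n for d in range(len(pts[0]))]


def intra_cluster_dither(points: List[Point], clusters: List[List[int]],
                         rng: random.Random) -> List[List[float]]:
    """Hypothetical intra-cluster dithering: release a uniformly random member of each record's cluster."""
    released: List[List[float]] = [[] for _ in points]
    for cl in clusters:
        for i in cl:
            released[i] = list(points[rng.choice(cl)])
    return released


def gaussian_dither(points: List[Point], clusters: List[List[int]],
                    rng: random.Random) -> List[List[float]]:
    """Hypothetical Gaussian dithering: centroid plus N(0, within-cluster variance) per coordinate."""
    released: List[List[float]] = [[] for _ in points]
    for cl in clusters:
        members = [points[i] for i in cl]
        c = centroid(members)
        # Population standard deviation of each coordinate inside the cluster.
        sd = [statistics.pstdev([p[d] for p in members]) for d in range(len(c))]
        for i in cl:
            released[i] = [c[d] + rng.gauss(0.0, sd[d]) for d in range(len(c))]
    return released


# Toy data (hypothetical): two pre-formed clusters of k = 3 records each.
pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (10.0, 10.0), (10.0, 10.0), (10.0, 10.0)]
clusters = [[0, 1, 2], [3, 4, 5]]
rng = random.Random(1)
intra_release = intra_cluster_dither(pts, clusters, rng)
gaussian_release = gaussian_dither(pts, clusters, rng)
```

Under these assumptions, both variants preserve the distribution in a specific sense. In the intra-cluster variant, a randomly chosen released record equals any given original record with probability (n_c/n)(1/n_c) = 1/n, so the original empirical distribution is reproduced in expectation. In the Gaussian variant, the law of total variance applies: the spread of the centroids (between-cluster variance) plus the injected noise (within-cluster variance) matches each coordinate's overall mean in expectation and its overall variance approximately. Higher-order shape is not matched.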

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Ethics in Clinical Research · Data-Driven Disease Surveillance