Diversifying Anonymized Data with Diversity Constraints
Mostafa Milani, Yu Huang, Fei Chiang

TL;DR
This paper introduces a method for generating anonymized data that satisfies a set of diversity constraints, so that published datasets stay private while being less biased and more representative. The authors show that checking whether such an instance exists is in PTIME and validate the approach experimentally.
Contribution
It formalizes diversity constraints for data anonymization, studies their implication and satisfiability, shows that deciding whether a diverse anonymized instance exists is in PTIME, and proposes a clustering-based algorithm with extensive experimental validation.
Findings
Determining the existence of a diverse anonymized instance is in PTIME.
The proposed clustering algorithm effectively enforces diversity constraints.
Experiments show improved diversity and utility over existing methods.
Abstract
Recently introduced privacy legislation has aimed to restrict and control the amount of personal data published by companies and shared with third parties. Much of this real data is not only sensitive, requiring anonymization, but also contains characteristic details from a variety of individuals. This diversity is desirable in many applications ranging from Web search to drug and product development. Unfortunately, data anonymization techniques have largely ignored diversity in their published results. This inadvertently propagates underlying bias in subsequent data analysis. We study the problem of finding a diverse anonymized data instance where diversity is measured via a set of diversity constraints. We formalize diversity constraints and study their foundations such as implication and satisfiability. We show that determining the existence of a diverse, anonymized instance can be done in…
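To make the idea of a "diverse anonymized instance" concrete, here is a minimal Python sketch that checks a small table against both requirements. It is an illustration under stated assumptions, not the paper's algorithm: the excerpt above does not spell out the formal definition, so the sketch assumes a diversity constraint is a lower and upper bound on how many published rows carry a given attribute value, and it assumes anonymity is k-anonymity over a set of quasi-identifiers with suppressed values written as `*`. The function names, the sample rows, and the bounds are all hypothetical.

```python
from collections import Counter

STAR = "*"  # marker for a suppressed value (an assumption of this sketch)


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


def satisfies(rows, constraint):
    """Check one assumed constraint (attribute, value, lower, upper) against the rows."""
    attribute, value, lower, upper = constraint
    count = sum(1 for row in rows if row[attribute] == value)
    return lower <= count <= upper


def is_diverse(rows, constraints):
    """True if the rows satisfy every constraint in the set."""
    return all(satisfies(rows, c) for c in constraints)


# Hypothetical published table: "city" has been suppressed for the first group.
rows = [
    {"gender": "F", "city": STAR, "ethnicity": "Asian"},
    {"gender": "F", "city": STAR, "ethnicity": "Asian"},
    {"gender": "M", "city": "Toronto", "ethnicity": "African"},
    {"gender": "M", "city": "Toronto", "ethnicity": "African"},
]

# Hypothetical constraints: between 1 and 3 "Asian" rows, between 2 and 4 "African" rows.
constraints = [("ethnicity", "Asian", 1, 3), ("ethnicity", "African", 2, 4)]

print(is_k_anonymous(rows, ["gender", "city"], k=2))
print(is_diverse(rows, constraints))
```

A checker like this only verifies a candidate instance. Finding an instance that meets both requirements while suppressing as little as possible is the harder problem that the paper's clustering-based algorithm addresses.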
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Privacy, Security, and Data Protection
