Multivariate Microaggregation of Set-Valued Data

Malik Imran-Daud; Muhammad Shaheen; Abbas Ahmed

arXiv:2204.01305 · cs.CR · April 5, 2022

TL;DR

This paper introduces an adaptive microaggregation method for set-valued data that improves privacy preservation by forming semantically homogeneous clusters with variable sizes, reducing information loss compared to existing methods.

Contribution

It extends the MDAV microaggregation algorithm with semantic analysis and adaptive clustering based on taxonomic databases, enhancing data anonymization effectiveness.
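For context, the fixed-size baseline that the paper builds on can be sketched as follows. This is a minimal illustration of classic MDAV on numeric records, not the authors' adaptive, semantic variant. It assumes Euclidean distance over numeric tuples, whereas the paper works with set-valued data and taxonomy-based similarity. The names `mdav`, `centroid`, `farthest`, and `take_cluster` are hypothetical helpers chosen for this example.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def farthest(points, ref):
    """Record with the largest Euclidean distance from ref (assumed metric)."""
    return max(points, key=lambda p: math.dist(p, ref))


def take_cluster(remaining, seed, k):
    """Split off the k records closest to seed; return (cluster, rest)."""
    ordered = sorted(remaining, key=lambda p: math.dist(p, seed))
    return ordered[:k], ordered[k:]


def mdav(records, k):
    """Classic fixed-size MDAV: every cluster holds k records, except that
    the final cluster absorbs leftovers and may hold up to 2k - 1."""
    remaining = [tuple(r) for r in records]
    clusters = []
    while len(remaining) >= 3 * k:
        r = farthest(remaining, centroid(remaining))  # extreme record
        s = farthest(remaining, r)                    # opposite extreme
        for seed in (r, s):
            cluster, remaining = take_cluster(remaining, seed, k)
            clusters.append(cluster)
    if len(remaining) >= 2 * k:
        r = farthest(remaining, centroid(remaining))
        cluster, remaining = take_cluster(remaining, r, k)
        clusters.append(cluster)
    if remaining:
        clusters.append(remaining)
    return clusters


def anonymize(records, k):
    """Replace each record by its cluster centroid (order is not preserved)."""
    out = []
    for cluster in mdav(records, k):
        c = centroid(cluster)
        out.extend([c] * len(cluster))
    return out
```

Because every cluster is forced to hold k records regardless of how similar they are, distant records can end up aggregated together. That is the limitation the adaptive-size, semantics-aware clustering described here is meant to address.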

Findings

01. The proposed method outperforms state-of-the-art solutions in the authors' experiments.

02. The resulting clusters are more homogeneous and cohesive.

03. Information loss is lower than with existing methods.

Abstract

Data controllers manage immense volumes of data and occasionally release it publicly to help researchers conduct their studies. However, publicly shared data may hold personally identifiable information (PII) that can be collected to re-identify a person. Therefore, an effective anonymization mechanism is required before such data is released. Microaggregation is a Statistical Disclosure Control (SDC) method that is widely used by researchers. It adapts the k-anonymity principle to generate k-indistinguishable records in the same clusters to preserve the privacy of individuals. However, in these methods the size of the clusters is fixed (i.e., k records), and the generated clusters may hold non-homogeneous records. To address these issues, we propose an adaptive size clustering technique that…
