Measuring pattern retention in anonymized data -- where one measure is not enough
Sam Fletcher, Md Zahidul Islam

TL;DR
This paper introduces new measures to evaluate how well anonymized data retains original patterns, highlighting that prediction accuracy alone is insufficient for comprehensive assessment.
Contribution
It proposes a novel methodology and three measures to better evaluate pattern retention in anonymized data, complementing existing accuracy-based metrics.
Findings
New measures effectively capture pattern retention
Prediction accuracy alone is inadequate for data similarity assessment
Methodology enhances evaluation of anonymized data quality
Abstract
In this paper, we explore how modifying data to preserve privacy affects the quality of the patterns discoverable in the data. For any analysis of modified data to be worth doing, the data must be as close to the original as possible. Therein lies a problem -- how does one make sure that modified data still contains the information it had before modification? This question is not the same as asking if an accurate classifier can be built from the modified data. Often in the literature, the prediction accuracy of a classifier made from modified (anonymized) data is used as evidence that the data is similar to the original. We demonstrate that this is not the case, and we propose a new methodology for measuring the retention of the patterns that existed in the original data. We then use our methodology to design three measures that can be easily implemented, each measuring aspects of the…
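The abstract's central claim, that an accurate classifier built from anonymized data is not evidence the data resembles the original, can be illustrated with a toy sketch. The labels and prediction lists below are invented for illustration, and the `agreement` helper is a hypothetical stand-in, not one of the paper's three measures. It only shows that two classifiers can score the same accuracy while disagreeing on many individual records.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def agreement(pred_a, pred_b):
    """Fraction of records on which two classifiers give the same prediction.

    Hypothetical helper for illustration only; not a measure from the paper.
    """
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Invented toy labels for ten records.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]

# Predictions from a classifier built on the original data (errs on records 0 and 1).
pred_original = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]

# Predictions from a classifier built on anonymized data (errs on records 8 and 9).
pred_anonymized = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

acc_original = accuracy(y_true, pred_original)
acc_anonymized = accuracy(y_true, pred_anonymized)
shared = agreement(pred_original, pred_anonymized)

print(f"accuracy (original):   {acc_original:.1f}")
print(f"accuracy (anonymized): {acc_anonymized:.1f}")
print(f"agreement between the two classifiers: {shared:.1f}")
```

Both classifiers reach the same accuracy, yet they agree on only six of the ten records, so comparing accuracy figures alone would hide the difference in what each classifier has learned.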
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Mining Algorithms and Applications · Imbalanced Data Classification Techniques
