$k$-Anonymity in Practice: How Generalisation and Suppression Affect Machine Learning Classifiers
Djordje Slijepčević, Maximilian Henzl, Lukas Daniel Klausner, Tobias Dam, Peter Kieseberg, Matthias Zeppelzauer

TL;DR
This paper systematically examines how various $k$-anonymity algorithms, especially Mondrian, impact machine learning classifier performance across different datasets, highlighting trade-offs between privacy and accuracy.
Contribution
It provides a comprehensive comparison of $k$-anonymisation methods and their effects on classifiers, identifying Mondrian as particularly effective for preserving classification performance.
Findings
Increasing $k$, i.e. enforcing stronger $k$-anonymity, generally reduces classification accuracy.
The impact varies significantly depending on the dataset and anonymisation method.
Mondrian maintains better classification performance compared to other algorithms.
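For readers unfamiliar with Mondrian, the following is a minimal sketch of the greedy top-down partitioning idea it is commonly described with: pick the quasi-identifier with the widest range, split at the median, and recurse while both halves keep at least $k$ records. It is not the paper's implementation. The function names, the toy records, and the use of raw (un-normalised) ranges to choose the split dimension are assumptions made for illustration, and only numeric quasi-identifiers are handled.

```python
from collections import Counter

def mondrian(records, qi, k):
    """Greedy top-down partitioning (illustrative sketch, numeric attributes only).

    Returns a list of partitions; a partition is split further only when
    both halves would still contain at least k records.
    Assumes `records` is non-empty.
    """
    def span(dim):
        vals = [r[dim] for r in records]
        return max(vals) - min(vals)

    # Try dimensions from widest to narrowest range (assumption: raw ranges,
    # whereas descriptions of Mondrian usually normalise by the global range).
    for dim in sorted(qi, key=span, reverse=True):
        vals = sorted(r[dim] for r in records)
        median = vals[len(vals) // 2]
        left = [r for r in records if r[dim] < median]
        right = [r for r in records if r[dim] >= median]
        if len(left) >= k and len(right) >= k:
            return mondrian(left, qi, k) + mondrian(right, qi, k)
    return [records]  # no allowable split: this partition is final

def generalise(partitions, qi):
    """Replace each quasi-identifier value by its partition's (min, max) range."""
    out = []
    for part in partitions:
        ranges = {d: (min(r[d] for r in part), max(r[d] for r in part)) for d in qi}
        out.extend({**r, **ranges} for r in part)
    return out

# Hypothetical toy data: two quasi-identifiers and a class label.
QI = ["age", "hours"]
data = [
    {"age": 20, "hours": 40, "label": 0}, {"age": 22, "hours": 40, "label": 0},
    {"age": 25, "hours": 38, "label": 1}, {"age": 27, "hours": 35, "label": 0},
    {"age": 40, "hours": 20, "label": 1}, {"age": 45, "hours": 45, "label": 1},
    {"age": 50, "hours": 50, "label": 1}, {"age": 60, "hours": 40, "label": 0},
]

partitions = mondrian(data, QI, k=2)
anonymised = generalise(partitions, QI)
class_sizes = Counter(tuple(r[d] for d in QI) for r in anonymised)
```

A classifier would then be trained on `anonymised` instead of `data`; larger `k` forces coarser ranges, which is one intuition for the accuracy loss listed above.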
Abstract
The protection of private information is a crucial issue in data-driven research and business contexts. Typically, techniques like anonymisation or (selective) deletion are introduced in order to allow data sharing, e.g. in the case of collaborative research endeavours. For use with anonymisation techniques, the $k$-anonymity criterion is one of the most popular, with numerous scientific publications on different algorithms and metrics. Anonymisation techniques often require changing the data and thus necessarily affect the results of machine learning models trained on the underlying data. In this work, we conduct a systematic comparison and detailed investigation into the effects of different $k$-anonymisation algorithms on the results of machine learning models. We investigate a set of popular $k$-anonymisation algorithms with different classifiers and evaluate them on different…
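To make the $k$-anonymity criterion concrete, here is a small sketch of how generalisation and suppression change a table until every combination of quasi-identifier values is shared by at least $k$ records. The helper names, the 10-year age bins, and the toy records are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter

def generalise_age(age, width=10):
    """Generalisation: map an exact age to an interval label, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def equivalence_classes(records, quasi_identifiers):
    """Count the records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    counts = equivalence_classes(records, quasi_identifiers)
    return all(c >= k for c in counts.values())

def suppress_small_classes(records, quasi_identifiers, k):
    """Suppression: drop records whose equivalence class has fewer than k members."""
    counts = equivalence_classes(records, quasi_identifiers)
    return [r for r in records
            if counts[tuple(r[q] for q in quasi_identifiers)] >= k]

# Hypothetical toy data: 'age' and 'zip' are quasi-identifiers, 'label' is the target.
QI = ["age", "zip"]
raw = [
    {"age": 31, "zip": "1010", "label": 1},
    {"age": 36, "zip": "1010", "label": 0},
    {"age": 38, "zip": "1010", "label": 1},
    {"age": 52, "zip": "1020", "label": 0},
    {"age": 57, "zip": "1020", "label": 0},
    {"age": 64, "zip": "1030", "label": 1},
]

generalised = [{**r, "age": generalise_age(r["age"])} for r in raw]
released = suppress_small_classes(generalised, QI, k=2)

print(is_k_anonymous(raw, QI, 2))          # exact ages are all unique
print(is_k_anonymous(generalised, QI, 2))  # one class still has a single record
print(is_k_anonymous(released, QI, 2))     # after suppressing that record
```

Both steps cost information: generalisation coarsens a feature the classifier could have used, and suppression removes training records outright.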
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Imbalanced Data Classification Techniques
