$k$-Anonymity in Practice: How Generalisation and Suppression Affect Machine Learning Classifiers

Djordje Slijepčević; Maximilian Henzl; Lukas Daniel Klausner; Tobias Dam; Peter Kieseberg; Matthias Zeppelzauer

arXiv:2102.04763 · cs.LG · June 23, 2022 · 5 cites

TL;DR

This paper systematically examines how various $k$-anonymity algorithms, especially Mondrian, impact machine learning classifier performance across different datasets, highlighting trade-offs between privacy and accuracy.

Contribution

It provides a comprehensive comparison of $k$-anonymisation methods and their effects on classifiers, identifying Mondrian as particularly effective for preserving classification performance.

Findings

01 Increasing $k$ (a stronger $k$-anonymity constraint) generally reduces classification accuracy.

02 The size of this effect varies significantly with the dataset and the anonymisation method.

03 Mondrian preserves classification performance better than the other algorithms compared.
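
To give a rough idea of how Mondrian groups records, the following is a minimal sketch of a median-split partitioner. It is a simplification and not the implementation evaluated in the paper: it handles numeric attributes only and ranks attributes by raw value range, and the function names, the toy rows, and the attribute names `age` and `hours` are all hypothetical.

```python
def mondrian(rows, k, dims):
    """Split rows at the median of the widest attribute while both halves keep >= k rows.

    Simplified sketch: numeric attributes only, raw (unnormalised) ranges, k >= 1.
    """
    def spread(d):
        values = [r[d] for r in rows]
        return max(values) - min(values)

    # Try attributes from widest to narrowest until one allows a valid split.
    for d in sorted(dims, key=spread, reverse=True):
        ordered = sorted(rows, key=lambda r: r[d])
        median = ordered[len(ordered) // 2][d]
        left = [r for r in ordered if r[d] < median]
        right = [r for r in ordered if r[d] >= median]
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k, dims) + mondrian(right, k, dims)
    # No attribute allows a split: this group becomes one equivalence class.
    return [rows]


def summarise(partition, dims):
    """Generalise a partition to the (min, max) range of each attribute."""
    return {d: (min(r[d] for r in partition), max(r[d] for r in partition)) for d in dims}


# Hypothetical toy data with two numeric quasi-identifiers.
rows = [
    {"age": 20, "hours": 40}, {"age": 22, "hours": 35},
    {"age": 25, "hours": 20}, {"age": 30, "hours": 45},
    {"age": 40, "hours": 40}, {"age": 45, "hours": 38},
    {"age": 50, "hours": 50}, {"age": 60, "hours": 10},
]
parts = mondrian(rows, k=2, dims=["age", "hours"])
for part in parts:
    print(len(part), summarise(part, ["age", "hours"]))
```

Every returned partition holds at least `k` rows, so replacing each row's quasi-identifiers with its partition's ranges would yield a $k$-anonymous table under these assumptions.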

Abstract

The protection of private information is a crucial issue in data-driven research and business contexts. Typically, techniques like anonymisation or (selective) deletion are introduced in order to allow data sharing, e.g. in the case of collaborative research endeavours. For use with anonymisation techniques, the $k$-anonymity criterion is one of the most popular, with numerous scientific publications on different algorithms and metrics. Anonymisation techniques often require changing the data and thus necessarily affect the results of machine learning models trained on the underlying data. In this work, we conduct a systematic comparison and detailed investigation into the effects of different $k$-anonymisation algorithms on the results of machine learning models. We investigate a set of popular $k$-anonymisation algorithms with different classifiers and evaluate them on different…
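
As an illustration of the criterion mentioned in the abstract, a table is $k$-anonymous when every combination of quasi-identifier values occurs in at least $k$ records. The sketch below shows one simple way generalisation (coarsening values) and suppression (dropping records) can be combined to reach that state. The records, the choice of `age` and `zip` as quasi-identifiers, and the generalisation rules are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy records; "age" and "zip" are treated as quasi-identifiers.
records = [
    {"age": 23, "zip": "3100", "label": 0},
    {"age": 27, "zip": "3107", "label": 1},
    {"age": 25, "zip": "3104", "label": 0},
    {"age": 41, "zip": "1010", "label": 1},
    {"age": 45, "zip": "1020", "label": 1},
    {"age": 68, "zip": "8010", "label": 0},
]


def generalise(record):
    """Coarsen age to a 10-year band and zip to its first two digits."""
    low = (record["age"] // 10) * 10
    return (f"{low}-{low + 9}", record["zip"][:2] + "**")


def anonymise(rows, k):
    """Generalise every row, then suppress rows whose group has fewer than k members."""
    keys = [generalise(r) for r in rows]
    sizes = Counter(keys)
    return [
        {"age": key[0], "zip": key[1], "label": r["label"]}
        for r, key in zip(rows, keys)
        if sizes[key] >= k
    ]


def is_k_anonymous(rows, k):
    """True if every quasi-identifier combination occurs at least k times."""
    sizes = Counter((r["age"], r["zip"]) for r in rows)
    return all(n >= k for n in sizes.values())


released = anonymise(records, k=2)
print(len(released), is_k_anonymous(released, 2))
```

In this toy run the single record in the 60-69 band is suppressed, and the remaining records carry coarser feature values. A classifier trained on the released table therefore sees both fewer rows and less precise attributes than one trained on the original data.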

Code & Models

Repositories

fhstp/k-AnonML (official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Imbalanced Data Classification Techniques