Comparison of machine learning models applied on anonymized data with different techniques

Judith Sáinz-Pardo Díaz; Álvaro López García

arXiv:2305.07415 · cs.LG · May 15, 2023 · 1 cites

Open Access

TL;DR

This paper evaluates how different anonymization techniques affect the performance of four classical machine learning classifiers on the Adult dataset, highlighting the trade-off between privacy and utility.

Contribution

It provides a comparative analysis of machine learning model performance under various anonymization methods and privacy parameters.

Findings

01 Performance decreases as privacy levels increase

02 Different anonymization techniques affect models differently

03 Trade-offs between privacy and classification accuracy

Abstract

Anonymization techniques based on obfuscating the quasi-identifiers by means of value generalization hierarchies are widely used to achieve preset levels of privacy. To prevent different types of attacks against database privacy it is necessary to apply several anonymization techniques beyond the classical k-anonymity or $\ell$-diversity. However, the application of these methods is directly connected to a reduction of their utility in prediction and decision making tasks. In this work we study four classical machine learning methods currently used for classification purposes in order to analyze the results as a function of the anonymization techniques applied and the parameters selected for each of them. The performance of these models is studied when varying the value of k for k-anonymity and additional tools such as $\ell$-diversity, t-closeness and $\delta$-disclosure privacy are…
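To make the two classical properties named in the abstract concrete, here is a minimal sketch that measures them on a table that has already been generalized. It assumes the standard textbook definitions: a table is k-anonymous if every group of records sharing the same quasi-identifier values has at least k rows, and it satisfies distinct ℓ-diversity if every such group contains at least ℓ distinct sensitive values. The function names, the toy records, and the column names are hypothetical illustrations, not the authors' code or data.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in records:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def k_anonymity(records, quasi_identifiers):
    """Largest k for which the table is k-anonymous (smallest class size)."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len(rows) for rows in groups.values())


def l_diversity(records, quasi_identifiers, sensitive):
    """Distinct l-diversity: fewest distinct sensitive values in any class."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len({row[sensitive] for row in rows}) for rows in groups.values())


# Hypothetical toy table: ages and education are already generalized.
toy = [
    {"age": "[30,40)", "education": "Higher", "income": "<=50K"},
    {"age": "[30,40)", "education": "Higher", "income": ">50K"},
    {"age": "[40,50)", "education": "Basic", "income": "<=50K"},
    {"age": "[40,50)", "education": "Basic", "income": "<=50K"},
]
QI = ["age", "education"]  # assumed quasi-identifiers
k = k_anonymity(toy, QI)
l = l_diversity(toy, QI, "income")
print(k, l)
```

In this toy table both equivalence classes hold two records, but the second class contains a single income value, so the table is 2-anonymous while offering no diversity in the sensitive attribute. This is the kind of gap that motivates applying further techniques on top of k-anonymity.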

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Ethics and Social Impacts of AI