Real-world K-Anonymity Applications: the KGen approach and its evaluation in Fraudulent Transactions

Daniel De Pascale; Giuseppe Cascavilla; Damian A. Tamburri; Willem-Jan Van Den Heuvel

arXiv:2204.01533·cs.CR·April 5, 2022

TL;DR

This paper introduces KGen, a genetic algorithm-based approach for K-anonymity that effectively handles large datasets, ensuring high privacy levels while maintaining data utility, demonstrated through real-world data evaluations.

Contribution

The paper presents KGen, a novel meta-heuristic method using genetic algorithms to achieve scalable K-anonymity for Big Data datasets, overcoming limitations of existing approaches.
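To make the idea of a genetic algorithm searching for a k-anonymous generalization concrete, here is a minimal sketch. It is not KGen's actual encoding, operators, or fitness function, none of which are given on this page. The toy rows, the generalization ladders in `generalize`, the penalty-based `fitness`, and all parameter values are assumptions chosen purely for illustration.

```python
import random
from collections import Counter

# Hypothetical toy data: (age, 4-digit zip) pairs used as quasi-identifiers.
ROWS = [(31, "5611"), (34, "5611"), (38, "5612"),
        (45, "5621"), (47, "5622"), (49, "5621")]
MAX_LEVELS = (3, 4)  # highest generalization level per attribute (assumed)

def generalize(row, levels):
    """Coarsen one row: wider age buckets and shorter zip prefixes at higher levels."""
    age, zip_code = row
    age_level, zip_level = levels
    width = (1, 5, 10, 100)[age_level]
    return (age // width, zip_code[: len(zip_code) - zip_level])

def smallest_group(levels):
    """Size of the smallest equivalence class after generalization."""
    counts = Counter(generalize(r, levels) for r in ROWS)
    return min(counts.values())

def fitness(levels, k):
    """Score a candidate: infeasible ones get -1, feasible ones prefer less generalization."""
    if smallest_group(levels) < k:
        return -1.0
    return 1.0 - sum(l / m for l, m in zip(levels, MAX_LEVELS)) / len(levels)

def evolve(k, generations=30, pop_size=12, seed=0):
    rng = random.Random(seed)
    # Seed with the fully generalized candidate so one feasible solution always exists.
    population = [MAX_LEVELS] + [
        tuple(rng.randint(0, m) for m in MAX_LEVELS) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, k), reverse=True)
        parents = population[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            if rng.random() < 0.3:  # occasional mutation of one gene
                i = rng.randrange(len(child))
                child[i] = rng.randint(0, MAX_LEVELS[i])
            children.append(tuple(child))
        population = parents + children
    return max(population, key=lambda c: fitness(c, k))

best = evolve(k=2)
print(best, smallest_group(best), fitness(best, 2))
```

Because the best half of each generation is carried over unchanged, the best score never decreases, so the returned candidate is at least as good as the fully generalized one. A realistic setup would replace the toy fitness with a proper information-loss metric.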

Findings

01. KGen efficiently anonymizes datasets with up to 97 attributes.

02. It maintains high anonymity levels without significant data utility loss.

03. The approach performs well on real-world datasets from the Dutch Tax Authority.

Abstract

K-anonymity is a property for the measurement, management, and governance of data anonymization. Many implementations of k-anonymity have been described in the state of the art, but most of them cannot handle a large number of attributes in a "Big" dataset, i.e., a dataset drawn from Big Data. To address this significant shortcoming, we introduce and evaluate KGen, an approach to k-anonymity featuring Genetic Algorithms. KGen adopts such a meta-heuristic approach since it can solve the problem by finding a pseudo-optimal solution in a reasonable time over a considerable load of input. KGen allows the data manager to guarantee a high anonymity level while preserving usability and preventing loss of information entropy over the data. Unlike other approaches that provide optimal global solutions catered for small datasets, KGen…
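As a brief illustration of the property the abstract refers to: a table is k-anonymous with respect to a set of quasi-identifiers when every combination of quasi-identifier values that occurs in it is shared by at least k rows. The sketch below checks this on a small invented table and shows how generalizing one attribute raises k. The records, the field names (`age`, `zip`, `amount`), and the bucketing helper are hypothetical and are not taken from the paper or its datasets.

```python
from collections import Counter

def k_of(rows, quasi_identifiers):
    """Return the size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0

def is_k_anonymous(rows, quasi_identifiers, k):
    """True when every quasi-identifier combination appears in at least k rows."""
    return k_of(rows, quasi_identifiers) >= k

def generalize_age(row, width=10):
    """Replace an exact age with a bucket label, e.g. 34 becomes '30-39'."""
    low = (row["age"] // width) * width
    return {**row, "age": f"{low}-{low + width - 1}"}

# Hypothetical transaction records; 'age' and 'zip' act as quasi-identifiers.
records = [
    {"age": 31, "zip": "5611", "amount": 120},
    {"age": 34, "zip": "5611", "amount": 80},
    {"age": 38, "zip": "5611", "amount": 300},
    {"age": 45, "zip": "5612", "amount": 50},
    {"age": 47, "zip": "5612", "amount": 75},
]
generalized = [generalize_age(r) for r in records]

print(k_of(records, ["age", "zip"]))      # every raw row is unique
print(k_of(generalized, ["age", "zip"]))  # bucketed ages form larger groups
```

Generalization trades precision for anonymity: the bucketed table hides exact ages, which is the kind of information loss an anonymization approach tries to keep small.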

Taxonomy

Topics: Imbalanced Data Classification Techniques · Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting