Real-world K-Anonymity Applications: the KGen approach and its evaluation in Fraudulent Transactions
Daniel De Pascale, Giuseppe Cascavilla, Damian A. Tamburri, Willem-Jan Van Den Heuvel

TL;DR
This paper introduces KGen, a genetic algorithm-based approach for K-anonymity that effectively handles large datasets, ensuring high privacy levels while maintaining data utility, demonstrated through real-world data evaluations.
Contribution
The paper presents KGen, a novel meta-heuristic method using genetic algorithms to achieve scalable K-anonymity for Big Data datasets, overcoming limitations of existing approaches.
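To make the idea of a genetic search for k-anonymity concrete, here is a minimal sketch. It is not KGen's actual algorithm: the toy rows, the generalization levels, the fitness function, and every name in the code are assumptions made for illustration. Each candidate (chromosome) assigns one generalization level per quasi-identifier, and the search prefers candidates that reach k-anonymity with as little generalization as possible.

```python
import random
from collections import Counter

# Hypothetical toy data: (age, 4-digit ZIP code) quasi-identifiers.
ROWS = [(34, "5011"), (36, "5012"), (38, "5021"), (41, "5111"),
        (45, "5112"), (47, "5123"), (52, "5211"), (58, "5212")]
MAX_LEVELS = (3, 4)  # highest level per attribute means full suppression
K = 2                # required anonymity level

def generalize(row, levels):
    """Apply the chosen generalization level to each attribute of one row."""
    age, zip_code = row
    age_lvl, zip_lvl = levels
    # Age: level 0 = exact, 1 = decade, 2 = 50-year band, 3 = suppressed.
    age_out = "*" if age_lvl == 3 else age // (1, 10, 50)[age_lvl] * (1, 10, 50)[age_lvl]
    # ZIP: mask the last `zip_lvl` digits.
    zip_out = zip_code[:4 - zip_lvl] + "*" * zip_lvl
    return age_out, zip_out

def min_group_size(levels):
    """Size of the smallest group of rows that look identical after generalization."""
    counts = Counter(generalize(row, levels) for row in ROWS)
    return min(counts.values())

def fitness(levels):
    """Lower is better: total generalization, plus a penalty if k-anonymity fails."""
    penalty = 0 if min_group_size(levels) >= K else 100
    return sum(levels) + penalty

def evolve(generations=30, pop_size=12, seed=0):
    rng = random.Random(seed)
    # Seed the population with full suppression, which is always k-anonymous here.
    population = [MAX_LEVELS] + [
        tuple(rng.randint(0, m) for m in MAX_LEVELS) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[:pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(len(child))                     # re-draw one gene
            child[i] = rng.randint(0, MAX_LEVELS[i])
            children.append(tuple(child))
        population = parents + children
    return min(population, key=fitness)

best = evolve()
print("levels:", best, "smallest group:", min_group_size(best))
```

Because the best half of each generation is carried over unchanged and the initial population contains a feasible candidate, the returned candidate always satisfies the k-anonymity constraint in this sketch. How little it generalizes depends on the random search.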
Findings
KGen efficiently anonymizes datasets with up to 97 attributes.
It maintains high anonymity levels without significant data utility loss.
The approach performs well on real-world datasets from the Dutch Tax Authority.
Abstract
K-anonymity is a property for measuring, managing, and governing data anonymization. Many implementations of k-anonymity have been described in the state of the art, but most of them cannot handle a large number of attributes in a "Big" dataset, i.e., a dataset drawn from Big Data. To address this significant shortcoming, we introduce and evaluate KGen, an approach to k-anonymity featuring Genetic Algorithms. KGen adopts such a meta-heuristic approach because it can find a pseudo-optimal solution in a reasonable time over a considerable load of input. KGen allows the data manager to guarantee a high anonymity level while preserving usability and preventing loss of information entropy over the data. Unlike other approaches that provide optimal global solutions catered for small datasets, KGen…
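The k-anonymity property mentioned in the abstract can be illustrated with a small check. A table is k-anonymous with respect to a set of quasi-identifiers when every combination of their values is shared by at least k rows. The sketch below computes the largest such k for a toy table. The records, column names, and the function `k_anonymity_level` are hypothetical and not taken from the paper.

```python
from collections import Counter

def k_anonymity_level(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    quasi-identifier values (0 for an empty table)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0

# Hypothetical toy records; ages and ZIP codes are already generalized.
records = [
    {"age": "30-39", "zip": "50**", "amount": 120},
    {"age": "30-39", "zip": "50**", "amount": 75},
    {"age": "40-49", "zip": "51**", "amount": 300},
    {"age": "40-49", "zip": "51**", "amount": 20},
    {"age": "40-49", "zip": "51**", "amount": 45},
]

level = k_anonymity_level(records, ["age", "zip"])
print(level)  # 2: the smallest group ("30-39", "50**") has two rows
```

Adding more attributes to the quasi-identifier set can only split groups further, which is one reason high-dimensional datasets are hard to anonymize without heavy generalization.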
Taxonomy
Topics: Imbalanced Data Classification Techniques · Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting
