Multi-Objective Optimization-Based Anonymization of Structured Data for Machine Learning Application
Yusi Wei, Hande Y. Benson, Joseph K. Agor, Muge Capan

TL;DR
This paper introduces a multi-objective optimization model for anonymizing structured data that balances privacy protection against data utility. Validated on diverse datasets, it yields lower information loss and reduced susceptibility to linkage and homogeneity attacks than existing methods.
Contribution
The paper proposes a novel multi-objective optimization approach that handles categorical variables better than existing optimization models and evaluates both privacy and utility across multiple datasets.
Findings
Lower information loss compared to existing algorithms
Reduced susceptibility to linkage and homogeneity attacks
Maintains ML performance comparable to that achieved on the original data
Abstract
Organizations are collecting vast amounts of data, but they often lack the capabilities needed to fully extract insights. As a result, they increasingly share data with external experts, such as analysts or researchers, to gain value from it. However, this practice introduces significant privacy risks. Various techniques have been proposed to address privacy concerns in data sharing. However, these methods often degrade data utility, impacting the performance of machine learning (ML) models. Our research identifies key limitations in existing optimization models for privacy preservation, particularly in handling categorical variables and in evaluating effectiveness across diverse datasets. We propose a novel multi-objective optimization model that simultaneously minimizes information loss and maximizes protection against attacks. This model is empirically validated using diverse datasets…
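The trade-off the abstract describes (less information loss versus more protection against attacks) can be illustrated with a small sketch. This is not the paper's model, whose formulation is not given here. It assumes a toy table, a single generalization knob (age bin width), and simple proxy metrics chosen for illustration only: information loss as bin width over the age range, linkage risk as the mean of 1/(equivalence-class size), and homogeneity risk as the share of records whose class has a single sensitive value. All names (`RECORDS`, `generalize`, `evaluate`, `pareto_front`) are hypothetical.

```python
from collections import defaultdict

# Toy records: (age, zip_code, diagnosis). Made-up illustration data.
RECORDS = [
    (23, "19104", "flu"),
    (27, "19104", "flu"),
    (31, "19104", "asthma"),
    (36, "19104", "flu"),
    (42, "19104", "diabetes"),
    (47, "19104", "asthma"),
    (52, "19104", "diabetes"),
    (58, "19104", "diabetes"),
]
AGE_MIN, AGE_MAX = 20, 60


def generalize(age, width):
    """Map an age to the lower bound of its bin of the given width."""
    return AGE_MIN + ((age - AGE_MIN) // width) * width


def evaluate(width):
    """Return (information_loss, attack_risk) for one age bin width."""
    # Group sensitive values by generalized quasi-identifiers.
    classes = defaultdict(list)
    for age, zip_code, diagnosis in RECORDS:
        classes[(generalize(age, width), zip_code)].append(diagnosis)
    n = len(RECORDS)
    # Assumed loss proxy: bin width relative to the full age range.
    info_loss = width / (AGE_MAX - AGE_MIN)
    # Assumed linkage proxy: mean re-identification chance 1/|class|.
    linkage = len(classes) / n
    # Assumed homogeneity proxy: share of records whose class holds
    # only one distinct sensitive value.
    homogeneous = sum(len(v) for v in classes.values() if len(set(v)) == 1)
    attack_risk = 0.5 * (linkage + homogeneous / n)
    return info_loss, attack_risk


def pareto_front(widths):
    """Keep the widths whose (loss, risk) pair no other width dominates."""
    scores = {w: evaluate(w) for w in widths}

    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and a != b

    return [
        w for w in widths
        if not any(dominates(scores[o], scores[w]) for o in widths if o != w)
    ]


if __name__ == "__main__":
    for w in [1, 5, 10, 20, 40]:
        print(w, evaluate(w))
    print("non-dominated widths:", pareto_front([1, 5, 10, 20, 40]))
```

On this toy table, a bin width of 5 costs more information than a width of 1 without lowering either risk proxy, so it drops off the front. The remaining widths each trade some utility for some protection, which is the kind of choice a multi-objective model has to expose rather than collapse into a single setting.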
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Digital and Cyber Forensics
