Learning from Anonymized and Incomplete Tabular Data
Lucas Lange, Adrian Böttinger, Victor Christen, Anushka Vidanage, Peter Christen, Erhard Rahm

TL;DR
This paper introduces data transformation strategies for learning from anonymized and incomplete tabular data and shows that they reliably regain utility across multiple datasets, privacy configurations, and deployment scenarios.
Contribution
It proposes novel data transformation strategies tailored for heterogeneous anonymized data, addressing challenges in machine learning with privacy-preserving tabular datasets.
Findings
Generalized values outperform suppression in utility retention
Data preparation strategy effectiveness varies by scenario
Consistent data representations are key for downstream utility
Abstract
User-driven privacy allows individuals to control whether and at what granularity their data is shared, leading to datasets that mix original, generalized, and missing values within the same records and attributes. While such representations are intuitive for privacy, they pose challenges for machine learning, which typically treats non-original values as new categories or as missing, thereby discarding generalization semantics. For learning from such tabular data, we propose novel data transformation strategies that account for heterogeneous anonymization and evaluate them alongside standard imputation and LLM-based approaches. We employ multiple datasets, privacy configurations, and deployment scenarios, demonstrating that our method reliably regains utility. Our results show that generalized values are preferable to pure suppression, that the best data preparation strategy depends on…
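The contrast the abstract draws can be made concrete with a small sketch. It is not the paper's method: the `ages` column, the `"30-39"` range notation, and the helpers `parse_interval`, `as_missing`, and `as_midpoint_and_width` are all hypothetical, chosen only to show how treating a generalized value as missing discards information that an interval-aware encoding would keep.

```python
from typing import Optional, Tuple

# Hypothetical "age" column mixing original, generalized, and suppressed values.
ages = ["34", "30-39", None, "41", "40-49"]


def parse_interval(value: Optional[str]) -> Optional[Tuple[float, float]]:
    """Map a cell to (low, high) bounds; return None if the value is missing.

    Assumes non-negative numbers and a "low-high" notation for ranges.
    """
    if value is None:
        return None
    if "-" in value:
        low, high = value.split("-")
        return float(low), float(high)
    return float(value), float(value)


def as_missing(value: Optional[str]) -> Optional[float]:
    """Baseline: anything that is not an original value becomes missing."""
    bounds = parse_interval(value)
    if bounds is None or bounds[0] != bounds[1]:
        return None
    return bounds[0]


def as_midpoint_and_width(value: Optional[str]) -> Optional[Tuple[float, float]]:
    """Illustrative alternative: keep the interval as (midpoint, width)."""
    bounds = parse_interval(value)
    if bounds is None:
        return None
    low, high = bounds
    return (low + high) / 2, high - low


baseline = [as_missing(v) for v in ages]
interval_aware = [as_midpoint_and_width(v) for v in ages]

print(baseline)
print(interval_aware)
```

In the baseline, the two generalized cells become indistinguishable from the suppressed one. The interval-aware encoding keeps a rough location and an uncertainty width for each generalized cell, so only the truly suppressed cell is left for imputation.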
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Ethics and Social Impacts of AI
