Statistical Learning Theory Approach for Data Classification with l-diversity
Koray Mancuhan, Chris Clifton

TL;DR
This paper provides a theoretical foundation showing support vector classifiers trained on anatomized data satisfying l-diversity perform comparably to those trained on original data, supporting privacy-preserving data mining.
Contribution
It introduces a theoretical justification for using anatomized data with l-diversity in support vector classification, demonstrating comparable performance to original data.
Findings
Support vector classifiers on anatomized data match original data performance.
Classifiers trained on anatomized data outperform those trained on k-anonymity-protected data in classification accuracy.
Validated on multiple public datasets.
Abstract
Corporations are retaining ever-larger corpora of personal data; the frequency of breaches and the corresponding privacy impact have been rising accordingly. One way to mitigate this risk is through use of anonymized data, limiting the exposure of individual data to only where it is absolutely needed. This would seem particularly appropriate for data mining, where the goal is generalizable knowledge rather than data on specific individuals. In practice, corporate data miners often insist on original data, for fear that they might "miss something" with anonymized or differentially private approaches. This paper provides a theoretical justification for the use of anonymized data. Specifically, we show that a support vector classifier trained on anatomized data satisfying l-diversity should be expected to do as well as on the original data. Anatomy preserves all data values, but introduces…
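To make the terms in the abstract concrete, the following is a minimal sketch of anatomization, not the authors' implementation. It assumes the usual description of anatomy: records are partitioned into groups, and the quasi-identifiers and the sensitive attribute are published in two separate tables linked only by a group ID. The function names `anatomize` and `is_l_diverse`, the `group_id` field, and the greedy grouping strategy are all hypothetical choices for illustration. The check uses the simple "distinct values" reading of l-diversity, and records that cannot be grouped are returned as leftovers rather than being merged into existing groups.

```python
from collections import defaultdict


def anatomize(records, sensitive_key, l):
    """Hypothetical helper: split records into a quasi-identifier table (qit)
    and a sensitive table (st), linked only by a group ID."""
    # Bucket the records by their sensitive value.
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec[sensitive_key]].append(rec)

    qit, st = [], []
    group_id = 0
    # While at least l sensitive values still have records, form a group by
    # drawing one record from each of the l largest buckets.
    while sum(1 for b in buckets.values() if b) >= l:
        non_empty = [v for v in buckets if buckets[v]]
        largest = sorted(non_empty, key=lambda v: len(buckets[v]), reverse=True)[:l]
        group_id += 1
        for value in largest:
            rec = buckets[value].pop()
            quasi = {k: v for k, v in rec.items() if k != sensitive_key}
            qit.append({**quasi, "group_id": group_id})
            st.append({"group_id": group_id, sensitive_key: value})

    # Simplification: ungrouped records are returned, not published.
    leftover = [rec for b in buckets.values() for rec in b]
    return qit, st, leftover


def is_l_diverse(st, sensitive_key, l):
    """Check that every group holds at least l distinct sensitive values."""
    groups = defaultdict(set)
    for row in st:
        groups[row["group_id"]].add(row[sensitive_key])
    return all(len(values) >= l for values in groups.values())
```

Every attribute value survives unchanged in one of the two tables. What is lost is the exact link between a quasi-identifier row and its sensitive value within a group, so a learner that needs both must work with that uncertainty.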
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Imbalanced Data Classification Techniques · Machine Learning and Data Classification
