K-Nearest Neighbor Classification Using Anatomized Data
Koray Mancuhan, Chris Clifton

TL;DR
This paper studies how training data anonymized with anatomy affects k-nearest neighbor classification. It derives theoretical error bounds, validates them empirically, and shows that performance on anatomized data approaches performance on unprotected data as the training set grows.
Contribution
The paper gives a theoretical analysis of k-NN error bounds under anatomized training data and shows that nearest neighbor on anatomized data outperforms nearest neighbor on generalization-based anonymization.
Findings
Learning from anatomized data approaches the performance of learning from unprotected data, but requires larger training sets.
Nearest neighbor with anatomized data outperforms generalization-based anonymization.
Theoretical bounds are validated empirically.
Abstract
This paper analyzes k-nearest neighbor classification with training data anonymized using anatomy. Anatomy preserves all data values, but introduces uncertainty in the mapping between identifying and sensitive values. We first study the theoretical effect of the anatomized training data on the k-nearest neighbor error rate bounds, nearest neighbor convergence rate, and Bayesian error. We then validate the derived bounds empirically. We show that 1) learning from anatomized data approaches the limits of learning through the unprotected data (although requiring larger training data), and 2) nearest neighbor using anatomized data outperforms nearest neighbor on generalization-based anonymization.
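To make the idea of "preserving all values while hiding the mapping" concrete, here is a minimal Python sketch of an anatomy-style split. It is an illustration under stated assumptions, not the authors' implementation: the function names `anatomize` and `join_on_group`, the toy records, and the fixed-size consecutive grouping are all hypothetical, and a real anatomy algorithm would choose groups to satisfy a diversity requirement. The join step shows one plausible way to rebuild a training set from the two tables, where each identifying row is paired with every sensitive value in its group.

```python
def anatomize(records, group_size):
    """Split (features, sensitive) records into two tables linked by a group id.

    Simplifying assumption: groups are consecutive chunks of the input.
    """
    qit, st = [], []
    for i, (features, sensitive) in enumerate(records):
        gid = i // group_size
        qit.append((features, gid))   # identifying table: features + group id
        st.append((gid, sensitive))   # sensitive table: group id + sensitive value
    st.sort()  # drop the original row order so it cannot reveal the mapping
    return qit, st


def join_on_group(qit, st):
    """Pair each identifying row with every sensitive value in its group."""
    by_group = {}
    for gid, sensitive in st:
        by_group.setdefault(gid, []).append(sensitive)
    return [(features, s) for features, gid in qit for s in by_group[gid]]


# Hypothetical toy data: two numeric identifying features and one sensitive value.
records = [
    ((1.0, 1.0), "flu"),
    ((1.2, 0.9), "cold"),
    ((5.0, 5.1), "asthma"),
    ((5.2, 4.9), "flu"),
]

qit, st = anatomize(records, group_size=2)
expanded = join_on_group(qit, st)

# Every original value survives, but each identifying row now matches
# group_size candidate sensitive values instead of exactly one.
print(len(records), len(expanded))
```

In this sketch the joined table contains every true record plus an equal number of mismatched ones, which is one way to picture why a learner might need more training data to reach the accuracy it would get on unprotected data.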
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
