Privacy-Preserving Public Release of Datasets for Support Vector Machine Classification
Farhad Farokhi

TL;DR
This paper proposes a method for releasing datasets for SVM classification that balances privacy and utility by adding optimally designed noise, ensuring privacy protection while maintaining classifier accuracy.
Contribution
It introduces an optimal noise addition technique based on Fisher information and differential privacy, applicable to SVMs and other optimization-based machine learning algorithms.
Findings
The optimal noise distribution maximizes a weighted sum of the privacy and utility measures.
The optimal noise is proved to achieve local differential privacy.
Demonstrated effectiveness on multiple datasets.
Abstract
We consider the problem of publicly releasing a dataset for support vector machine classification while not infringing on the privacy of data subjects (i.e., individuals whose private information is stored in the dataset). The dataset is systematically obfuscated using additive noise for privacy protection. Motivated by the Cramér-Rao bound, the inverse of the trace of the Fisher information matrix is used as a measure of privacy. Conditions are established for ensuring that the classifier extracted from the original dataset and the one extracted from the obfuscated dataset are close to each other (capturing the utility). The optimal noise distribution is determined by maximizing a weighted sum of the measures of privacy and utility. The optimal privacy-preserving noise is proved to achieve local differential privacy. The results are generalized to a broader class of optimization-based supervised machine…
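The obfuscation step and the privacy measure described in the abstract can be illustrated with a minimal sketch. It assumes isotropic zero-mean Gaussian noise purely for illustration; the abstract does not state the form of the optimal noise distribution, and the function names `obfuscate` and `inverse_fisher_trace` are hypothetical. For additive Gaussian noise with covariance sigma^2 * I in d dimensions, the Fisher information about the underlying record is I / sigma^2, so the inverse of its trace is sigma^2 / d.

```python
import random


def obfuscate(dataset, sigma, seed=0):
    """Return a copy of `dataset` with i.i.d. zero-mean Gaussian noise
    (standard deviation `sigma`) added to every feature.

    Assumption: Gaussian noise is used only as an example distribution.
    """
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in dataset]


def inverse_fisher_trace(sigma, dim):
    """Privacy measure 1 / trace(Fisher information) for isotropic
    Gaussian noise: the Fisher information is I_dim / sigma^2, so its
    trace is dim / sigma^2 and the inverse is sigma^2 / dim.
    """
    return sigma ** 2 / dim


# Toy feature vectors (made up for illustration, not from the paper).
features = [[0.2, 1.5], [1.1, -0.3], [-0.7, 0.8]]

released = obfuscate(features, sigma=0.5)
privacy = inverse_fisher_trace(sigma=0.5, dim=2)
```

Under this assumption, a larger `sigma` raises the privacy measure but moves the released points further from the originals, which is the tension that the weighted privacy-utility objective is meant to balance.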
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Wireless Communication Security Techniques · Stochastic Gradient Optimization Techniques
