Privacy-Preserving Public Release of Datasets for Support Vector Machine Classification

Farhad Farokhi

arXiv:1912.12576·cs.CR·January 1, 2020

TL;DR

This paper proposes a method for publicly releasing datasets for SVM classification. The data are obfuscated with additive noise whose distribution is chosen to balance privacy against utility, so that the classifier trained on the released data stays close to the one trained on the original data.

Contribution

It derives the optimal additive noise distribution using a privacy measure based on Fisher information, proves that this noise achieves local differential privacy, and extends the approach from SVMs to other optimization-based machine learning algorithms.

Findings

01 The optimal noise distribution maximizes a weighted sum of the privacy and utility measures.

02 The optimal privacy-preserving noise achieves local differential privacy.

03 Demonstrated effectiveness on multiple datasets.
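The local differential privacy finding can be made concrete with a small numerical check. The sketch below is not the paper's construction: this summary does not state which noise distribution is optimal, so the example assumes scalar records in [0, 1] and uses Laplace noise with scale 1/epsilon as a stand-in. The names `laplace_log_density` and `max_log_ratio` are hypothetical helpers. The code checks that, for any two records, the log-ratio of the output densities stays within epsilon, which is the defining condition of epsilon-local differential privacy.

```python
import numpy as np

def laplace_log_density(z, x, scale):
    """Log-density at z of the released value x + Laplace(0, scale) noise."""
    return -np.abs(z - x) / scale - np.log(2.0 * scale)

def max_log_ratio(x, x_prime, scale, grid):
    """Largest |log p(z | x) - log p(z | x')| over the output grid."""
    diff = laplace_log_density(grid, x, scale) - laplace_log_density(grid, x_prime, scale)
    return np.max(np.abs(diff))

epsilon = 1.0
scale = 1.0 / epsilon                 # assumption: records lie in [0, 1], so |x - x'| <= 1
grid = np.linspace(-10.0, 11.0, 2001) # candidate released values z
records = np.linspace(0.0, 1.0, 11)   # candidate private records

worst = max(max_log_ratio(x, xp, scale, grid) for x in records for xp in records)
print(f"worst-case log-ratio: {worst:.6f} (epsilon = {epsilon})")
```

The bound follows from the triangle inequality: the two absolute values differ by at most |x - x'|, so dividing by the scale gives at most epsilon. A smaller epsilon requires a larger noise scale, which is the privacy side of the trade-off described above.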

Abstract

We consider the problem of publicly releasing a dataset for support vector machine classification while not infringing on the privacy of data subjects (i.e., individuals whose private information is stored in the dataset). The dataset is systematically obfuscated using additive noise for privacy protection. Motivated by the Cramér-Rao bound, the inverse of the trace of the Fisher information matrix is used as a measure of privacy. Conditions are established for ensuring that the classifiers extracted from the original dataset and the obfuscated one are close to each other (capturing the utility). The optimal noise distribution is determined by maximizing a weighted sum of the measures of privacy and utility. The optimal privacy-preserving noise is proved to achieve local differential privacy. The results are generalized to a broader class of optimization-based supervised machine…
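The two quantities traded off in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's method: the noise is isotropic Gaussian (for which the Fisher information matrix with respect to a d-dimensional record is I/sigma^2, so the inverse trace is sigma^2/d), the data are synthetic, and `train_linear_svm` is a hypothetical bias-free linear SVM fitted by full-batch subgradient descent. Utility is approximated by the distance between the weight vectors learned from the original and the obfuscated data.

```python
import numpy as np

def obfuscate(X, sigma, rng):
    """Return X plus i.i.d. zero-mean Gaussian noise with standard deviation sigma."""
    return X + sigma * rng.standard_normal(X.shape)

def gaussian_privacy_measure(sigma, d):
    """Inverse trace of the Fisher information for N(0, sigma^2 I_d) additive noise.

    The Fisher information matrix is I_d / sigma^2, so its trace is d / sigma^2.
    """
    return sigma ** 2 / d

def train_linear_svm(X, y, lam=0.1, steps=500, lr=0.1):
    """Full-batch subgradient descent on the L2-regularised hinge loss (no bias term)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        active = y * (X @ w) < 1.0  # points that violate the margin
        hinge_grad = -(y[active][:, None] * X[active]).sum(axis=0) / len(y)
        w -= lr * (lam * w + hinge_grad)
    return w

# Synthetic two-class data (illustrative only).
rng = np.random.default_rng(0)
n, d = 200, 2
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * np.array([1.5, 1.0]) + rng.standard_normal((n, d))

w_clean = train_linear_svm(X, y)
for sigma in (0.0, 0.5, 2.0):
    w_noisy = train_linear_svm(obfuscate(X, sigma, rng), y)
    gap = np.linalg.norm(w_clean - w_noisy)
    privacy = gaussian_privacy_measure(sigma, d)
    print(f"sigma={sigma}: privacy measure={privacy:.3f}, classifier gap={gap:.3f}")
```

Raising sigma increases the privacy measure quadratically, while the classifier gap measures how far the released data pull the learned classifier from the original. The paper chooses the noise distribution by maximizing a weighted sum of its privacy and utility measures rather than fixing a Gaussian in advance.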

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Wireless Communication Security Techniques · Stochastic Gradient Optimization Techniques