Differentially Private Algorithms for Empirical Machine Learning

Ben Stoddard; Yan Chen; Ashwin Machanavajjhala

arXiv:1411.5428 · cs.LG · November 24, 2014 · 21 cites

Open Access

TL;DR

This paper introduces practical differentially private algorithms for feature selection and ROC curve construction that work around a private classifier-training algorithm. These algorithms improve accuracy on real-world datasets and make it possible to evaluate the resulting classifiers privately.

Contribution

The paper presents new private algorithms for feature selection and ROC curve construction, making differentially private machine learning workflows more practical.
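This summary does not describe the paper's own feature-selection algorithms. As a generic illustration of how a preprocessing step can stay private, here is a minimal sketch of noisy top-k selection with the Laplace mechanism, assuming binary features and binary labels. The function name `private_top_k_features` and the agreement-count score are hypothetical choices, not the authors' method.

```python
import numpy as np


def private_top_k_features(X, y, k, epsilon, rng=None):
    """Hypothetical noisy top-k feature selection (epsilon-DP sketch, not the paper's method).

    Assumes X is an (n, d) matrix of 0/1 features and y a length-n vector of
    0/1 labels. Feature j is scored by how many rows have X[i, j] == y[i].
    Adding or removing one row changes every score by at most 1, so the
    whole d-dimensional score vector has L1 sensitivity d, and Laplace noise
    with scale d / epsilon on each score gives epsilon-differential privacy.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X)
    y = np.asarray(y)
    d = X.shape[1]
    scores = (X == y[:, None]).sum(axis=0).astype(float)
    noisy_scores = scores + rng.laplace(0.0, d / epsilon, size=d)
    # Ranking already-noised scores is post-processing and costs no extra budget.
    return np.argsort(noisy_scores)[::-1][:k]
```

Because the noise scale grows with the number of candidate features d, this naive version becomes noisy on wide datasets; it is meant only to show where the privacy cost of a preprocessing step comes from.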

Findings

01. Significant accuracy improvements on three real-world datasets.
02. First private algorithms for ROC curve construction.
03. Effective feature selection under differential privacy.
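This summary does not say how the paper builds private ROC curves. For intuition only, a naive baseline adds Laplace noise to histograms of a classifier's test-set scores and derives true- and false-positive rates from the noisy counts. It assumes scores lie in [0, 1] and uses fixed, data-independent thresholds. The function `private_roc` and its `n_bins` parameter are hypothetical; this is not the authors' algorithm.

```python
import numpy as np


def private_roc(scores, labels, epsilon, n_bins=20, rng=None):
    """Hypothetical noisy ROC baseline (epsilon-DP sketch, not the paper's algorithm).

    Assumes scores lie in [0, 1] and labels are 0/1. Thresholds are fixed bin
    edges chosen without looking at the data. Each test record lands in exactly
    one bin of exactly one histogram (positives or negatives), so adding or
    removing one record changes the pair of histograms by 1 in L1 norm, and
    Laplace(1 / epsilon) noise per bin gives epsilon-differential privacy.
    Everything after the noise is post-processing.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    pos, _ = np.histogram(scores[labels == 1], bins=edges)
    neg, _ = np.histogram(scores[labels == 0], bins=edges)
    noisy_pos = np.clip(pos + rng.laplace(0.0, 1.0 / epsilon, n_bins), 0.0, None)
    noisy_neg = np.clip(neg + rng.laplace(0.0, 1.0 / epsilon, n_bins), 0.0, None)
    # Noisy counts of records scoring at or above each bin's lower edge, highest bin first.
    tp = np.cumsum(noisy_pos[::-1])
    fp = np.cumsum(noisy_neg[::-1])
    tpr = tp / max(tp[-1], 1e-12)
    fpr = fp / max(fp[-1], 1e-12)
    # Prepend the (0, 0) corner for a threshold above every score.
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])
```

The thresholds come from fixed bin edges rather than from the observed test scores, because placing thresholds at the actual scores would itself reveal information about individual test records.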

Abstract

An important use of private data is to build machine learning classifiers. While there is a burgeoning literature on differentially private classification algorithms, we find that they are not practical in real applications for two reasons. First, existing differentially private classifiers provide poor accuracy on real-world datasets. Second, there is no known differentially private algorithm for empirically evaluating the private classifier on a private test dataset. In this paper, we develop differentially private algorithms that mirror real-world empirical machine learning workflows. We consider the private classifier training algorithm as a black box. We present private algorithms for selecting features that are input to the classifier. Though adding a preprocessing step takes away some of the privacy budget from the actual classification process (thus potentially making it…
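The abstract's point that a preprocessing step takes privacy budget away from classification follows from sequential composition: when feature selection and training both read the same private records with budgets ε_fs and ε_train, the combined release is (ε_fs + ε_train)-differentially private. A minimal accountant for that rule might look like the sketch below; the `PrivacyBudget` class and the step names are hypothetical, and the black-box trainer itself is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class PrivacyBudget:
    """Hypothetical accountant for basic sequential composition (not from the paper).

    Each step that reads the same private data spends part of `total`;
    under sequential composition the epsilons of those steps simply add up.
    """

    total: float
    spent: dict = field(default_factory=dict)

    @property
    def remaining(self) -> float:
        return self.total - sum(self.spent.values())

    def spend(self, step: str, epsilon: float) -> float:
        """Record `epsilon` for `step`, refusing to exceed the total budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so float round-off does not reject an exact split.
        if epsilon > self.remaining + 1e-12:
            raise ValueError(f"step {step!r} would exceed the privacy budget")
        self.spent[step] = self.spent.get(step, 0.0) + epsilon
        return epsilon


budget = PrivacyBudget(total=1.0)
eps_fs = budget.spend("feature_selection", 0.3)  # private feature selection
eps_train = budget.spend("training", budget.remaining)  # left for the black-box trainer
```

With a total budget of 1.0, spending 0.3 on feature selection leaves 0.7 for the black-box trainer, and any further step on the same data is rejected.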

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Stochastic Gradient Optimization Techniques