A One-Pass Private Sketch for Most Machine Learning Tasks
Benjamin Coleman, Anshumali Shrivastava

TL;DR
This paper introduces a one-pass, differentially private sketching method that efficiently supports various machine learning tasks on large, high-dimensional datasets while maintaining competitive privacy-utility tradeoffs.
Contribution
The paper presents a novel one-pass private sketch using randomized contingency tables and locality-sensitive hashing, enabling efficient, multi-task data analysis under differential privacy.
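To make the construction concrete, here is a minimal Python/NumPy sketch of an LSH-indexed table of counters that is built in one pass and then privatized with Laplace noise. It illustrates the general idea under stated assumptions and is not the authors' implementation. Signed random projection (SRP) is swapped in as the LSH family, and the function names `srp_hash`, `build_private_sketch` and `query_kde` and the parameters `rows`, `bits` and `epsilon` are hypothetical. The sketch assumes add/remove neighboring datasets, under which each point touches one cell per row, so the table's L1 sensitivity is `rows`. It also assumes the dataset size `n` is public.

```python
import numpy as np


def srp_hash(planes, x):
    """Signed random projection LSH: returns one bucket id per row."""
    bits = (planes @ x) > 0                        # (rows, bits): sign of each projection
    weights = 1 << np.arange(planes.shape[1])      # 1, 2, 4, ... packs the bits into an int
    return bits.astype(int) @ weights              # (rows,) bucket ids in [0, 2**bits)


def build_private_sketch(data, rows=64, bits=3, epsilon=1.0, seed=0):
    """One pass over `data` (n x d) into a rows x 2**bits table, then Laplace noise.

    Hypothetical illustration: SRP hashing and the Laplace scale rows / epsilon
    are assumptions, not necessarily the paper's exact mechanism.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((rows, bits, data.shape[1]))  # independent hash per row
    counts = np.zeros((rows, 2 ** bits))
    row_idx = np.arange(rows)
    for x in data:                                 # single pass: each point is seen once
        counts[row_idx, srp_hash(planes, x)] += 1  # one increment per row
    # Adding or removing one point changes exactly one cell per row by 1,
    # so the table's L1 sensitivity is `rows`; Laplace scale = rows / epsilon.
    counts += rng.laplace(0.0, rows / epsilon, size=counts.shape)
    return planes, counts


def query_kde(planes, counts, q, n):
    """Average the cell q hashes to in each row, normalized by the (assumed public) n."""
    cells = counts[np.arange(counts.shape[0]), srp_hash(planes, q)]
    return cells.mean() / n


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.standard_normal((1000, 10))
    q = rng.standard_normal(10)
    planes, counts = build_private_sketch(data, rows=64, bits=3, epsilon=1.0)
    # Exact mean SRP kernel: per-point collision probability (1 - angle/pi)**bits.
    cos = data @ q / (np.linalg.norm(data, axis=1) * np.linalg.norm(q))
    exact = np.mean((1 - np.arccos(np.clip(cos, -1, 1)) / np.pi) ** 3)
    print("private estimate:", query_kde(planes, counts, q, n=len(data)))
    print("exact mean kernel:", exact)
```

Because the noise is zero-mean and each row's hash is independent, the query is an unbiased estimate of the average SRP collision probability between `q` and the data. Because of the noise, it can come out slightly negative for queries far from every data point.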
Findings
Supports multiple ML tasks including regression and classification
Achieves competitive error bounds for DP kernel density estimation
Operates efficiently on large, high-dimensional datasets in a single pass
Abstract
Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees. Inspired by recent progress toward general-purpose data release algorithms, we propose a private sketch, or small summary of the dataset, that supports a multitude of machine learning tasks including regression, classification, density estimation, near-neighbor search, and more. Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm. We prove competitive error bounds for DP kernel density estimation. Existing methods for DP kernel density estimation scale poorly, often becoming exponentially slower as the dimension grows. In contrast, our sketch can quickly run on large, high-dimensional datasets in a single pass. Exhaustive experiments show that our…
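One way to see where the error in a sketch of this kind comes from is the short derivation below. It assumes R independent LSH functions h_1, …, h_R with collision probability k(x, q) = Pr[h(x) = h(q)], and the Laplace mechanism at scale R/ε on every cell. This is an assumption about the mechanism for illustration, not the paper's stated theorem or bound.

```latex
\begin{aligned}
A[r,c] &= \sum_{i=1}^{n} \mathbf{1}[h_r(x_i) = c] + \eta_{r,c},
  \qquad \eta_{r,c} \sim \mathrm{Lap}(R/\varepsilon), \\
\hat{f}(q) &= \frac{1}{nR} \sum_{r=1}^{R} A[r, h_r(q)], \\
\mathbb{E}[\hat{f}(q)] &= \frac{1}{n} \sum_{i=1}^{n} k(x_i, q),
  \qquad k(x, q) = \Pr[h(x) = h(q)], \\
\operatorname{Var}_{\eta}\!\Big[\frac{1}{nR}\sum_{r=1}^{R} \eta_{r,h_r(q)}\Big]
  &= \frac{R \cdot 2R^2/\varepsilon^2}{n^2 R^2} = \frac{2R}{n^2 \varepsilon^2}.
\end{aligned}
```

Under these assumptions, adding rows shrinks the hashing variance (an average of R independent row estimates) but grows the noise variance linearly in R. This is one tradeoff any error bound for such a sketch has to balance.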
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Stochastic Gradient Optimization Techniques
