Saibot: A Differentially Private Data Search Platform

Zezhou Huang; Jiaxiang Liu; Daniel Alabi; Raul Castro Fernandez; Eugene Wu

arXiv:2307.00432·cs.DB·July 4, 2023

Open Access

TL;DR

Saibot is a differentially private data search platform that efficiently finds dataset augmentations that improve ML model performance while preserving privacy. In experiments it outperforms existing DP mechanisms (TPM, APM, and shuffling).

Contribution

Saibot employs the Factorized Privacy Mechanism (FPM), a new DP technique, enabling scalable and accurate private data search for ML dataset augmentation.

Findings

01. Saibot achieves 50-90% of non-private search accuracy.
02. FPM reduces privacy budget depletion compared to traditional methods.
03. Saibot outperforms TPM, APM, and shuffling in experiments.

Abstract

Recent data search platforms use ML task-based utility measures, rather than metadata-based keywords, to search large dataset corpora. Requesters submit a training dataset, and these platforms search for augmentations (join- or union-compatible datasets) that, when used to augment the requester's dataset, most improve model (e.g., linear regression) performance. Although this approach is effective, providers that manage personally identifiable data demand differential privacy (DP) guarantees before granting these platforms data access. Unfortunately, making data search differentially private is nontrivial: a single search can involve training and evaluating datasets hundreds or thousands of times, quickly depleting privacy budgets. We present Saibot, a differentially private data search platform that employs the Factorized Privacy Mechanism (FPM), a novel DP mechanism, to calculate sufficient semi-ring…
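The budget problem described in the abstract can be illustrated with a small sketch. It is not the paper's FPM (the abstract is cut off before describing it), and every name in it (`sufficient_stats`, `privatize`, `union`, `fit`, `naive_budget`) is hypothetical. It assumes a one-feature linear regression, values clipped to `[-bound, bound]`, add/remove-one-row neighboring datasets, and plain Laplace noise. The point is only that if each dataset releases noisy sufficient statistics once, any number of candidate unions can be scored from those releases as post-processing, whereas paying a fresh ε per train-and-evaluate step adds up under basic sequential composition.

```python
import random

def sufficient_stats(rows):
    """Return [n, sum x, sum y, sum x^2, sum x*y] for fitting y ~ a*x + b."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return [float(n), sx, sy, sxx, sxy]

def privatize(stats, epsilon, bound, rng):
    """Add Laplace noise to every statistic (assumption: |x|, |y| <= bound).

    One row changes the vector by at most 1 + 2*bound + 2*bound**2 in L1 norm.
    """
    sensitivity = 1 + 2 * bound + 2 * bound ** 2
    scale = sensitivity / epsilon

    def laplace():
        # The difference of two i.i.d. exponentials is Laplace(0, scale).
        return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

    return [s + laplace() for s in stats]

def union(stats_a, stats_b):
    """Statistics of a union of two row sets are the element-wise sums."""
    return [a + b for a, b in zip(stats_a, stats_b)]

def fit(stats):
    """Closed-form least squares from the statistics; None if degenerate."""
    n, sx, sy, sxx, sxy = stats
    denom = n * sxx - sx * sx
    if n <= 0 or abs(denom) < 1e-12:
        return None
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept

def naive_budget(eps_per_eval, num_evals):
    """Total epsilon under basic sequential composition."""
    return eps_per_eval * num_evals

# Usage: each dataset is privatized once, then reused for every candidate.
rng = random.Random(0)
requester = [(i / 100, 2 * (i / 100) - 0.5) for i in range(100)]
candidates = [
    [(i / 100, 2 * (i / 100) - 0.5) for i in range(50)] for _ in range(3)
]
noisy_requester = privatize(sufficient_stats(requester), 1.0, 1.0, rng)
noisy_candidates = [
    privatize(sufficient_stats(c), 1.0, 1.0, rng) for c in candidates
]
for noisy in noisy_candidates:
    print(fit(union(noisy_requester, noisy)))
print("naive total epsilon for 1000 evaluations:", naive_budget(0.1, 1000))
```

With only a few dozen rows the Laplace noise dominates and the fitted coefficients are unreliable; the sketch shows how the privacy accounting works, not the accuracy a real mechanism would reach.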

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Age of Information Optimization