Saibot: A Differentially Private Data Search Platform
Zezhou Huang, Jiaxiang Liu, Daniel Alabi, Raul Castro Fernandez, Eugene Wu

TL;DR
Saibot is a novel differentially private data search platform that efficiently finds dataset augmentations that improve ML model performance while preserving privacy; it significantly outperforms existing DP mechanisms.
Contribution
Saibot employs the Factorized Privacy Mechanism (FPM), a new DP technique that enables scalable and accurate private data search for ML dataset augmentation.
Findings
Saibot achieves 50-90% of non-private search accuracy.
FPM reduces privacy budget depletion compared to traditional methods.
Saibot outperforms TPM, APM, and shuffling in experiments.
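Why privacy budget depletion matters can be seen with basic sequential composition, under which running k mechanisms with budgets ε₁, …, ε_k costs ε₁ + … + ε_k in total. The sketch below assumes an even split of a total budget across k model evaluations and the Laplace mechanism with sensitivity 1; the helper names are hypothetical and the numbers are illustrative, not the paper's privacy accounting (tighter composition theorems exist; this uses the simplest one).

```python
def per_query_epsilon(total_epsilon, num_queries):
    # Basic sequential composition: an even split gives each query total / k.
    return total_epsilon / num_queries


def laplace_scale(sensitivity, epsilon):
    # Laplace mechanism: noise scale b = sensitivity / epsilon.
    return sensitivity / epsilon


# Hypothetical scenario: a total budget of 1.0 spread over k model evaluations.
for k in (1, 100, 1000):
    eps = per_query_epsilon(1.0, k)
    print(f"k={k:5d}  epsilon per query={eps:.4f}  noise scale={laplace_scale(1.0, eps):.1f}")
```

With a total budget of 1.0, a thousand evaluations leave each one ε = 0.001 and a Laplace noise scale of 1000, so per-evaluation noise quickly swamps the signal.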
Abstract
Recent data search platforms use ML task-based utility measures, rather than metadata-based keywords, to search large dataset corpora. Requesters submit a training dataset, and these platforms search for augmentations (join- or union-compatible datasets) that, when used to augment the requester's dataset, most improve model (e.g., linear regression) performance. Although these platforms are effective, providers that manage personally identifiable data demand differential privacy (DP) guarantees before granting them data access. Unfortunately, making data search differentially private is nontrivial, as a single search can involve training and evaluating datasets hundreds or thousands of times, quickly depleting privacy budgets. We present Saibot, a differentially private data search platform that employs the Factorized Privacy Mechanism (FPM), a novel DP mechanism, to calculate sufficient semi-ring…
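The general idea of privatizing aggregate statistics once and reusing them for many candidate models can be illustrated with linear regression, which can be trained from a handful of sums. This is a minimal sketch, not FPM itself: it covers only a single-feature model and the union case (joins need more machinery and are not shown), assumes every feature and label is clipped to [0, 1], noises only the provider's statistics, and uses hypothetical helper names (`stats`, `union`, `privatize`, `fit`).

```python
import random

# Hypothetical helpers for illustration; these are not Saibot's API.


def stats(rows):
    """Sufficient statistics [n, Σx, Σy, Σx², Σxy] for 1-D least squares y ≈ a + b·x."""
    n = float(len(rows))
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return [n, sx, sy, sxx, sxy]


def union(s1, s2):
    # For union-compatible datasets, the statistics of the union are elementwise sums.
    return [a + b for a, b in zip(s1, s2)]


def laplace(scale, rng):
    # Laplace(0, scale) as the difference of two i.i.d. exponentials with rate 1/scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def privatize(s, epsilon, rng):
    # Assumes x and y are clipped to [0, 1]: adding or removing one row changes each
    # of the 5 statistics by at most 1, so the L1 sensitivity is 5.
    scale = 5 / epsilon
    return [v + laplace(scale, rng) for v in s]


def fit(s):
    # Closed-form least squares (normal equations) computed from the statistics alone.
    n, sx, sy, sxx, sxy = s
    denom = n * sxx - sx * sx
    b = (n * sxy - sx * sy) / denom
    a = (sy - b * sx) / n
    return a, b


# Toy data on the line y = 0.25 + 0.5x, all values within [0, 1].
requester = [(0.0, 0.25), (0.5, 0.5)]
provider = [(1.0, 0.75), (0.25, 0.375)]

a, b = fit(union(stats(requester), stats(provider)))  # non-private baseline

rng = random.Random(0)
noisy_provider = privatize(stats(provider), epsilon=1.0, rng=rng)
# The noisy statistics can be reused for any number of candidate models
# (post-processing), so the provider's data is charged epsilon only once.
a_priv, b_priv = fit(union(stats(requester), noisy_provider))
```

The design choice this illustrates is that the privacy cost is paid on the statistics rather than on each trained model, so evaluating many candidate augmentations does not consume additional budget for the same provider data.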
