The Fast and the Private: Task-based Dataset Search
Zezhou Huang, Jiaxiang Liu, Haonan Wang, and Eugene Wu

TL;DR
Mileena is a novel dataset search platform that enhances speed, privacy, and quality in task-based ML dataset retrieval by leveraging semi-ring sketches and differential privacy mechanisms.
Contribution
The paper introduces Mileena, a fast and private task-based dataset search system utilizing semi-ring sketches and a new privacy mechanism, improving efficiency and privacy without sacrificing data utility.
Findings
Mileena achieves high-speed dataset search with low latency.
The system maintains differential privacy across large datasets.
Preliminary results show improved data utility and privacy balance.
Abstract
Modern dataset search platforms employ ML task-based utility metrics instead of relying on metadata-based keywords to comb through extensive dataset repositories. In this setup, requesters provide an initial dataset, and the platform identifies complementary datasets to augment (join or union) the requester's dataset such that the ML model (e.g., linear regression) performance is improved most. Although effective, current task-based data searches are stymied by (1) high latency which deters users, (2) privacy concerns for regulatory standards, and (3) low data quality which provides low utility. We introduce Mileena, a fast, private, and high-quality task-based dataset search platform. At its heart, Mileena is built on pre-computed semi-ring sketches for efficient ML training and evaluation. Based on semi-ring, we develop a novel Factorized Privacy Mechanism that makes the search…
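As a rough illustration of why pre-computed sketches can make task-based search fast, the sketch below assumes the simplest case mentioned in the abstract: linear regression over a union of datasets. Each dataset is summarized once by its least-squares sufficient statistics (X^T X and X^T y). These add element-wise under union, so a candidate augmentation can be scored without touching the raw rows. The names `Sketch`, `build_sketch`, and `solve` are hypothetical and are not taken from Mileena; joins and the Factorized Privacy Mechanism are not covered here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sketch:
    """Least-squares sufficient statistics: X^T X and X^T y (hypothetical type)."""
    xtx: List[List[float]]
    xty: List[float]

    def __add__(self, other: "Sketch") -> "Sketch":
        # Union of two row sets: the statistics add element-wise.
        xtx = [[a + b for a, b in zip(r1, r2)]
               for r1, r2 in zip(self.xtx, other.xtx)]
        xty = [a + b for a, b in zip(self.xty, other.xty)]
        return Sketch(xtx, xty)


def build_sketch(rows: Sequence[Sequence[float]],
                 targets: Sequence[float]) -> Sketch:
    """Pre-compute the sketch for one dataset (an intercept column is added)."""
    d = len(rows[0]) + 1
    xtx = [[0.0] * d for _ in range(d)]
    xty = [0.0] * d
    for row, y in zip(rows, targets):
        x = [1.0] + list(row)
        for i in range(d):
            xty[i] += x[i] * y
            for j in range(d):
                xtx[i][j] += x[i] * x[j]
    return Sketch(xtx, xty)


def solve(sketch: Sketch) -> List[float]:
    """Solve the normal equations (X^T X) theta = X^T y by Gaussian elimination."""
    d = len(sketch.xty)
    # Augmented matrix [X^T X | X^T y].
    a = [row[:] + [b] for row, b in zip(sketch.xtx, sketch.xty)]
    for col in range(d):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, d), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, d):
            f = a[r][col] / a[col][col]
            for c in range(col, d + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution.
    theta = [0.0] * d
    for i in reversed(range(d)):
        s = a[i][d] - sum(a[i][j] * theta[j] for j in range(i + 1, d))
        theta[i] = s / a[i][i]
    return theta


# Toy data following y = 2x + 1, split between a requester and a provider.
requester = build_sketch([[0.0], [1.0]], [1.0, 3.0])
provider = build_sketch([[2.0], [3.0]], [5.0, 7.0])

# Train on the union using only the two sketches, not the raw rows.
union = requester + provider
theta = solve(union)  # [intercept, slope]
```

In this toy setup the model fitted from the combined sketch matches the one fitted from the concatenated rows, and the cost of scoring a candidate depends on the number of features rather than the number of rows.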
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
