Data Appraisal Without Data Sharing
Mimee Xu, Laurens van der Maaten, Awni Hannun

TL;DR
This paper introduces a privacy-preserving data appraisal method that combines multi-party computation with influence functions, so a model owner can estimate the value of a data owner's data before any raw data is shared. This removes an obstacle to efficient data markets.
Contribution
It proposes a novel data appraisal approach that estimates data value privately via influence functions, enabling efficient data transactions without data sharing.
Findings
Effective data valuation despite label noise and class imbalance
No additional hyper-parameters or re-training needed
Balances appraisal quality with computational efficiency
Abstract
One of the most effective approaches to improving the performance of a machine learning model is to procure additional training data. A model owner seeking relevant training data from a data owner needs to appraise the data before acquiring it. However, without a formal agreement, the data owner does not want to share data. The resulting Catch-22 prevents efficient data markets from forming. This paper proposes adding a data appraisal stage that requires no data sharing between data owners and model owners. Specifically, we use multi-party computation to implement an appraisal function computed on private data. The appraised value serves as a guide to facilitate data selection and transaction. We propose an efficient data appraisal method based on forward influence functions that approximates data value through its first-order loss reduction on the current model. The method requires no…
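To make the idea of first-order loss reduction concrete, here is a minimal plaintext sketch. It rests on several assumptions not taken from the paper: a logistic-regression model, the standard influence-function form (validation gradient, inverse Hessian, candidate gradient), a damping term, and synthetic data. All function and variable names are hypothetical, and the multi-party computation layer is omitted entirely, so this only illustrates the arithmetic that would be evaluated on private data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_grad(w, X, y):
    """Gradient of the mean logistic loss at parameters w."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def hessian(w, X, damping=1e-3):
    """Hessian of the mean logistic loss at w, damped so it is invertible (assumption)."""
    p = sigmoid(X @ w)
    H = (X.T * (p * (1.0 - p))) @ X / len(X)
    return H + damping * np.eye(X.shape[1])

def appraise(w, X_train, X_val, y_val, X_cand, y_cand):
    """Estimated first-order reduction in validation loss from adding the candidate set.

    Larger values suggest more useful candidate data. No re-training is performed.
    """
    g_val = mean_grad(w, X_val, y_val)
    g_cand = mean_grad(w, X_cand, y_cand)
    H = hessian(w, X_train)
    return float(g_val @ np.linalg.solve(H, g_cand))

# Synthetic setup (illustrative only).
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)

def sample(n):
    X = rng.normal(size=(n, 5))
    return X, (X @ w_true > 0).astype(float)

X_train, y_train = sample(300)
X_val, y_val = sample(500)
X_cand, y_cand = sample(200)

# The "current model": a few gradient steps, deliberately not converged.
w = np.zeros(5)
for _ in range(5):
    w -= 0.5 * mean_grad(w, X_train, y_train)

clean_value = appraise(w, X_train, X_val, y_val, X_cand, y_cand)
flipped_value = appraise(w, X_train, X_val, y_val, X_cand, 1.0 - y_cand)
print("clean candidate set:  ", clean_value)
print("label-flipped set:    ", flipped_value)
```

In this toy setting the correctly labeled candidate set should score higher than the same set with every label flipped, which is the kind of ranking an appraisal stage would rely on. In a private deployment, the gradients and the linear solve would have to be evaluated on secret-shared values rather than in the clear.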
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
