Fundamentals of Task-Agnostic Data Valuation
Mohammad Mohammadi Amiri, Frederic Berdoz, Ramesh Raskar

TL;DR
This paper introduces a task-agnostic method for valuing data. It measures how a seller's data differs statistically from the buyer's baseline data, in terms of diversity and relevance, without relying on task-specific utility metrics, and demonstrates the approach experimentally.
Contribution
It proposes a novel query-based approach to estimate data diversity and relevance without exposing raw data, enabling task-agnostic data valuation.
Findings
Effective estimation of data diversity and relevance demonstrated on real datasets.
Queries designed to prevent seller data fabrication and protect privacy.
Method captures statistical differences without task-specific utility metrics.
Abstract
We study valuing the data of a data owner/seller for a data seeker/buyer. Data valuation is often carried out for a specific task assuming a particular utility metric, such as test accuracy on a validation set, that may not exist in practice. In this work, we focus on task-agnostic data valuation without any validation requirements. The data buyer has access to a limited amount of data (which could be publicly available) and seeks more data samples from a data seller. We formulate the problem as estimating the differences in the statistical properties of the data at the seller with respect to the baseline data available at the buyer. We capture these statistical differences through second moment by measuring diversity and relevance of the seller's data for the buyer; we estimate these measures through queries to the seller without requesting raw data. We design the queries with the…
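The query-based idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' definitions: the buyer is assumed to send the top principal directions of its own data as queries, the seller is assumed to reply with the variance of its data along each direction, and the two scores are a simple ratio-based stand-in for relevance and diversity. The function names (`buyer_queries`, `seller_response`, `relevance_and_diversity`) are hypothetical.

```python
import numpy as np

def buyer_queries(buyer_data, k):
    """Return the top-k principal directions and variances of the buyer's data.

    Assumption: the buyer uses these directions as its queries.
    """
    centered = buyer_data - buyer_data.mean(axis=0)
    cov = centered.T @ centered / len(buyer_data)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    return eigvecs[:, order], eigvals[order]

def seller_response(seller_data, directions):
    """Seller side: report only per-direction variances, never raw samples."""
    centered = seller_data - seller_data.mean(axis=0)
    projected = centered @ directions
    return projected.var(axis=0)

def relevance_and_diversity(buyer_var, seller_var, eps=1e-12):
    """Illustrative scores in [0, 1] (hypothetical, not the paper's formulas).

    overlap is the min/max ratio of the two variances along each direction.
    """
    overlap = np.minimum(buyer_var, seller_var) / (
        np.maximum(buyer_var, seller_var) + eps
    )
    relevance = float(overlap.mean())            # high when variances match
    diversity = float((1.0 - overlap).mean())    # high when variances differ
    return relevance, diversity

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    buyer = rng.normal(size=(500, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])
    seller = rng.normal(size=(800, 5)) * np.array([3.0, 1.0, 1.0, 2.0, 0.1])

    directions, buyer_var = buyer_queries(buyer, k=3)
    seller_var = seller_response(seller, directions)
    print(relevance_and_diversity(buyer_var, seller_var))
```

Under these illustrative definitions, a seller whose data has the same second-moment structure as the buyer's scores relevance near 1 and diversity near 0. The seller only returns a handful of scalars per query, which reflects the abstract's point that no raw data is requested.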
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Explainable Artificial Intelligence (XAI)
Methods: Test
