Addressing Budget Allocation and Revenue Allocation in Data Market Environments Using an Adaptive Sampling Algorithm
Boxin Zhao, Boxiang Lyu, Raul Castro Fernandez, Mladen Kolar

TL;DR
This paper presents an adaptive sampling algorithm that efficiently addresses both budget and revenue allocation in data markets, enabling practical and scalable data trading for machine learning models.
Contribution
A novel linear-time algorithm for simultaneous budget and revenue allocation in data markets, applicable in centralized and federated settings, with theoretical guarantees and empirical validation.
Findings
The algorithm runs in linear time, making it more efficient than Shapley-value-based approaches.
The revenue allocation has properties resembling those of the Shapley value.
Empirical results demonstrate practical effectiveness.
Abstract
High-quality machine learning models depend on access to high-quality training data. When the data are not already available, obtaining them is tedious and costly. Data markets help with identifying valuable training data: model consumers pay to train a model, the market uses that budget to identify data and train the model (the budget allocation problem), and finally the market compensates data providers according to their data contribution (the revenue allocation problem). For example, a bank could pay the data market to access data from other financial institutions to train a fraud detection model. Compensating data contributors requires understanding the data's contribution to the model; recent efforts to solve this revenue allocation problem based on the Shapley value are too inefficient to lead to practical data markets. In this paper, we introduce a new algorithm to solve budget…
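To make the efficiency concern concrete, the following is a minimal sketch of exact Shapley-value revenue allocation, the baseline the abstract describes as too inefficient. It is not the paper's algorithm. The function names, the provider labels, the record counts, and the square-root utility are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(providers, utility):
    """Exact Shapley value of each provider under `utility`.

    `utility` maps a frozenset of providers to a number (for example,
    the accuracy of a model trained on their pooled data). The loops
    visit every subset of the other providers, so the number of
    utility evaluations grows exponentially with len(providers).
    """
    n = len(providers)
    values = {}
    for p in providers:
        others = [q for q in providers if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k in the Shapley formula.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


def toy_utility(coalition):
    # Hypothetical utility: square root of the pooled record count,
    # a stand-in for diminishing returns from more data.
    sizes = {"A": 100, "B": 300, "C": 600}
    return sum(sizes[p] for p in coalition) ** 0.5


# Split a hypothetical revenue of 90 in proportion to Shapley values.
revenue = 90.0
phi = exact_shapley(["A", "B", "C"], toy_utility)
total_phi = sum(phi.values())
payout = {p: revenue * v / total_phi for p, v in phi.items()}
```

With n providers, each provider's value requires 2^(n-1) marginal contributions, and in a data market each utility evaluation may involve training a model. This cost is what motivates looking for cheaper allocation schemes with Shapley-like properties.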
Taxonomy
Topics: Auction Theory and Applications · Privacy-Preserving Technologies in Data · Imbalanced Data Classification Techniques
