Buying Data of Unknown Quality: Fisher Information Procurement Auctions
Yuchen Hu, Martin J. Wainwright, Stephen Bates

TL;DR
This paper explores mechanisms for purchasing data of uncertain quality in data markets, proposing auction designs that incentivize truthful reporting and optimize data procurement for statistical estimation.
Contribution
It introduces a second-score procurement mechanism for known data quality and a verification-based mechanism for private quality, ensuring truthful reporting and efficient data acquisition.
Findings
The second-score mechanism effectively ranks providers by cost per information unit.
The verification mechanism incentivizes truthful quality reporting with vanishing deviations.
The analysis shows how verification and tradeoffs influence participation and misreporting.
Abstract
We study statistical parameter estimation in the setting of data markets. A buyer seeks to estimate a parameter based on samples that can be purchased from competing providers that differ in their data quality and provision costs. When quality is known ex ante, we define a cost-per-information score that summarizes each provider's provision cost per unit of information about the buyer's estimation objective. We describe a second-score procurement mechanism that ranks providers by this score and endogenously chooses both a provider and a sample size while making truthful cost reports optimal. We then turn to the more realistic setting where data quality is private and can only be indirectly observed via the delivered data. In this setting, we propose a simple mechanism that augments the second-score rule with a lenient ex post statistical test of the reported quality. We prove that under…
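To make the known-quality case concrete, here is a minimal sketch of a second-score rule in Python. It is an illustration under stated assumptions, not the paper's mechanism: the `Provider` fields, the function names, and the `target_info` rule for fixing the sample size are all hypothetical, and the score is taken to be reported cost per sample divided by Fisher information per sample.

```python
import math
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    cost_per_sample: float  # reported provision cost per sample
    info_per_sample: float  # Fisher information per sample, assumed known


def score(p: Provider) -> float:
    """Cost per unit of Fisher information (lower is better)."""
    return p.cost_per_sample / p.info_per_sample


def second_score_procurement(providers, target_info):
    """Select the lowest-score provider and pay at the runner-up's score.

    target_info is a hypothetical requirement on total Fisher information,
    used here only to fix a sample size; the paper's own rule may differ.
    """
    if len(providers) < 2:
        raise ValueError("need at least two providers")
    ranked = sorted(providers, key=score)
    winner, runner_up = ranked[0], ranked[1]
    # Sample size depends on known quality only, not on any cost report.
    n = math.ceil(target_info / winner.info_per_sample)
    # Per-sample price sets the winner's cost per unit of information
    # equal to the runner-up's score.
    price_per_sample = score(runner_up) * winner.info_per_sample
    return winner.name, n, price_per_sample * n
```

Calling `second_score_procurement([Provider("A", 2.0, 4.0), Provider("B", 3.0, 3.0)], 100.0)` selects provider A, since its score of 0.5 beats B's score of 1.0. As in a second-price auction, the winner's payment is set by the runner-up's score rather than by the winner's own cost report, which is the usual source of the truthfulness property in rules of this kind.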
