Incentivizing User Data Contributions for LLM Improvement under Withdrawal Rights
Di Feng, Chenhao Zhang, Zhanzhan Zhao

TL;DR
This paper designs incentive mechanisms that encourage users to contribute data for improving large language models, addressing privacy and effort costs as well as threshold effects through subsidies and withdrawal rights.
Contribution
It introduces a theoretical framework combining subsidies and withdrawal rights to ensure efficient data contribution and model improvement.
Findings
Decentralized responses may fall below the improvement threshold, causing subsidy waste.
Combining cost reporting with personalized assignment prevents subsidy leakage.
Sequential withdrawal protocols can incentivize more data provision and improve success probability.
Abstract
The continued improvement of large language models (LLMs) increasingly depends on eliciting high-quality, user-generated data, yet such data are costly to provide and often withheld due to privacy and effort concerns. This creates a fundamental design challenge: how to incentivize data contribution when model improvements require coordinated, threshold-level inputs, while contributions remain privately costly and partially reversible. We develop and theoretically analyze incentive mechanisms for user data contribution that explicitly account for threshold effects and reversibility, focusing on how subsidies and withdrawal rights can be jointly designed to overcome coordination failure. As a natural benchmark, we first consider subsidy-based incentives, under which users respond to posted payments with privately optimal floor contributions. These decentralized responses may fall below…
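The threshold problem described above can be made concrete with a toy calculation. The sketch below is not the authors' model. It assumes, hypothetically, that each user contributes one unit of data if and only if a posted per-unit subsidy covers their private cost, and that the model improves only if total contributions reach a threshold. The `withdrawal` flag stands in for a simple assurance-contract-style rule (an assumption, not the paper's protocol): if the threshold is missed, contributions are withdrawn and no subsidy is paid. The names `posted_subsidy` and `Outcome` and the cost values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    contributors: int    # number of users who contribute one unit each
    improved: bool       # whether total contributions reach the threshold
    subsidy_paid: float  # total subsidy actually paid out
    wasted: float        # subsidy paid without any model improvement


def posted_subsidy(costs, subsidy, threshold, withdrawal=False):
    """Toy posted-subsidy benchmark with a hard improvement threshold (hypothetical)."""
    # Each user best-responds independently: contribute iff subsidy >= private cost.
    contributors = sum(1 for c in costs if subsidy >= c)
    improved = contributors >= threshold
    if withdrawal and not improved:
        # Hypothetical withdrawal rule: below the threshold, data are withdrawn
        # and no subsidy is paid, so nothing is wasted.
        paid = 0.0
    else:
        paid = subsidy * contributors
    # Any payment made without reaching the threshold is pure waste.
    wasted = 0.0 if improved else paid
    return Outcome(contributors, improved, paid, wasted)


costs = [0.2, 0.5, 0.9, 1.4, 2.0]  # illustrative private per-unit costs
print(posted_subsidy(costs, subsidy=1.0, threshold=4))
print(posted_subsidy(costs, subsidy=1.0, threshold=4, withdrawal=True))
print(posted_subsidy(costs, subsidy=1.5, threshold=4))
```

With a subsidy of 1.0, only three users contribute, the threshold of four is missed, and the 3.0 paid out is wasted; the withdrawal rule removes that waste, and a subsidy of 1.5 draws in a fourth contributor and reaches the threshold. This sketch only captures the accounting of subsidy waste. It does not model the strategic effects of sequential withdrawal on how much data users choose to provide.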
