A Sustainable AI Economy Needs Data Deals That Work for Generators
Ruoxi Jia, Luis Oala, Wenjie Xiong, Suqin Ge, Jiachen T. Wang, Feiyang Kang, Dawn Song

TL;DR
This paper highlights the economic imbalance in AI data deals, showing most value goes to aggregators while data creators receive little, and proposes a framework for fairer data exchanges.
Contribution
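The paper analyzes seventy-three public data deals, identifies three structural faults that drive the inequality (missing provenance, asymmetric bargaining power, and non-dynamic pricing), and introduces the EDVEX framework for equitable data-value exchanges in AI.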
Findings
Across seventy-three public data deals, the majority of value accrues to aggregators.
Documented creator royalties round to zero, and deal terms are widely opaque.
The EDVEX framework aims to promote fairer data transactions.
Abstract
We argue that the machine learning value chain is structurally unsustainable due to an economic data processing inequality: each state in the data cycle from inputs to model weights to synthetic outputs refines technical signal but strips economic equity from data generators. We show, by analyzing seventy-three public data deals, that the majority of value accrues to aggregators, with documented creator royalties rounding to zero and widespread opacity of deal terms. This is not just an economic welfare concern: as data and its derivatives become economic assets, the feedback loop that sustains current learning algorithms is at risk. We identify three structural faults (missing provenance, asymmetric bargaining power, and non-dynamic pricing) as the operational machinery of this inequality. In our analysis, we trace these problems along the machine learning value chain and propose an…
Taxonomy
Topics: Ethics and Social Impacts of AI · Scientific Computing and Data Management · Explainable Artificial Intelligence (XAI)
