AI and the Dynamic Supply of Training Data

Christian Peukert; Florian Abeillon; Jérémie Haese; Franziska Kaiser; Alexander Staub

arXiv:2404.18445·econ.GN·June 5, 2025

TL;DR

This paper investigates how contributors to Unsplash react when their works are used as training data for AI, revealing behavioral changes that impact data diversity and quality, and discusses policy solutions to address these issues.

Contribution

It provides empirical evidence on contributor reactions to AI training data use, highlighting behavioral shifts and proposing incentive-aligned policy interventions.

Findings

01. Higher dropout rates among affected contributors

02. Reduced upload rates for professional and heavily affected users

03. Changes in contribution diversity and novelty

Abstract

Artificial intelligence (AI) systems rely heavily on human-generated data, yet the people behind that data are often overlooked. Human behavior can play a major role in AI training datasets, be it in limiting access to existing works or in deciding which types of new works to create or whether to create any at all. We examine creators' behavioral change when their works become training data for commercial AI. Specifically, we focus on contributors on Unsplash, a popular stock image platform with about 6 million high-quality photos and illustrations. In the summer of 2020, Unsplash launched a research program and released a dataset of 25,000 images for commercial AI use. We study contributors' reactions, comparing contributors whose works were included in this dataset to contributors whose works were not. Our results suggest that treated contributors left the platform at a…
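The treated-versus-control comparison described in the abstract can be illustrated with a minimal sketch. It assumes a raw difference in dropout rates between the two groups, which is a simplification and not necessarily the paper's estimation strategy. The `Contributor` record, its two flags, and the toy records are hypothetical and made up for illustration; none of it is data from the paper.

```python
from dataclasses import dataclass


@dataclass
class Contributor:
    # Hypothetical flag: at least one work was included in the released dataset.
    treated: bool
    # Hypothetical flag: the contributor was still active after the release.
    active_after: bool


def dropout_rate(group):
    """Share of contributors in `group` who are no longer active."""
    if not group:
        raise ValueError("group must not be empty")
    return sum(not c.active_after for c in group) / len(group)


def dropout_gap(contributors):
    """Treated dropout rate minus control dropout rate (raw difference)."""
    treated = [c for c in contributors if c.treated]
    control = [c for c in contributors if not c.treated]
    return dropout_rate(treated) - dropout_rate(control)


# Toy, made-up records for illustration only; not data from the paper.
toy = [
    Contributor(treated=True, active_after=False),
    Contributor(treated=True, active_after=True),
    Contributor(treated=True, active_after=False),
    Contributor(treated=True, active_after=True),
    Contributor(treated=False, active_after=True),
    Contributor(treated=False, active_after=True),
    Contributor(treated=False, active_after=False),
    Contributor(treated=False, active_after=True),
]

gap = dropout_gap(toy)
```

A positive `gap` would mean the treated group left at a higher rate than the control group in the toy records. A raw gap like this does not account for pre-existing differences between the groups, so it should be read only as an illustration of the comparison's shape.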

Taxonomy

Topics: Big Data and Business Intelligence

Methods: Focus