Generative AI Training and Copyright Law
Sebastian Stober, Tim W. Dornis

TL;DR
This paper examines the legal challenges of copyright infringement in generative AI training, arguing that current fair use and TDM exceptions do not adequately cover the unique nature of AI training data use.
Contribution
The paper clarifies that generative AI training fundamentally differs from traditional TDM and fair use, highlighting implications for legal practices and research.
Findings
The authors argue that generative AI training is not covered by the TDM or fair use exceptions.
Training data memorization can raise copyright issues independently of the fair use and TDM exceptions.
ISMIR can play a role in promoting fair AI training practices.
Abstract
Training generative AI models requires extensive amounts of data. A common practice is to collect such data through web scraping. Yet, much of what has been and is collected is copyright protected. Its use may be copyright infringement. In the USA, AI developers rely on "fair use" and in Europe, the prevailing view is that the exception for "Text and Data Mining" (TDM) applies. In a recent interdisciplinary tandem-study, we have argued in detail that this is actually not the case because generative AI training fundamentally differs from TDM. In this article, we share our main findings and the implications for both public and corporate research on generative models. We further discuss how the phenomenon of training data memorization leads to copyright issues independently from the "fair use" and TDM exceptions. Finally, we outline how the ISMIR could contribute to the ongoing discussion…
Taxonomy
Topics: Law, AI, and Intellectual Property
