Interrogating LLM design under a fair learning doctrine
Johnny Tian-Zheng Wei, Maggie Wang, Ameya Godbole, Jonathan H. Choi, Robin Jia

TL;DR
This paper introduces a "fair learning" framework for LLMs that examines how training decisions affect memorization, addressing copyright risks that a focus on output similarity alone does not cover.
Contribution
It proposes a new legal and technical standard for fair learning in LLMs, emphasizing training process analysis over output comparison.
Findings
Deconstructed Pythia, an open-source LLM, using causal and correlational analyses of its training decisions.
Connected memorization analysis to a legal standard for fair learning.
Suggested how a fair learning standard could evolve toward greater clarity and more rule-based application.
Abstract
The current discourse on large language models (LLMs) and copyright largely takes a "behavioral" perspective, focusing on model outputs and evaluating whether they are substantially similar to training data. However, substantial similarity is difficult to define algorithmically and a narrow focus on model outputs is insufficient to address all copyright risks. In this interdisciplinary work, we take a complementary "structural" perspective and shift our focus to how LLMs are trained. We operationalize a notion of "fair learning" by measuring whether any training decision substantially affected the model's memorization. As a case study, we deconstruct Pythia, an open-source LLM, and demonstrate the use of causal and correlational analyses to make factual determinations about Pythia's training decisions. By proposing a legal standard for fair learning and connecting memorization analyses…
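The idea of measuring whether a training decision "substantially affected" memorization can be illustrated with a minimal sketch. It is not the paper's procedure. It assumes one common way of quantifying memorization (the model reproduces a document's continuation verbatim when prompted with its prefix) and compares two model variants that differ in a single training decision. All names here (`memorization_rate`, `decision_effect`, `make_lookup_model`) and the toy lookup "models" are hypothetical stand-ins for real trained LLMs.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a "model" is any function that takes a token
# prefix and a length n, and returns n continuation tokens.
Generator = Callable[[Sequence[str], int], List[str]]


def memorization_rate(generate: Generator, documents: Sequence[Sequence[str]],
                      prefix_len: int, suffix_len: int) -> float:
    """Fraction of documents whose suffix is reproduced verbatim from the prefix.

    Assumption: verbatim continuation is used as the memorization measure.
    """
    eligible = [d for d in documents if len(d) >= prefix_len + suffix_len]
    if not eligible:
        return 0.0
    hits = 0
    for doc in eligible:
        prefix = list(doc[:prefix_len])
        target = list(doc[prefix_len:prefix_len + suffix_len])
        if list(generate(prefix, suffix_len)) == target:
            hits += 1
    return hits / len(eligible)


def decision_effect(baseline: Generator, counterfactual: Generator,
                    documents: Sequence[Sequence[str]],
                    prefix_len: int, suffix_len: int) -> float:
    """Difference in memorization rate between two model variants that
    differ in one training decision (baseline minus counterfactual)."""
    return (memorization_rate(baseline, documents, prefix_len, suffix_len)
            - memorization_rate(counterfactual, documents, prefix_len, suffix_len))


def make_lookup_model(memorized: Sequence[Sequence[str]], prefix_len: int) -> Generator:
    """Toy stand-in for an LLM: recites documents it has 'memorized' and
    emits <unk> tokens for any other prefix."""
    table = {tuple(d[:prefix_len]): list(d[prefix_len:]) for d in memorized}

    def generate(prefix: Sequence[str], n: int) -> List[str]:
        continuation = table.get(tuple(prefix), [])
        return (continuation + ["<unk>"] * n)[:n]

    return generate


# Toy corpus: four eight-token documents.
docs = [s.split() for s in [
    "the quick brown fox jumps over the dog",
    "a stitch in time saves nine every time",
    "to be or not to be that is",
    "all that glitters is not gold they say",
]]

# The baseline variant memorized two documents; the counterfactual variant
# (the same model under a different training decision) memorized only one.
baseline = make_lookup_model(docs[:2], prefix_len=4)
counterfactual = make_lookup_model(docs[:1], prefix_len=4)

effect = decision_effect(baseline, counterfactual, docs, prefix_len=4, suffix_len=4)
print(f"Change in memorization rate attributable to the decision: {effect:.2f}")
```

In practice the two variants would be actual models trained with and without the decision under study, and whether a given difference counts as "substantial" is a legal question that this sketch leaves open.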
Taxonomy
Topics: International Arbitration and Investment Law · European and International Contract Law · Law, Economics, and Judicial Systems
Methods: Pythia · Focus
