Interrogating LLM design under a fair learning doctrine

Johnny Tian-Zheng Wei; Maggie Wang; Ameya Godbole; Jonathan H. Choi; Robin Jia

arXiv:2502.16290·cs.CY·February 25, 2025

Open Access

TL;DR

This paper introduces a 'fair learning' framework for LLMs that examines how training decisions affect memorization, addressing copyright risks that comparisons of output similarity alone do not capture.

Contribution

The paper proposes a legal standard for fair learning in LLMs and connects memorization analyses to it, emphasizing analysis of the training process over comparison of outputs.

Findings

01. Deconstructed Pythia, an open-source LLM, using causal and correlational analyses to make factual determinations about its training decisions.

02. Connected memorization analyses to a proposed legal standard for fair learning.

03. Suggested how fair learning standards could evolve toward clearer, more rule-based application.

Abstract

The current discourse on large language models (LLMs) and copyright largely takes a "behavioral" perspective, focusing on model outputs and evaluating whether they are substantially similar to training data. However, substantial similarity is difficult to define algorithmically and a narrow focus on model outputs is insufficient to address all copyright risks. In this interdisciplinary work, we take a complementary "structural" perspective and shift our focus to how LLMs are trained. We operationalize a notion of "fair learning" by measuring whether any training decision substantially affected the model's memorization. As a case study, we deconstruct Pythia, an open-source LLM, and demonstrate the use of causal and correlational analyses to make factual determinations about Pythia's training decisions. By proposing a legal standard for fair learning and connecting memorization analyses…

Taxonomy

Topics: International Arbitration and Investment Law · European and International Contract Law · Law, Economics, and Judicial Systems

Methods: Pythia · Focus