Nine Ways to Break Copyright Law and Why Our LLM Won't: A Fair Use Aligned Generation Framework

Aakash Sen Sharma; Debdeep Sanyal; Priyansh Srivastava; Sundar Atreya H.; Shirish Karande; Mohan Kankanhalli; Murari Mandal

arXiv:2505.23788·cs.CL·June 2, 2025

Open Access

TL;DR

This paper introduces FUA-LLM, a framework that aligns large language model outputs with fair use principles, reducing copyright infringement risks while maintaining utility through expert-validated data and novel evaluation metrics.

Contribution

We developed a legally grounded fine-tuning approach that applies Direct Preference Optimization (DPO) to a new expert-validated dataset, FairUseDB, to produce copyright-compliant language model outputs.

Findings

01. FUA-LLM reduces problematic outputs by up to 20%.

02. New metrics effectively balance infringement risk and utility.

03. Expert evaluations confirm improved legal compliance.

Abstract

Large language models (LLMs) commonly risk copyright infringement by reproducing protected content verbatim or with insufficient transformative modifications, posing significant ethical, legal, and practical concerns. Current inference-time safeguards predominantly rely on restrictive refusal-based filters, often compromising the practical utility of these models. To address this, we collaborated closely with intellectual property experts to develop FUA-LLM (Fair Use Aligned Language Models), a legally-grounded framework explicitly designed to align LLM outputs with fair-use doctrine. Central to our method is FairUseDB, a carefully constructed dataset containing 18,000 expert-validated examples covering nine realistic infringement scenarios. Leveraging this dataset, we apply Direct Preference Optimization (DPO) to fine-tune open-source LLMs, encouraging them to produce legally compliant…
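As a rough illustration of the preference-optimization step mentioned in the abstract, the sketch below computes the standard DPO loss for a single preference pair (a preferred response and a rejected one). It is not the authors' training code: the function name `dpo_loss`, the `beta` value, and the log-probability numbers are assumptions chosen for illustration, and how FairUseDB pairs are constructed is not shown here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are sequence log-probabilities under the model being tuned
    (policy) and under a frozen reference model. beta=0.1 is an assumed
    value, not one taken from the paper.
    """
    # How much more the policy favors each response than the reference does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities for a compliant (chosen) response and an
# infringing (rejected) response to the same prompt.
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)   # policy identical to reference
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)   # policy shifted toward chosen
print(baseline, improved)
```

When the policy matches the reference, the margin is zero and the loss equals log 2. The loss falls as the policy assigns relatively more probability to the preferred response, which is the signal a preference dataset provides during fine-tuning.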

Taxonomy

Topics: Copyright and Intellectual Property · Intellectual Property Law · Legal Systems and Judicial Processes

Methods: Attentive Walk-Aggregating Graph Neural Network · ALIGN