Alignment Whack-a-Mole: Finetuning Activates Verbatim Recall of Copyrighted Books in Large Language Models

Xinyue Liu; Niloofar Mireshghallah; Jane C. Ginsburg; Tuhin Chakrabarty

arXiv:2603.20957 · cs.CL · March 31, 2026

TL;DR

Finetuning large language models to expand plot summaries into full text enables them to reproduce substantial verbatim content from held-out copyrighted books. This reveals an industry-wide vulnerability that challenges claims that models do not store copies of their training data, with implications for fair-use defenses.

Contribution

This study demonstrates that finetuning LLMs on individual copyrighted books reactivates memorized content, bypassing safety measures such as RLHF, system prompts, and output filters, and exposing risks of copyright infringement.

Findings

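01 Models can reproduce up to 90% of held-out copyrighted books.

02 Finetuning on a single author's works reactivates latent memorization, and the effect generalizes to other authors.

03 Models from multiple providers (GPT-4o, Gemini-2.5-Pro, DeepSeek-V3.1) memorize copyrighted content in similar ways.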

Abstract

Frontier LLM companies have repeatedly assured courts and regulators that their models do not store copies of training data. They further rely on safety alignment strategies via RLHF, system prompts, and output filters to block verbatim regurgitation of copyrighted works, and have cited the efficacy of these measures in their legal defenses against copyright infringement claims. We show that finetuning bypasses these protections: by training models to expand plot summaries into full text, a task naturally suited for commercial writing assistants, we cause GPT-4o, Gemini-2.5-Pro, and DeepSeek-V3.1 to reproduce up to 85-90% of held-out copyrighted books, with single verbatim spans exceeding 460 words, using only semantic descriptions as prompts and no actual book text. This extraction generalizes across authors: finetuning exclusively on Haruki Murakami's novels unlocks verbatim recall of…
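The abstract reports two extraction measures: the share of a book a model reproduces and the length of the longest single verbatim span. The Python sketch below shows one way such overlap could be computed between a model's output and a reference text. It is not the authors' evaluation code. The lowercase whitespace tokenization, the 5-word minimum match length `k`, and the helper names `verbatim_coverage` and `longest_verbatim_span` are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization; the paper's tokenization may differ.
    return text.lower().split()


def verbatim_coverage(reference: str, generated: str, k: int = 5) -> float:
    """Fraction of reference words covered by word k-grams that also occur in `generated`.

    `k` is a hypothetical minimum match length chosen for illustration.
    """
    ref, gen = tokenize(reference), tokenize(generated)
    if len(ref) < k:
        return 0.0
    # Every contiguous k-word window in the generated text.
    gen_kgrams = {tuple(gen[i:i + k]) for i in range(len(gen) - k + 1)}
    covered = [False] * len(ref)
    for i in range(len(ref) - k + 1):
        if tuple(ref[i:i + k]) in gen_kgrams:
            # Mark every word of the matching window as reproduced.
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(ref)


def longest_verbatim_span(reference: str, generated: str) -> int:
    """Length, in words, of the longest contiguous word sequence shared by both texts."""
    ref, gen = tokenize(reference), tokenize(generated)
    # autojunk=False stops difflib from ignoring frequent words in long inputs.
    matcher = SequenceMatcher(None, ref, gen, autojunk=False)
    match = matcher.find_longest_match(0, len(ref), 0, len(gen))
    return match.size


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog near the river bank"
    generated = "a quick brown fox jumps over the lazy dog and then rests"
    print(f"coverage: {verbatim_coverage(reference, generated):.2f}")
    print(f"longest span: {longest_verbatim_span(reference, generated)} words")
```

On the toy inputs it prints a coverage of 0.62 (8 of 13 reference words) and a longest span of 8 words. Raising `k` makes the coverage score stricter, because short, common phrases stop counting as matches.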

Code & Models

Repositories

cauchy221/Alignment-Whack-a-Mole-Code (GitHub)