Strong Copyright Protection for Language Models via Adaptive Model Fusion

Javier Abad; Konstantin Donhauser; Francesco Pinto; Fanny Yang

arXiv:2407.20105 · cs.LG · July 30, 2024

TL;DR

This paper introduces CP-Fuse, an adaptive model fusion technique that effectively reduces copyright infringement in language models while preserving high-quality output.

Contribution

The paper presents CP-Fuse (Copyright-Protecting Fusion), an adaptive model fusion algorithm inspired by the Near-Access Free (NAF) framework, which combines language models so as to minimize the reproduction of copyrighted material.

Findings

1. CP-Fuse significantly reduces memorization of copyrighted content.

2. CP-Fuse maintains high-quality text and code generation.

3. CP-Fuse can be integrated with other protection techniques.

Abstract

The risk of language models unintentionally reproducing copyrighted material from their training data has led to the development of various protective measures. In this paper, we propose model fusion as an effective solution to safeguard against copyright infringement. In particular, we introduce Copyright-Protecting Fusion (CP-Fuse), an algorithm that adaptively combines language models to minimize the reproduction of protected materials. CP-Fuse is inspired by the recently proposed Near-Access Free (NAF) framework and additionally incorporates a desirable balancing property that we demonstrate prevents the reproduction of memorized training data. Our results show that CP-Fuse significantly reduces the memorization of copyrighted content while maintaining high-quality text and code generation. Furthermore, we demonstrate how CP-Fuse can be integrated with other techniques for enhanced…
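The abstract describes adaptively combining language models but does not state the fusion rule. The following is a minimal sketch under stated assumptions, not the paper's algorithm: it fuses two next-token distributions as a weighted geometric mean and picks the weight by grid search so that the worse of two per-model scores is as small as possible. The function names (`kl`, `fuse`, `adaptive_fuse`), the score definition, and the `hist1`/`hist2` parameters (the cumulative log-probability each model assigned to the text generated so far) are all hypothetical choices made for illustration.

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) for strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def fuse(p1, p2, w):
    """Weighted geometric mean p proportional to p1^w * p2^(1-w), renormalized."""
    unnorm = [a ** w * b ** (1 - w) for a, b in zip(p1, p2)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def adaptive_fuse(p1, p2, hist1=0.0, hist2=0.0, steps=100):
    """Grid-search the weight w in [0, 1] (assumed objective, for illustration).

    Each model i gets the score KL(p || p_i) - hist_i. A model that assigned
    high probability to the history (hist_i close to 0) has a low score, so
    minimizing the maximum score pulls the fused distribution toward the
    model that found the history less likely.
    """
    best_w, best_p, best_score = None, None, math.inf
    for k in range(steps + 1):
        w = k / steps
        p = fuse(p1, p2, w)
        score = max(kl(p, p1) - hist1, kl(p, p2) - hist2)
        if score < best_score:
            best_w, best_p, best_score = w, p, score
    return best_w, best_p

# Toy next-token distributions over a three-token vocabulary (made-up values).
p1 = [0.9, 0.05, 0.05]
p2 = [0.2, 0.4, 0.4]

# Equal histories: the chosen weight lies strictly between the two models.
w_balanced, p_balanced = adaptive_fuse(p1, p2)

# Model 1 has followed the history far more confidently than model 2,
# as it might if it had memorized the text: the weight on model 1 drops to 0.
w_skewed, p_skewed = adaptive_fuse(p1, p2, hist1=0.0, hist2=-5.0)
```

In this toy setup the weight shifts away from whichever model has tracked the generated text more closely, which is one way to read the balancing property mentioned in the abstract. The inputs must be strictly positive, and a real decoder would operate on full-vocabulary log-probabilities rather than short Python lists.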

Taxonomy

Topics: Natural Language Processing Techniques