On Provable Copyright Protection for Generative Models

Nikhil Vyas; Sham Kakade; Boaz Barak

arXiv:2302.10870 · cs.LG · July 24, 2023 · 21 cites


TL;DR

This paper introduces a formal framework called near access-freeness (NAF) to quantify and bound the probability that generative models produce outputs similar to copyrighted data, and proposes algorithms to enforce these bounds with minimal impact on output quality.

Contribution

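It defines NAF and proves bounds on the probability that a model satisfying it produces outputs similar to copyrighted data. It also gives algorithms that modify the original learning algorithm in a black-box manner to yield models with stronger copyright protection.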
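The bound can be made concrete with a one-step derivation. This sketch assumes the divergence is the max-divergence measured in bits; the abstract only says the outputs "diverge by at most $k$ bits". It also fixes the prompt. The notation is ours: $q_C$ is a model that did not access $C$, and $E_C$ is the set of outputs substantially similar to $C$.

```latex
% Assumption: \Delta is the max-divergence in bits; the prompt x is fixed and suppressed.
\begin{aligned}
\Delta_{\max}(p \,\|\, q_C) &= \max_{y:\,p(y)>0} \log_2 \frac{p(y)}{q_C(y)} \le k
  &&\Longrightarrow\quad p(y) \le 2^{k}\, q_C(y) \ \text{for all } y, \\
\Pr_{y\sim p}[y \in E_C] &= \sum_{y\in E_C} p(y)
  \le 2^{k} \sum_{y\in E_C} q_C(y) = 2^{k} \Pr_{y\sim q_C}[y \in E_C].
\end{aligned}
```

Suppose a model that never saw $C$ produces something similar to $C$ with probability $\varepsilon$. Under this assumed divergence, a $k$-NAF model does so with probability at most $2^k \varepsilon$.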

Findings

01

Models satisfying NAF have a provably bounded probability of producing outputs similar to copyrighted data, even when that data was in their training set.

02

The proposed algorithms reduce the probability of sampling protected content with minimal loss in output quality.

03

Experiments on language and image models show that the approach is practical, with minimal degradation in output quality.
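The algorithms behind Finding 02 modify a base learner in a black-box manner. The abstract does not spell them out, so what follows is a minimal sketch of one construction of this kind, under our own assumptions.

The idea is to train two models, `q1` and `q2`, on disjoint shards of the data, so that each protected item $C$ lands in only one shard. We then sample from the normalized pointwise minimum $p(y) = \min(q1(y), q2(y)) / Z$. Because $p(y) \le q_i(y)/Z$ for both $i$, $p$ stays within $\log_2(1/Z)$ bits (max-divergence) of whichever model did not see $C$.

Some of this is illustration only. The helper `combine_min` and the toy distributions are hypothetical. Outputs are also assumed to come from a tiny finite set so that the distributions can be enumerated.

```python
import math
from typing import Dict, Tuple

# Toy output distribution for one fixed prompt: output -> probability.
Dist = Dict[str, float]


def combine_min(q1: Dist, q2: Dist) -> Tuple[Dist, float]:
    """Hypothetical helper: p(y) = min(q1(y), q2(y)) / Z, returned with k = log2(1/Z) in bits."""
    support = set(q1) | set(q2)
    unnormalized = {y: min(q1.get(y, 0.0), q2.get(y, 0.0)) for y in support}
    z = sum(unnormalized.values())
    if z == 0.0:
        raise ValueError("q1 and q2 share no support; the minimum is identically zero")
    p = {y: w / z for y, w in unnormalized.items() if w > 0.0}
    # p(y) <= q_i(y) / Z for i = 1, 2, so log2(p(y) / q_i(y)) <= log2(1 / Z).
    return p, math.log2(1.0 / z)


# Assumed setup: q1 was trained on the shard containing protected item "C"; q2 never saw it.
q1 = {"C": 0.50, "a": 0.30, "b": 0.20}
q2 = {"C": 0.01, "a": 0.50, "b": 0.49}

p, k = combine_min(q1, q2)
print(f"p(C) = {p['C']:.4f}, k = {k:.4f} bits")  # prints p(C) = 0.0196, k = 0.9714 bits
```

In this toy run the combined model puts about 2% of its mass on "C", compared with 50% under `q1`. The price of the guarantee is $Z$. When the two models largely agree, $Z$ is close to 1 and $k$ is small. When they disagree sharply, $Z$ shrinks and the bound weakens.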

Abstract

There is a growing concern that learned conditional generative models may output samples that are substantially similar to some copyrighted data $C$ that was in their training set. We give a formal definition of near access-freeness (NAF) and prove bounds on the probability that a model satisfying this definition outputs a sample similar to $C$, even if $C$ is included in its training set. Roughly speaking, a generative model $p$ is $k$-NAF if for every potentially copyrighted data $C$, the output of $p$ diverges by at most $k$ bits from the output of a model $q$ that did not access $C$ at all. We also give generative model learning algorithms, which efficiently modify the original generative model learning algorithm in a black-box manner, that output generative models with strong bounds on the probability of sampling protected content. Furthermore, we…
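The $k$-NAF definition above can be checked directly on toy finite distributions. This sketch makes two assumptions. First, the divergence is the max-divergence in bits, $\max_y \log_2(p(y)/q(y))$. Second, a given "safe" distribution that never accessed $C$ is supplied for each protected item $C$.

The function names `max_divergence_bits` and `naf_level`, the item labels, and the numbers are all hypothetical. The code computes the smallest $k$ for which $p$ is $k$-NAF with respect to every supplied safe model.

```python
import math
from typing import Dict

# Toy output distribution for one fixed prompt: output -> probability.
Dist = Dict[str, float]


def max_divergence_bits(p: Dist, q: Dist) -> float:
    """Max over outputs y with p(y) > 0 of log2(p(y) / q(y)); infinite if q(y) = 0 there."""
    worst = -math.inf
    for y, py in p.items():
        if py == 0.0:
            continue
        qy = q.get(y, 0.0)
        if qy == 0.0:
            return math.inf
        worst = max(worst, math.log2(py / qy))
    return worst


def naf_level(p: Dist, safe_models: Dict[str, Dist]) -> float:
    """Hypothetical helper: smallest k with p within k bits of the safe model for every C."""
    return max(max_divergence_bits(p, safe) for safe in safe_models.values())


p = {"copy_C1": 0.02, "copy_C2": 0.01, "other_a": 0.57, "other_b": 0.40}
# Assumed safe models: each one never accessed the protected item it is keyed by.
safe_models = {
    "C1": {"copy_C1": 0.01, "copy_C2": 0.01, "other_a": 0.58, "other_b": 0.40},
    "C2": {"copy_C1": 0.02, "copy_C2": 0.008, "other_a": 0.572, "other_b": 0.40},
}

k = naf_level(p, safe_models)
print(k)  # 1.0: p doubles the C1-safe model's probability of "copy_C1"
```

The per-item levels differ: $p$ is within about 0.32 bits of the safe model for C2, but a full bit from the one for C1. The definition takes the worst case over all $C$, so this $p$ is 1-NAF.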

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing