TL;DR
This paper critically examines provable copyright protection in generative models, shows that existing notions such as near access-freeness (NAF) are insufficient, and introduces a new framework, clean-room copyright protection, that offers meaningful guarantees.
Contribution
It shows that NAF alone is insufficient for copyright protection and proposes a formal framework for clean-room copyright protection with theoretical guarantees.
Findings
NAF models can enable verbatim copying, a blatant failure of copyright protection that the authors call being "tainted".
Clean-room copyright protection lets users control their risk of copying.
Differential privacy implies clean-room copyright protection under certain conditions.
Abstract
Are there any conditions under which a generative model's outputs are guaranteed not to infringe the copyrights of its training data? This is the question of "provable copyright protection" first posed by Vyas, Kakade, and Barak (ICML 2023). They define near access-freeness (NAF) and propose it as sufficient for protection. This paper revisits the question and establishes new foundations for provable copyright protection -- foundations that are firmer both technically and legally. First, we show that NAF alone does not prevent infringement. In fact, NAF models can enable verbatim copying, a blatant failure of copyright protection that we dub being tainted. Then, we introduce our blameless copyright protection framework for defining meaningful guarantees, and instantiate it with clean-room copyright protection. Clean-room copyright protection allows a user to control their risk of…