Fantastic Copyrighted Beasts and How (Not) to Generate Them
Luxi He, Yangsibo Huang, Weijia Shi, Tinghao Xie, Haotian Liu, Yue Wang, Luke Zettlemoyer, Chiyuan Zhang, Danqi Chen, Peter Henderson

TL;DR
This paper evaluates how current image and video generation models can unintentionally produce copyrighted characters, assesses the effectiveness of mitigation strategies, and proposes new evaluation methods to improve copyright safeguards.
Contribution
It introduces a novel evaluation framework for assessing copyright infringement risks and mitigation strategies in generative models, revealing limitations of existing approaches.
Findings
Models can generate copyrighted characters even when the prompt does not name them
Prompt rewriting alone is insufficient to prevent copyright infringement
Negative prompting improves mitigation effectiveness
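The two mitigation ideas named in these findings can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `GenerationRequest` container, the `rewrite_prompt` and `build_request` helpers, the string-replacement stand-in for rewriting (deployed systems typically use a language model for this step), and the example keywords. The field name `negative_prompt` mirrors the parameter of the same name accepted by common diffusion pipelines such as those in Hugging Face diffusers; no image model is called here.

```python
from dataclasses import dataclass


@dataclass
class GenerationRequest:
    """Hypothetical container for the text inputs of a text-to-image call."""
    prompt: str
    negative_prompt: str = ""


def rewrite_prompt(prompt: str, character: str, replacement: str) -> str:
    """Toy stand-in for prompt rewriting: swap the character name
    for a generic description (case-sensitive string replacement)."""
    return prompt.replace(character, replacement)


def build_request(prompt: str, character: str, replacement: str,
                  negative_keywords: list[str]) -> GenerationRequest:
    """Combine prompt rewriting with a negative prompt that lists the
    character name plus keywords associated with the character."""
    rewritten = rewrite_prompt(prompt, character, replacement)
    negative = ", ".join([character, *negative_keywords])
    return GenerationRequest(prompt=rewritten, negative_prompt=negative)


if __name__ == "__main__":
    req = build_request(
        prompt="Mario jumping over a pipe",
        character="Mario",
        replacement="a plumber in a red cap",      # illustrative only
        negative_keywords=["mustache", "overalls"],  # illustrative only
    )
    print(req.prompt)
    print(req.negative_prompt)
```

The design point is that rewriting changes only what the model is asked to draw, while the negative prompt additionally tells it what to steer away from, which is one plausible reading of why the two are reported as more effective together than rewriting alone.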
Abstract
Recent studies show that image and video generation models can be prompted to reproduce copyrighted content from their training data, raising serious legal concerns about copyright infringement. Copyrighted characters (e.g., Mario, Batman) present a significant challenge: at least one lawsuit has already awarded damages based on the generation of such characters. Consequently, commercial services like DALL-E have started deploying interventions. However, little research has systematically examined two questions: (1) Can users easily prompt models to generate copyrighted characters, even unintentionally? (2) How effective are the existing mitigation strategies? To address these questions, we introduce a novel evaluation framework with metrics that assess both the generated image's similarity to copyrighted characters and its consistency with user intent, grounded in a set of…
Peer Reviews
Decision: ICLR 2025 Poster
The paper addresses a very relevant topic in a way that goes well beyond the standard approaches shipped with image generation systems, such as default prompt rewriting. The research questions (Q1/Q2) are clearly formulated, and the empirical metrics developed are well aligned with these questions. Consistency via the VQAS score is an important addition to the method. Despite its empirical stance, the paper is rather rigorous on the technical aspects, which includes an extensive choice of image gener
Poor document structure, with some important elements appearing in the additional material section, and no clear, systematic progression, sometimes leaving an impression of separate, quasi-anecdotal testing. While this may not affect the quality or significance of individual results, it makes the paper tedious to read at times and weakens the overall use case. Aside from the quantitative reporting (DETECT, CONS) there are a number of evaluation issues: - lack of a discussion on whether the test
The paper has several key strengths:
## 1. Writing Quality and Clarity
The paper is remarkably concise, clear, and articulate in its writing. Despite aligning to terminologies consistent with our area of research, I imagine that the manuscript's writing style and depth would allow many types of stakeholders beyond ICLR's more technical community (e.g., software engineers with limited machine learning experience) to engage with it.
## 2. Evaluation Depth and Methodological Rigor
The paper's pr
My concerns with the paper are primarily related to its positioning and contribution with respect to prior work in related areas. Below, I highlight three key concerns:
## 1. Weak Connection to Jailbreaking and Red Teaming
The paper is motivated by two key questions that relate to the generation of copyrighted characters and mitigation strategies against attempts at such generation. I find this conceptually identical to the challenge of jailbreaking, yet the authors appear to have explicitly
Although the ability of image-generation engines to create copyrighted characters, by instruction or indirectly, is known, it is important to measure how common the problem is and how susceptible the engines are to producing them even without user intent. It is also important to measure how effective mitigation strategies are. In many ways, the authors aim to have a complete discussion about the subject, which is highly positive. I also liked that the authors included imagery in the paper, it is very important in this context
I see two key problems with this paper which, for me, warrant it not being accepted. First, the authors use GPT-4V as the evaluator of their images. According to the appendix, the accuracy is only 82.5%, and the Kappa agreement with humans is merely 0.65. There are also problems with the methodology used in the human evaluation, but even disregarding this problem, this makes for a quite flawed metric: it incorrectly evaluates roughly 1 in 5 images. This is never mentioned in the main paper
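The last review contrasts raw accuracy with Kappa agreement. As a minimal sketch of how those two numbers differ, the code below computes both for a pair of binary label lists. The data is made up for illustration and is not taken from the paper, and the function names are hypothetical. Cohen's kappa is assumed here as the agreement statistic, since the review only says "Kappa".

```python
def accuracy(reference: list[int], predicted: list[int]) -> float:
    """Fraction of items on which the two label lists agree."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def cohens_kappa(reference: list[int], predicted: list[int]) -> float:
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance given each rater's label frequencies."""
    n = len(reference)
    p_observed = accuracy(reference, predicted)
    labels = set(reference) | set(predicted)
    p_chance = sum(
        (reference.count(label) / n) * (predicted.count(label) / n)
        for label in labels
    )
    if p_chance == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


if __name__ == "__main__":
    # Toy labels: 1 = "image shows the character", 0 = "it does not".
    human = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    model = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
    print(f"accuracy = {accuracy(human, model):.2f}")
    print(f"kappa    = {cohens_kappa(human, model):.2f}")
```

On this toy data the evaluator agrees with the human on 8 of 10 items (accuracy 0.8), yet kappa is about 0.6 because half of that agreement would be expected by chance with balanced labels. This is the general reason a chance-corrected score sits below raw accuracy.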
Taxonomy
Topics: Law, AI, and Intellectual Property
Methods: Sparse Evolutionary Training
