Investigating Memorization in Video Diffusion Models
Chen Chen, Enhuai Liu, Daochang Liu, Mubarak Shah, Chang Xu

TL;DR
This paper systematically investigates how video diffusion models memorize training data, introduces new metrics and datasets to measure memorization, and proposes detection strategies to enhance privacy in generative video models.
Contribution
The paper formally defines content and motion memorization in video diffusion models (VDMs), creates new metrics and datasets for assessing them, and offers detection methods to improve privacy preservation.
Findings
Memorization is widespread across all tested VDMs.
VDMs can memorize both image and video training data.
Proposed detection strategies effectively identify memorization.
Abstract
Diffusion models, widely used for image and video generation, face a significant limitation: the risk of memorizing and reproducing training data during inference, potentially generating unauthorized copyrighted content. While prior research has focused on image diffusion models (IDMs), video diffusion models (VDMs) remain underexplored. To address this gap, we first formally define the two types of memorization in VDMs (content memorization and motion memorization) in a practical way that focuses on privacy preservation and applies to all generation types. We then introduce new metrics specifically designed to separately assess content and motion memorization in VDMs. Additionally, we curate a dataset of text prompts that are most prone to triggering memorization when used as conditioning in VDMs. By leveraging these prompts, we generate diverse videos from various open-source VDMs,…
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. The investigation into memorization in video diffusion models (VDMs) addresses a critical and timely concern. As VDMs become increasingly sophisticated and widely used in generating realistic video content, the potential for these models to inadvertently memorize and replicate training data poses significant legal and ethical risks. 2. The paper enhances the conceptual framework and analytical tools available for studying memorization in VDMs by introducing more nuanced and practical definit
1. The paper introduces a new metric for evaluating content memorization in video diffusion models in a frame-level manner but does not sufficiently justify the need for these over existing methods developed for image diffusion models. It remains unclear whether the current metrics for IDMs are insufficient for assessing memorization in videos. 2. Limited Validation: The new metrics' validation primarily relies on 1K manual annotations on a single dataset (WebVid-10M), which might not reflect t
1. This paper provides valuable insights by identifying several shortcomings in prior studies investigating the memorization of video diffusion models. 2. It proposes measuring frame-level memorization in video diffusion models, extending beyond the video-level memorization focus of previous work. 3. This paper also measures memorization from the image training dataset, which is beneficial.
This paper points out several problems in prior studies of memorization in video diffusion models, which is valuable. However, the following concerns make the paper less solid and convincing. 1. It is doubtful that content and motion memorization can be disentangled in the way this paper does. This paper uses optical flow to measure motion memorization, as indicated in Section 2.2. However, optical flow is also influenced by the content, as shown in examples with
1. This paper redefines content and motion memorization, achieving better alignment with human perception. 2. It proposes remedies for detecting content and motion memorization, which are crucial for mitigating privacy risks. 3. The experimental results are comprehensive and reliable.
1. The writing needs improvement. In my opinion, the paper reads more like an experimental report. The contribution section is overly long, and there are frequent direct comparisons with previous work throughout the paper. These comparisons are often presented in bullet points, with each point being lengthy. Additionally, in Table 1, what do the red text and bold font represent? This is not explained in the paper. 2. The paper lacks coherent expression, making it difficult to read smoothly. Each
NA
I have some concerns: 1. The proposed GSSCD is just a naive extension of SSCD. For video copy detection (an area you may know) research, this aggregation (GSSCD) is very common. Please see the recent VSCD competition by Meta AI. In addition, SSCD is not a good metric itself for copy detection in the background of the diffusion model (Do you know why?). You should design something to substitute SSCD instead. 2. The solutions and evaluations are also naive :) I want to see some insightful methods.
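The reviews above describe the content metric as a frame-level aggregation of SSCD descriptors (GSSCD). The Python sketch below illustrates that general idea only. It is not the paper's GSSCD definition, which this page does not give: the function names, the best-match-per-frame rule, and the mean/max reduction are assumptions made for illustration. The per-frame embeddings are assumed to come from a copy-detection encoder such as SSCD, and plain NumPy arrays stand in for them here.

```python
import numpy as np


def frame_similarity_matrix(gen_emb: np.ndarray, train_emb: np.ndarray) -> np.ndarray:
    """Cosine similarity between every generated frame and every training frame.

    gen_emb:   (T_gen, D) per-frame embeddings of the generated video.
    train_emb: (T_train, D) per-frame embeddings of one training video.
    Both are hypothetical inputs; the encoder is outside this sketch.
    """
    gen = gen_emb / np.linalg.norm(gen_emb, axis=1, keepdims=True)
    train = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    return gen @ train.T  # shape (T_gen, T_train)


def video_content_score(gen_emb: np.ndarray, train_emb: np.ndarray,
                        reduce: str = "mean") -> float:
    """Aggregate frame-level similarities into one video-level score.

    Assumed rule (not taken from the paper): for each generated frame, keep
    its best-matching training frame, then reduce over generated frames.
    """
    best_per_frame = frame_similarity_matrix(gen_emb, train_emb).max(axis=1)
    if reduce == "mean":
        return float(best_per_frame.mean())
    if reduce == "max":
        return float(best_per_frame.max())
    raise ValueError(f"unknown reduce mode: {reduce!r}")
```

With `reduce="max"`, a single copied frame is enough to give a high score, while `reduce="mean"` requires most frames to match. Which behavior is appropriate depends on how memorization is defined, and the choice here is illustrative.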
Taxonomy
Topics: Multimedia Communication and Technology
Methods: Diffusion
