Investigating Memorization in Video Diffusion Models

Chen Chen; Enhuai Liu; Daochang Liu; Mubarak Shah; Chang Xu

arXiv:2410.21669·cs.CV·April 28, 2025

Open Access · 4 Reviews

TL;DR

This paper systematically investigates how video diffusion models memorize training data, introduces new metrics and datasets to measure memorization, and proposes detection strategies to enhance privacy in generative video models.

Contribution

The paper formally defines content memorization and motion memorization in video diffusion models (VDMs), introduces new metrics and datasets for assessing them, and proposes detection methods to improve privacy preservation.

Findings

01. Memorization is widespread across all tested VDMs.

02. VDMs can memorize both image and video training data.

03. Proposed detection strategies effectively identify memorization.

Abstract

Diffusion models, widely used for image and video generation, face a significant limitation: the risk of memorizing and reproducing training data during inference, potentially generating unauthorized copyrighted content. While prior research has focused on image diffusion models (IDMs), video diffusion models (VDMs) remain underexplored. To address this gap, we first formally define the two types of memorization in VDMs (content memorization and motion memorization) in a practical way that focuses on privacy preservation and applies to all generation types. We then introduce new metrics specifically designed to separately assess content and motion memorization in VDMs. Additionally, we curate a dataset of text prompts that are most prone to triggering memorization when used as conditioning in VDMs. By leveraging these prompts, we generate diverse videos from various open-source VDMs,…

Peer Reviews

Decision: ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 2

Strengths

1. The investigation into memorization in video diffusion models (VDMs) addresses a critical and timely concern. As VDMs become increasingly sophisticated and widely used in generating realistic video content, the potential for these models to inadvertently memorize and replicate training data poses significant legal and ethical risks.
2. The paper enhances the conceptual framework and analytical tools available for studying memorization in VDMs by introducing more nuanced and practical definit

Weaknesses

1. The paper introduces a new metric for evaluating content memorization in video diffusion models in a frame-level manner but does not sufficiently justify the need for these over existing methods developed for image diffusion models. It remains unclear whether the current metrics for IDMs are insufficient for assessing memorization in videos.
2. Limited Validation: The new metrics' validation primarily relies on 1K manual annotations on a single dataset (WebVid-10M), which might not reflect t

Reviewer 02 · Rating 5 · Confidence 3

Strengths

1. This paper provides valuable insights by identifying several shortcomings in prior studies investigating the memorization of video diffusion models.
2. It proposes measuring frame-level memorization in video diffusion models, extending beyond the video-level memorization focus of previous work.
3. This paper also measures memorization from the image training dataset, which is beneficial.

Weaknesses

This paper points out several problems in prior studies of memorization in video diffusion models, which is valuable. However, the following concerns make the paper less solid and convincing.
1. It is doubtful that content and motion memorization can be disentangled as in this paper. This paper uses optical flow to measure motion memorization, as indicated in Section 2.2. However, optical flow is also influenced by the content, as shown in examples with

Reviewer 03 · Rating 5 · Confidence 4

Strengths

1. This paper redefines content and motion memorization, achieving better alignment with human perception. 2. It proposes remedies for detecting content and motion memorization, which are crucial for mitigating privacy risks. 3. The experimental results are comprehensive and reliable.

Weaknesses

1. The writing needs improvement. In my opinion, the paper reads more like an experimental report. The contribution section is overly long, and there are frequent direct comparisons with previous work throughout the paper. These comparisons are often presented in bullet points, with each point being lengthy. Additionally, in Table 1, what do the red text and bold font represent? This is not explained in the paper.
2. The paper lacks coherent expression, making it difficult to read smoothly. Each

Reviewer 04 · Rating 3 · Confidence 5

Strengths

NA

Weaknesses

I have some concerns:
1. The proposed GSSCD is just a naive extension of SSCD. For video copy detection (an area you may know) research, this aggregation (GSSCD) is very common. Please see the recent VSCD competition by Meta AI. In addition, SSCD is not a good metric itself for copy detection in the background of the diffusion model (Do you know why?). You should design something to substitute SSCD instead.
2. The solutions and evaluations are also naive :) I want to see some insightful methods.
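
For readers unfamiliar with the frame-level aggregation that the reviews refer to, the general idea can be sketched as below. This is an illustrative sketch only, not the paper's GSSCD definition, which this page does not give. It assumes each frame has already been mapped to an embedding vector by some per-frame copy-detection descriptor (for example SSCD); the function names `cosine`, `per_frame_best`, and `video_score`, and the `max`/`mean` reductions, are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def per_frame_best(gen: List[Vector], train: List[Vector]) -> List[float]:
    """For each generated frame, the highest similarity to any training frame."""
    if not gen or not train:
        raise ValueError("both videos need at least one frame embedding")
    return [max(cosine(g, t) for t in train) for g in gen]


def video_score(gen: List[Vector], train: List[Vector], reduce: str = "max") -> float:
    """Aggregate per-frame scores into one video-level score.

    reduce="max" flags a video if any single frame is a close copy;
    reduce="mean" asks how much of the video is copied on average.
    Both reductions are assumptions, not the paper's stated choice.
    """
    best = per_frame_best(gen, train)
    if reduce == "max":
        return max(best)
    if reduce == "mean":
        return sum(best) / len(best)
    raise ValueError(f"unknown reduction: {reduce!r}")
```

The choice of reduction matters in practice: a `max` over frames is sensitive to a single memorized frame, whereas a `mean` can hide it, which is one reason a frame-level view can differ from a video-level one.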

Taxonomy

Topics: Multimedia Communication and Technology

Methods: Diffusion