Evaluating Dataset Watermarking for Fine-tuning Traceability of Customized Diffusion Models: A Comprehensive Benchmark and Removal Approach
Xincheng Wang, Hanchi Sun, Wenjun Sun, Kejun Xue, Wangqiu Zhou, Jianbo Zhang, Wei Sun, Dandan Zhu, Xiongkuo Min, Jun Jia, Zhijun Fang

TL;DR
This paper presents a comprehensive benchmark for dataset watermarking in diffusion models, evaluating existing methods' effectiveness and robustness, and introduces a practical removal technique revealing vulnerabilities in current watermarking approaches.
Contribution
It establishes a unified evaluation framework for dataset watermarking in diffusion models and proposes a watermark removal method to identify current limitations.
Findings
Existing watermarking methods perform well in universality and transmissibility.
Current methods show limited robustness against real-world attacks.
A practical removal technique can fully eliminate watermarks without affecting model fine-tuning.
Abstract
Recent fine-tuning techniques for diffusion models enable them to reproduce specific image sets, such as particular faces or artistic styles, but also introduce copyright and security risks. Dataset watermarking has been proposed to ensure traceability by embedding imperceptible watermarks into training images, which remain detectable in outputs even after fine-tuning. However, current methods lack a unified evaluation framework. To address this, this paper establishes a general threat model and introduces a comprehensive evaluation framework encompassing Universality, Transmissibility, and Robustness. Experiments show that existing methods perform well in universality and transmissibility, and exhibit some robustness against common image processing operations, yet still fall short under real-world threat scenarios. To reveal these vulnerabilities, the paper further proposes a practical…
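The abstract describes dataset watermarking only at a high level: an imperceptible signal is embedded into training images and later checked for in outputs. A minimal sketch of that embed/detect idea follows. It assumes a simple additive spread-spectrum scheme, which is not any of the specific methods benchmarked in the paper. The code adds a key-derived ±1 pattern to a flattened grayscale image and detects it with a normalized correlation score. The names `make_pattern`, `embed`, and `detect`, the strength of 4/255, and the threshold of 3.0 are all hypothetical. Fine-tuning is not simulated, so the sketch says nothing about transmissibility.

```python
import math
import random
import statistics

# Toy additive spread-spectrum watermark on a flattened grayscale image in [0, 1].
# Hypothetical illustration only: not one of the methods benchmarked in the paper,
# and fine-tuning (transmissibility) is not modelled at all.


def make_pattern(n: int, key: int) -> list[float]:
    """Pseudorandom +/-1 pattern derived from a secret integer key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(pixels: list[float], pattern: list[float], strength: float = 4 / 255) -> list[float]:
    """Add a low-amplitude copy of the pattern, clipping to the valid [0, 1] range."""
    return [min(1.0, max(0.0, p + strength * w)) for p, w in zip(pixels, pattern)]


def detect(pixels: list[float], pattern: list[float], threshold: float = 3.0) -> tuple[float, bool]:
    """Return a z-like correlation score and whether it exceeds the threshold.

    If the pattern is independent of the image, the score is approximately
    standard normal, so threshold=3.0 keeps false positives rare.
    """
    n = len(pixels)
    mean = statistics.fmean(pixels)
    centered = [p - mean for p in pixels]
    corr = sum(c * w for c, w in zip(centered, pattern)) / n
    # Standard error of corr when the +/-1 pattern is independent of the image.
    stderr = math.sqrt(sum(c * c for c in centered)) / n
    score = corr / stderr if stderr > 0 else 0.0
    return score, score > threshold


if __name__ == "__main__":
    host_rng = random.Random(0)
    host = [host_rng.random() for _ in range(128 * 128)]
    pattern = make_pattern(len(host), key=42)
    print(detect(embed(host, pattern), pattern))  # high score, flagged
    print(detect(host, pattern))                  # score near 0, typically not flagged
```

Each pixel changes by at most 4/255, so the mark stays imperceptible. Summed over 16,384 pixels, however, the correlation score rises to several standard errors above the null.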
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- Proposes a standardized evaluation framework (Universality/Transmissibility/Robustness) addressing the critical lack of comparable metrics in dataset watermarking research.
- Through comprehensive experiments, demonstrates that while existing methods resist common distortions, they remain highly vulnerable to watermark removal attacks, exposing a crucial real-world weakness.
- Introduces DeAttack, a practical watermark removal framework that successfully breaks current methods, setting higher r
- Insufficient Motivation and Context: The paper does not clearly convey the serious security and ethical concerns associated with fine-tuning diffusion models, making it difficult for readers to grasp the significance of the problem.
- Lack of Detail on Threat Model Definitions: The discussion of varying definitions of threat models in existing dataset watermarking approaches is vague. More elaboration is needed to clarify how these differences motivate the current research.
- Inadequate Revi
- The topic of this paper, i.e., tracing unauthorized fine-tuning or dataset usage in diffusion models, is important and relevant.
- The overall evaluation design appears reasonable, though this is mainly because it adheres closely to what previous works have already done.
- The claimed "unified evaluation framework" lacks sufficient detail and support. The authors repeatedly claim to evaluate all watermarking methods "under a unified threat model" and "in a fair manner", yet provide no experimental protocol or hyperparameter details on how the baselines are trained and evaluated. As far as I know, the baselines themselves are fundamentally different in nature. DIAGNOSIS and SIREN use a hypothesis testing mechanism (binary or one-class classification results on an
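The review above notes that DIAGNOSIS and SIREN decide ownership through hypothesis testing on classification results. One common form of such a test is a one-sided binomial test, shown here as a hedged sketch. Suppose a watermark detector flags k of n images sampled from the suspect model. The test computes the probability of seeing at least k flags if the model had never been trained on watermarked data and the detector fired only at its false-positive rate p0. The function names and the defaults p0 = 0.05 and alpha = 0.01 are illustrative assumptions; the exact statistics used by DIAGNOSIS and SIREN may differ.

```python
from math import comb


def binomial_tail(k: int, n: int, p0: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def hypothesis_test(flags: list[bool], p0: float = 0.05, alpha: float = 0.01) -> tuple[float, bool]:
    """One-sided test: do flagged outputs exceed the detector's false-positive rate p0?

    flags: per-image detector decisions on images generated by the suspect model.
    Returns (p_value, reject_null); rejecting the null suggests the model was
    fine-tuned on watermarked data.
    """
    if not flags:
        raise ValueError("need at least one generated image")
    n = len(flags)
    k = sum(flags)  # number of images the detector flagged
    p_value = binomial_tail(k, n, p0)
    return p_value, p_value < alpha


if __name__ == "__main__":
    suspect = [True] * 30 + [False] * 70  # 30 of 100 outputs flagged
    clean = [True] * 6 + [False] * 94     # 6 of 100, close to the 5% base rate
    print(hypothesis_test(suspect))  # p-value far below 0.01 -> reject the null
    print(hypothesis_test(clean))    # p-value well above 0.01 -> do not reject
```

The choice of p0 matters: it must be measured on models that never saw watermarked data. Otherwise the test can confuse a noisy detector with genuine watermark transmission.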
Unified threat model and evaluation framework: The paper is the first to establish a unified threat model for dataset watermarking in diffusion models and to propose a three-dimensional evaluation framework (universality, transmissibility, robustness). It clearly defines the essential requirements that watermarking should meet under realistic scenarios and provides a theoretical foundation for future research.
Comprehensive benchmark construction: Based on multiple datasets and fine-tuning method
Limited methodological novelty: Although DeAttack is presented as a unified framework for watermark removal, its technical innovation is limited. The framework largely combines existing degradation operations (blur, JPEG compression, additive noise) with common restoration models (VAE, diffusion, SwinIR) without introducing new algorithmic mechanisms or theoretical insights. The “representation-space noise” component remains conceptual and is not implemented in experiments. Overall, DeAttack is
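The review above summarizes DeAttack as a combination of degradation operations (blur, JPEG compression, additive noise) followed by a restoration model (VAE, diffusion, or SwinIR). The following is a minimal degrade-then-restore sketch of that general pattern, not the authors' implementation. It blurs a grayscale image stored as nested lists, adds Gaussian noise, and then "restores" it with a median filter. The median filter is a hypothetical stand-in for a learned restorer. JPEG compression is omitted because it needs an image codec. All function names and parameter values are assumptions.

```python
import random

Image = list[list[float]]  # grayscale image, values in [0, 1]


def _window(img: Image, y: int, x: int, radius: int) -> list[float]:
    """Pixels in a (2r+1)x(2r+1) window around (y, x), clamping at the borders."""
    h, w = len(img), len(img[0])
    return [
        img[min(h - 1, max(0, y + dy))][min(w - 1, max(0, x + dx))]
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
    ]


def box_blur(img: Image, radius: int = 1) -> Image:
    """Degradation: replace each pixel with the mean of its neighbourhood."""
    return [
        [sum(win) / len(win) for win in (_window(img, y, x, radius) for x in range(len(img[0])))]
        for y in range(len(img))
    ]


def add_gaussian_noise(img: Image, sigma: float, seed: int = 0) -> Image:
    """Degradation: additive Gaussian noise, clipped back to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]


def median_restore(img: Image, radius: int = 1) -> Image:
    """Hypothetical stand-in for a learned restorer: per-pixel median filter."""
    return [
        [sorted(win)[len(win) // 2] for win in (_window(img, y, x, radius) for x in range(len(img[0])))]
        for y in range(len(img))
    ]


def degrade_then_restore(img: Image, blur_radius: int = 1, sigma: float = 0.05,
                         seed: int = 0, restore=median_restore) -> Image:
    """Degrade to disrupt a fragile watermark signal, then restore visual quality."""
    degraded = add_gaussian_noise(box_blur(img, blur_radius), sigma, seed)
    return restore(degraded)


if __name__ == "__main__":
    rng = random.Random(1)
    image = [[rng.random() for _ in range(16)] for _ in range(16)]
    cleaned = degrade_then_restore(image)
    print(len(cleaned), len(cleaned[0]))  # 16 16
```

Passing the restorer as a parameter mirrors the review's description: the degradation step can stay fixed while the restoration model (VAE, diffusion, SwinIR, or this toy filter) is swapped.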
1. In general, the writing of the paper is relatively clear and easy to follow.
2. The paper focuses on a meaningful and interesting topic: the systematic evaluation of current dataset watermarking techniques for copyright protection against the fine-tuning of diffusion models, and indicates three important aspects for evaluation: universality, transmissibility, and robustness.
1. Some discussion may not be comprehensive. For example, when applying the attack to the watermarking methods, the authors consider only applying the attack to the image used for fine-tuning. However, the attack may also be applied to the generated images after fine-tuning, or applied in both stages. In addition, there could also be a scenario where the data used for fine-tuning a diffusion model consists of watermarked data from different users, where the watermarking is done by the same metho
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Advanced Steganography and Watermarking Techniques
