WaterDrum: Watermarking for Data-centric Unlearning Metric

Xinyang Lu; Xinyuan Niu; Gregory Kang Ruey Lau; Bui Thi Cam Nhung; Rachael Hwee Ling Sim; John Russell Himawan; Fanyu Wen; Chuan-Sheng Foo; See-Kiong Ng; Bryan Kian Hsiang Low

arXiv:2505.05064·cs.LG·February 3, 2026



TL;DR

WaterDrum is a data-centric, watermarking-based metric for evaluating large language model unlearning. It addresses limitations of utility-based metrics, especially when the forget and retain sets are semantically similar or when retraining from scratch on the retain set is impractical.

Contribution

The paper presents WaterDrum, the first data-centric unlearning metric for LLMs, built on robust text watermarking, along with new benchmark datasets (with different levels of data similarity) for rigorous evaluation of unlearning algorithms.

Findings

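01. WaterDrum measures the extent of unlearning in realistic settings, such as when the forget and retain sets have semantically similar content.

02. New benchmark datasets with different levels of data similarity enable rigorous evaluation of unlearning algorithms.

03. WaterDrum outperforms utility-based metrics in settings where those metrics may fail, such as when retraining the model from scratch on the retain set is impractical.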

Abstract

Large language model (LLM) unlearning is critical in real-world applications where it is necessary to efficiently remove the influence of private, copyrighted, or harmful data from some users. Existing utility-centric unlearning metrics (based on model utility) may fail to accurately evaluate the extent of unlearning in realistic settings such as when the forget and retain sets have semantically similar content and/or retraining the model from scratch on the retain set is impractical. This paper presents the first data-centric unlearning metric for LLMs called WaterDrum that exploits robust text watermarking to overcome these limitations. We introduce new benchmark datasets (with different levels of data similarity) for LLM unlearning that can be used to rigorously evaluate unlearning algorithms via WaterDrum. Our code is available at https://github.com/lululu008/WaterDrum and our new…
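To make the data-centric idea concrete, here is a toy sketch of measuring a data owner's watermark signal in text before and after unlearning. It is not the paper's watermarking scheme or metric: the keyed "green word" rule, the function names `is_green` and `watermark_strength`, the key `"owner-A"`, and the synthetic vocabulary are all assumptions made for illustration.

```python
import hashlib


def is_green(token: str, key: str) -> bool:
    """Hypothetical keyed rule: about half of all tokens are 'green' for a given key."""
    digest = hashlib.sha256(f"{key}:{token}".encode("utf-8")).digest()
    return digest[0] < 128


def watermark_strength(text: str, key: str) -> float:
    """Fraction of whitespace-separated tokens that are green under `key`."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(is_green(t, key) for t in tokens) / len(tokens)


# Synthetic stand-ins for model outputs (assumed, not real data).
vocab = [f"w{i}" for i in range(200)]
key = "owner-A"
green_words = [w for w in vocab if is_green(w, key)]

# Before unlearning: output still carries the owner's watermark (all green words).
output_before = " ".join(green_words)
# After unlearning: output looks like unwatermarked text (the whole vocabulary).
output_after = " ".join(vocab)

strength_before = watermark_strength(output_before, key)
strength_after = watermark_strength(output_after, key)
print(f"before: {strength_before:.2f}, after: {strength_after:.2f}")
```

In this toy setup the score depends only on whether the owner's watermark is still detectable in the text, not on model utility, which is the property that distinguishes a data-centric metric from a utility-centric one.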


Code & Models

Repositories

lululu008/waterdrum
PyTorch · Official


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Advanced Graph Neural Networks

Methods: Sparse Evolutionary Training