Removing the Watermark Is Not Enough: Forensic Stealth in Generative-AI Watermark Removal
Yevin Nikhel Goonatilake, Giuseppe Ateniese

TL;DR
This paper demonstrates that current AI-generated image watermark removal methods fail to produce outputs indistinguishable from clean images, often leaving detectable signals and highlighting the need for more comprehensive evaluation criteria.
Contribution
The study reveals the limitations of existing watermark removal techniques in achieving forensic stealth and introduces the concept of forensic indistinguishability as a critical evaluation metric.
Findings
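Outputs of six state-of-the-art removers, spanning four attack families, are distinguished from clean images at over 98% true-positive rate under a 1% false-positive budget.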
Detected signals persist under common post-processing.
A three-way trade-off exists among removal success, image quality, and stealth.
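The detection figure in the findings is a true-positive rate measured at a fixed false-positive budget (the abstract gives 1%). As a rough illustration of how such a number is typically computed, the sketch below calibrates a threshold on detector scores for clean images and then measures the fraction of removal-processed images scored above it. The function name `tpr_at_fpr`, the score arrays, and the convention that higher scores mean "more likely processed" are assumptions for illustration, not the paper's evaluation code.

```python
import numpy as np

def tpr_at_fpr(clean_scores, removed_scores, fpr_budget=0.01):
    """Return (tpr, fpr, threshold) for a detector at a fixed FPR budget.

    Assumption: higher score = more likely to be removal-processed.
    An image is flagged when its score is strictly above the threshold.
    """
    if not 0.0 <= fpr_budget < 1.0:
        raise ValueError("fpr_budget must be in [0, 1)")
    clean = np.sort(np.asarray(clean_scores, dtype=float))
    removed = np.asarray(removed_scores, dtype=float)
    n = clean.size
    # Maximum number of clean images we may flag without exceeding the budget.
    k = int(fpr_budget * n + 1e-9)
    # The (k+1)-th largest clean score: at most k clean scores lie strictly above it.
    threshold = clean[n - k - 1]
    fpr = float(np.mean(clean > threshold))   # realized FPR, never above k / n
    tpr = float(np.mean(removed > threshold)) # fraction of processed images caught
    return tpr, fpr, threshold

# Toy usage with synthetic scores (not data from the paper).
clean = np.arange(100) / 100.0
removed = np.array([0.5, 0.985, 0.995, 1.5])
tpr, fpr, thr = tpr_at_fpr(clean, removed, fpr_budget=0.01)
print(f"threshold={thr:.2f} FPR={fpr:.2f} TPR={tpr:.2f}")
```

Calibrating the threshold on clean images alone keeps the false-positive guarantee independent of which remover is being tested, which is why the rate is reported at a fixed budget rather than as raw accuracy.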
Abstract
Watermarks for AI-generated images are meant to support downstream decisions about provenance, manipulation, and trust. In the settings that motivate watermark removal, therefore, success means more than causing the watermark test to fail. A successful remover must also preserve the utility of the image and make the output forensically indistinguishable from clean content, so that defeating the verifier restores deniability rather than merely replacing one detection signal with another. We show that current watermark removal attacks fail this stronger objective. Across six state-of-the-art removers spanning four attack families, independent forensic detectors distinguish removal-processed outputs from clean images at over 98% true-positive rate under a 1% false-positive budget. Thus, current removers often replace the watermark with a different detectable signal. Using UnMarker (IEEE…
