Robust Watermarks Leak: Channel-Aware Feature Extraction Enables Adversarial Watermark Manipulation
Zhongjie Ba, Yitao Zhang, Peng Cheng, Bin Gong, Xinyu Zhang, Qinglong, Wang, Kui Ren

TL;DR
This paper reveals a fundamental tradeoff in watermarking for AI content, showing that robustness against distortions can lead to information leakage, and introduces an attack exploiting this leakage to evade detection and forge content.
Contribution
It uncovers the robustness-stealthiness paradox in watermarking and proposes a channel-aware feature extraction attack that bypasses prior limitations.
Findings
Achieves 60% higher success in detection evasion.
Improves forgery accuracy by 51%.
Maintains visual fidelity of watermarked images.
Abstract
Watermarking plays a key role in the provenance and detection of AI-generated content. While existing methods prioritize robustness against real-world distortions (e.g., JPEG compression and noise addition), we reveal a fundamental tradeoff: such robust watermarks inherently improve the redundancy of detectable patterns encoded into images, creating exploitable information leakage. To leverage this, we propose an attack framework that extracts leakage of watermark patterns through multi-channel feature learning using a pre-trained vision model. Unlike prior works requiring massive data or detector access, our method achieves both forgery and detection evasion with a single watermarked image. Extensive experiments demonstrate that our method achieves a 60\% success rate gain in detection evasion and 51\% improvement in forgery accuracy compared to state-of-the-art methods while…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Steganography and Watermarking Techniques · Digital Media Forensic Detection · Handwritten Text Recognition Techniques
