On the Robustness of Watermarking for Autoregressive Image Generation
Andreas Müller, Denis Lukovnikov, Shingo Kodama, Minh Pham, Anubhav Jain, Jonathan Petit, Niv Cohen, Asja Fischer

TL;DR
This paper evaluates watermarking techniques for autoregressive image generators, revealing their vulnerabilities to removal and forgery attacks, and highlighting their unreliability for detecting synthetic content.
Contribution
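The study introduces three new attacks on watermarking schemes (a vector-quantized regeneration removal attack, an adversarial optimization-based attack, and a frequency injection attack) and demonstrates their effectiveness, exposing weaknesses in current watermarking methods for AR image generation.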
Findings
Watermark removal and forgery attacks can be effective with access to only a single watermarked reference image.
Existing watermarking schemes often fail to reliably detect synthetic images.
Watermark mimicry can cause false positives, hindering dataset filtering.
Abstract
The proliferation of autoregressive (AR) image generators demands reliable detection and attribution of their outputs to mitigate misinformation, and to filter synthetic images from training data to prevent model collapse. To address this need, watermarking techniques, specifically designed for AR models, embed a subtle signal at generation time, enabling downstream verification through a corresponding watermark detector. In this work, we study these schemes and demonstrate their vulnerability to both watermark removal and forgery attacks. We assess existing attacks and further introduce three new attacks: (i) a vector-quantized regeneration removal attack, (ii) an adversarial optimization-based attack, and (iii) a frequency injection attack. Our evaluation reveals that removal and forgery attacks can be effective with access to a single watermarked reference image and without access to…
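The abstract does not spell out how a generation-time watermark and its detector fit together. As a rough illustration only, the sketch below assumes a toy "green-list" token scheme of the kind used for language-model watermarking, applied to a sequence of discrete image tokens. It is not the paper's method or any of the schemes it evaluates; `VOCAB_SIZE`, `GAMMA`, the key, and all function names are hypothetical.

```python
import hashlib
import math
import random
from typing import List, Sequence

VOCAB_SIZE = 1024  # hypothetical codebook size of an image tokenizer
GAMMA = 0.5        # assumed fraction of tokens that count as "green"


def is_green(prev_token: int, token: int, key: str = "demo-key") -> bool:
    """Keyed pseudo-random green/red split, seeded by the previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def sample_tokens(n: int, watermark: bool, rng: random.Random) -> List[int]:
    """Toy generator: uniform tokens; if watermarked, reject non-green ones."""
    tokens = [rng.randrange(VOCAB_SIZE)]
    while len(tokens) < n:
        candidate = rng.randrange(VOCAB_SIZE)
        if watermark and not is_green(tokens[-1], candidate):
            continue  # "hard" watermark: resample until the token is green
        tokens.append(candidate)
    return tokens


def z_score(tokens: Sequence[int]) -> float:
    """Detector: one-proportion z-test on the number of green transitions."""
    n = len(tokens) - 1
    green = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    rng = random.Random(0)
    print("watermarked z:  ", z_score(sample_tokens(257, True, rng)))
    print("unwatermarked z:", z_score(sample_tokens(257, False, rng)))
```

In this toy setting every transition of a watermarked sequence is green, so 256 transitions give a z-score of exactly 16, while unwatermarked sequences score near zero. A removal attack aims to push the score of a watermarked image back toward zero; a forgery attack aims to raise the score of an image the model never produced.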
