ClusterMark: Towards Robust Watermarking for Autoregressive Image Generators with Visual Token Clustering

Denis Lukovnikov; Andreas Müller; Erwin Quiring; Asja Fischer

arXiv:2508.06656 · cs.CV · April 13, 2026

TL;DR

This paper introduces ClusterMark, a watermarking method for autoregressive image models that uses visual token clustering to improve robustness against perturbations and attacks while maintaining image quality.

Contribution

Proposes a novel watermarking scheme for autoregressive image models based on visual token clustering, improving robustness and verification speed compared to existing methods.

Findings

01. ClusterMark significantly outperforms baselines in robustness against perturbations.

02. Token clustering maintains high image quality and fast verification.

03. The method is effective in both training-free and fine-tuned settings.

Abstract

In-generation watermarking for latent diffusion models has recently shown high robustness in marking generated images for easier detection and attribution. However, its application to autoregressive (AR) image models is underexplored. Autoregressive models generate images by autoregressively predicting a sequence of visual tokens that are then decoded into pixels using a VQ-VAE decoder. Inspired by KGW watermarking for large language models, we examine token-level watermarking schemes that bias the next-token prediction based on prior tokens. We find that a direct transfer of these schemes works in principle, but the detectability of the watermarks decreases considerably under common image perturbations. As a remedy, we propose a watermarking approach based on visual token clustering, which assigns similar tokens to the same set (red or green). We investigate token clustering in a…
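
The idea in the abstract, biasing next-token prediction toward a "green" set and assigning similar tokens to the same set, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the SHA-256 seeding, the choice to seed on the previous token's cluster, the parameters `gamma` and `delta`, and the toy token-to-cluster mapping are all assumptions made for illustration. The detection statistic is the standard one-proportion z-score used by KGW-style schemes.

```python
import hashlib
import math
import random


def green_clusters(prev_cluster, n_clusters, key, gamma=0.5):
    """Pick a pseudo-random subset of cluster ids as 'green'.

    Assumption: the subset is seeded by a secret key and the cluster of the
    previous token, and contains int(gamma * n_clusters) clusters.
    """
    digest = hashlib.sha256(f"{key}:{prev_cluster}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(n_clusters))
    rng.shuffle(ids)
    return set(ids[: int(gamma * n_clusters)])


def bias_logits(logits, token_to_cluster, green, delta):
    """Add delta to the logit of every token whose cluster is green."""
    return [
        logit + delta if token_to_cluster[tok] in green else logit
        for tok, logit in enumerate(logits)
    ]


def toy_generate(length, token_to_cluster, n_clusters, key, delta, seed=0):
    """Toy stand-in for an AR image model: random logits, greedy decoding."""
    rng = random.Random(seed)
    vocab = len(token_to_cluster)
    tokens = [rng.randrange(vocab)]
    for _ in range(length - 1):
        logits = [rng.gauss(0.0, 1.0) for _ in range(vocab)]  # fake model output
        green = green_clusters(token_to_cluster[tokens[-1]], n_clusters, key)
        biased = bias_logits(logits, token_to_cluster, green, delta)
        tokens.append(max(range(vocab), key=biased.__getitem__))
    return tokens


def detect_z(tokens, token_to_cluster, n_clusters, key, gamma=0.5):
    """z-score of the green-hit count against the no-watermark expectation."""
    n = len(tokens) - 1  # the first token has no predecessor to seed from
    hits = 0
    for prev, cur in zip(tokens, tokens[1:]):
        green = green_clusters(token_to_cluster[prev], n_clusters, key, gamma)
        hits += token_to_cluster[cur] in green
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


# Hypothetical codebook of 32 tokens grouped into 8 clusters of 4 tokens each.
# In practice the grouping would come from similarity between visual tokens.
N_CLUSTERS = 8
TOKEN_TO_CLUSTER = [tok // 4 for tok in range(32)]
```

Because membership is decided per cluster, a perturbation that swaps a token for a similar one in the same cluster leaves both the green/red label and the seed for the next position unchanged, which is the intuition behind the robustness claim. A sequence from `toy_generate` with a large `delta` yields a high `detect_z` score, while an unbiased sequence is expected to score near zero.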
