Training-Free Watermarking for Autoregressive Image Generation

Yu Tong; Zihao Pan; Shuai Yang; Kaiyang Zhou

arXiv:2505.14673·cs.CV·May 21, 2025

TL;DR

IndexMark is a training-free watermarking method for autoregressive image models that embeds watermarks by replacing tokens with similar ones, ensuring high image quality, verification accuracy, and robustness against common attacks.

Contribution

We introduce IndexMark, a novel training-free watermarking framework specifically designed for autoregressive image generation models, leveraging codebook redundancy for effective embedding.

Findings

01

Achieves state-of-the-art image quality and verification accuracy.

02

Demonstrates robustness against cropping, noise, blur, and compression.

03

Operates without additional training or model modification.

Abstract

Invisible image watermarking can protect image ownership and prevent malicious misuse of visual generative models. However, existing generative watermarking methods are mainly designed for diffusion models while watermarking for autoregressive image generation models remains largely underexplored. We propose IndexMark, a training-free watermarking framework for autoregressive image generation models. IndexMark is inspired by the redundancy property of the codebook: replacing autoregressively generated indices with similar indices produces negligible visual differences. The core component in IndexMark is a simple yet effective match-then-replace method, which carefully selects watermark tokens from the codebook based on token similarity, and promotes the use of watermark tokens through token replacement, thereby embedding the watermark without affecting the image quality. Watermark…
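The replacement step described in the abstract can be illustrated with a toy sketch. It assumes the codebook indices have already been grouped into similar pairs, that one index per pair is randomly designated the watermark ("green") index, and that every non-watermark ("red") index is swapped for its partner. The function names, the seed-based split, and the replace-everything policy are illustrative assumptions and are not taken from the paper.

```python
import random

def assign_green_red(pairs, seed):
    """Randomly mark one index of each pair as green (watermark), the other as red.

    `pairs` is assumed to be a list of (index_a, index_b) tuples of similar
    codebook entries; how the pairs are found is outside this sketch.
    """
    rng = random.Random(seed)
    green = set()
    partner_of_red = {}
    for a, b in pairs:
        g, r = (a, b) if rng.random() < 0.5 else (b, a)
        green.add(g)
        partner_of_red[r] = g
    return green, partner_of_red

def embed(indices, partner_of_red):
    """Replace every red index with its green partner; green indices pass through."""
    return [partner_of_red.get(i, i) for i in indices]

def green_rate(indices, green):
    """Fraction of indices that belong to the green set."""
    return sum(i in green for i in indices) / len(indices)

# Hypothetical 8-entry codebook already grouped into 4 pairs.
pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]
green, partner_of_red = assign_green_red(pairs, seed=42)

tokens = [0, 1, 2, 3, 4, 5, 6, 7, 0, 3]   # stand-in for generated indices
marked = embed(tokens, partner_of_red)

print(green_rate(tokens, green), green_rate(marked, green))
```

Replacing every red index is the simplest possible policy; a selective policy that swaps only some indices would trade verification strength for fidelity, which this sketch does not model.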

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 4

Strengths

Originality: The paper addresses a niche yet crucial gap in the field of image watermarking by focusing on autoregressive image generation models. This is a relatively underexplored area compared to watermarking techniques for diffusion models. The idea of using the redundancy in the codebook for embedding watermarks through a match-then-replace strategy is novel and provides an elegant solution to the imperceptibility problem. Quality: The framework is well-constructed, with clear explanations

Weaknesses

Verification Performance at Varying Watermark Strengths: The paper briefly mentions watermark robustness under different conditions but lacks detailed analysis of verification performance across different watermark strengths. Specifically, the effect of varying watermark strength on verification accuracy (e.g., false positives/negatives) is not sufficiently explored. It would be beneficial to evaluate the performance at lower and higher confidence thresholds and under different attack types to b

Reviewer 02 · Rating 4 · Confidence 5

Strengths

S1. Use case: The paper addresses the important and timely problem of watermarking autoregressive image models, which is a less-explored area than watermarking diffusion models.
S2. Clear Writing: The paper is well-written, and the proposed method is explained clearly.
S3. Strong core concepts: The method is built on several ideas that I found very good: (a) leveraging codebook redundancy, (b) formalizing the index pairing as a maximum weight perfect matching problem to maximize intra-pair simi
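To make the pairing formulation the reviewer mentions concrete, here is a minimal sketch of maximum weight perfect matching on a tiny similarity matrix. It uses an exact bitmask dynamic program that is only practical for a handful of items; a real codebook with thousands of entries would need a polynomial-time matching algorithm. The function name and the toy similarity values are assumptions for illustration, not the authors' implementation.

```python
from functools import lru_cache

def max_weight_perfect_matching(sim):
    """Exact maximum weight perfect matching for a small symmetric matrix.

    `sim[i][j]` is the similarity between items i and j. Returns the total
    weight and a tuple of (i, j) pairs covering every item exactly once.
    """
    n = len(sim)
    if n % 2:
        raise ValueError("a perfect matching needs an even number of items")
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def best(mask):
        # Bit k of `mask` is set when item k is already matched.
        if mask == full:
            return 0.0, ()
        # Always match the lowest unmatched item next to avoid duplicate work.
        i = next(k for k in range(n) if not mask & (1 << k))
        top = None
        for j in range(i + 1, n):
            if mask & (1 << j):
                continue
            score, rest = best(mask | (1 << i) | (1 << j))
            cand = (score + sim[i][j], ((i, j),) + rest)
            if top is None or cand[0] > top[0]:
                top = cand
        return top

    return best(0)

# Toy case where greedily taking the single best pair (0, 1) is suboptimal:
# it forces the weak pair (2, 3), while pairing (0, 2) and (1, 3) is better.
sim = [
    [0.0, 1.0, 0.9, 0.0],
    [1.0, 0.0, 0.0, 0.9],
    [0.9, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0],
]
total, pairs = max_weight_perfect_matching(sim)
print(total, pairs)
```

The toy matrix shows why a global matching objective matters: a greedy nearest-neighbor pairing can leave the remaining items with poor partners, which lowers the total intra-pair similarity.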

Weaknesses

W1. Statistical Test: The watermark verification method relies on the CLT to approximate the distribution of the green index rate. Since the statistic follows a binomial distribution, an exact test or a more accurate confidence interval (e.g., see https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Clopper%E2%80%93Pearson_interval) could be used instead of a normal approximation.
W2. Weak post-hoc baselines: The post-hoc baselines (DwtDct, DwtDctSvd, RivaGAN) are no
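The difference between the two tests raised in W1 can be sketched as follows. The example assumes a null hypothesis in which each index of an unwatermarked image is green with probability 0.5 independently, and compares the exact one-sided binomial tail with the CLT-based normal tail. The function names, the null probability, and the example counts are assumptions for illustration; the paper's actual statistic and threshold may differ.

```python
import math

def binom_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def normal_tail(k, n, p=0.5):
    """CLT approximation of P(X >= k), without continuity correction."""
    z = (k - n * p) / math.sqrt(n * p * (1 - p))
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical observation: 160 green indices out of 256.
n, k = 256, 160
print("exact p-value: ", binom_tail(k, n))
print("normal p-value:", normal_tail(k, n))
```

The gap between the two values is typically largest in the far tail and for short index sequences, which is exactly the regime that matters when setting a low false-positive threshold.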

Reviewer 03 · Rating 6 · Confidence 3

Strengths

* Natural idea to use the LLM red-green watermarking scheme for autoregressive images.
* Using the perfect matching algorithm as a subroutine to find "nearby" pairs is quite nice.
* The method performs well experimentally. (Although the number of existing watermarking schemes for *autoregressively* generated images is low.)

Weaknesses

I like the paper a lot overall. There are questions below that I would like answered. But, even if all my questions are addressed, I'm worried about the technical contribution of this work. While well executed, the idea is to just (somewhat cleverly) apply red-green list watermarking to autoregressive images. Another concern is that I'm not sure autoregressive image watermarking should be considered so differently from diffusion watermarking. It seems plausible to take a diffusion generated ima

Reviewer 04 · Rating 2 · Confidence 4

Strengths

- The methodological design is effective. Invisible watermark embedding is achieved through red-green index matching of the codebook and a confidence-guided index replacement mechanism.
- Adequate experiments and analysis demonstrate the effectiveness of the proposed method in terms of watermark verification accuracy.

Weaknesses

There are some issues that need to be addressed:
- **Dependence on the index encoder.** As shown in Fig. 6(b), IndexMark strongly depends on the performance of the index encoder. Such an encoder must be retrained for each different VAR model, which is inconsistent with the training-free watermarking scheme claimed in the title. For 512×512 or 1024×1024 images and a larger codebook, it is unclear whether the index encoder can be trained well. The author should provide o

Code & Models

Repositories

maifoundations/indexmark
pytorch · Official

Models

🤗
maifoundations/IndexMark
model

Taxonomy

Topics: Advanced Steganography and Watermarking Techniques · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection

Methods: Diffusion