A Watermark for Order-Agnostic Language Models
Ruibo Chen, Yihan Wu, Yanshuo Chen, Chenxi Liu, Junfeng Guo, Heng Huang

TL;DR
This paper introduces Pattern-mark, a novel watermarking framework tailored for order-agnostic language models, enabling effective detection without sequential token generation, and demonstrates its superior performance through extensive evaluations.
Contribution
Pattern-mark is the first watermarking method specifically designed for order-agnostic LMs, utilizing pattern-based detection and a Markov-chain generator for high efficiency and robustness.
Findings
Enhanced detection efficiency on order-agnostic LMs
Improved robustness against attempts to remove watermarks
Higher quality of generated watermarked outputs
Abstract
Statistical watermarking techniques are well-established for sequentially decoded language models (LMs). However, these techniques cannot be directly applied to order-agnostic LMs, as the tokens in order-agnostic LMs are not generated sequentially. In this work, we introduce Pattern-mark, a pattern-based watermarking framework specifically designed for order-agnostic LMs. We develop a Markov-chain-based watermark generator that produces watermark key sequences with high-frequency key patterns. Correspondingly, we propose a statistical pattern-based detection algorithm that recovers the key sequence during detection and conducts statistical tests based on the count of high-frequency patterns. Our extensive evaluations on order-agnostic LMs, such as ProteinMPNN and CMLM, demonstrate Pattern-mark's enhanced detection efficiency, generation quality, and robustness, positioning it as a…
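The idea of a Markov-chain key generator paired with a pattern-count test can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes two watermark keys, treats "adjacent keys differ" as the single high-frequency pattern, and uses a z-test against a null hypothesis of independent, uniformly random keys. The function names and the `stay_prob` parameter are hypothetical. How keys are embedded into tokens and recovered from text is not modeled here.

```python
import math
import random


def generate_key_sequence(length, stay_prob=0.1, seed=0):
    """Sample a two-key Markov chain (illustrative assumption, not the paper's exact generator).

    With probability `stay_prob` the next key repeats the previous one;
    otherwise it switches, so alternating pairs become the frequent pattern.
    """
    rng = random.Random(seed)
    keys = [rng.randrange(2)]
    for _ in range(length - 1):
        prev = keys[-1]
        keys.append(prev if rng.random() < stay_prob else 1 - prev)
    return keys


def count_alternations(keys):
    """Count adjacent positions whose keys differ (the assumed target pattern)."""
    return sum(1 for a, b in zip(keys, keys[1:]) if a != b)


def detection_z_score(keys):
    """z-score of the alternation count under the null of i.i.d. uniform keys.

    Under that null each adjacent pair differs with probability 1/2 and the
    pair indicators are independent, so the count is Binomial(n_pairs, 1/2).
    """
    n_pairs = len(keys) - 1
    if n_pairs < 1:
        raise ValueError("need at least two keys")
    count = count_alternations(keys)
    return (count - 0.5 * n_pairs) / math.sqrt(0.25 * n_pairs)


if __name__ == "__main__":
    watermarked = generate_key_sequence(200, stay_prob=0.1, seed=1)
    rng = random.Random(2)
    unmarked = [rng.randrange(2) for _ in range(200)]
    print("watermarked z:", detection_z_score(watermarked))
    print("unmarked z:", detection_z_score(unmarked))
```

A large positive z-score suggests the recovered key sequence contains more alternating pairs than chance would produce. In this sketch a detector would compare the score against a threshold chosen for a target false-positive rate.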
Peer Reviews
Decision·ICLR 2025 Poster
1. The paper effectively addresses the challenge of watermarking order-agnostic language models (LMs) by introducing a Markov-chain-based key sequence approach that overcomes the limitations inherent in traditional sequential watermarking methods. 2. The inclusion of a dynamic programming algorithm to optimize the detection process significantly reduces the time complexity, thereby improving the practical feasibility of the proposed approach. 3. The proposed method enhanced detection accuracy
1. The reliance on an alternating key sequence pattern introduces a potential vulnerability, as it may be more easily detected and disrupted by adversaries. Should the specific pattern structure (e.g., alternating keys) be identified, adversaries could develop targeted strategies to either erase or replicate the watermark. Incorporating more complex or adaptive key sequence strategies could enhance the method's robustness against such targeted disruptions. 2. The paper lacks a thorough discussio
1. This paper is well-organized and well-written. 2. The discussion part of the paper provides a good explanation of the motivation for the method.
1. The paper does not detail how the vocabulary set is divided. Splitting the vocabulary will inevitably affect the original probability distribution, resulting in a decrease in output quality. In addition, improper vocabulary segmentation may lead to grammatical errors in the generated sentences, such as a verb placed directly after a preposition. Is part of speech considered when dividing the vocabulary? 2. The probability outputs of language models often exhibit high probabilit
+) This paper is well-written and presents its ideas clearly. +) This paper focuses on watermarking order-agnostic LMs, which, to the best of my knowledge, has not been considered in the existing literature. +) This paper proposes an effective strategy to watermark order-agnostic LMs by embedding watermarks within the relationships between adjacent words.
-) I think the protein generation task is not suitable for experiments, as it may not be able to identify important unknown protein architectures.
Taxonomy
Topics: Natural Language Processing Techniques
