SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models
Amirhossein Dabiriaghdam, Lele Wang

TL;DR
SimMark is a sentence-level watermarking method for large language models. It embeds detectable statistical patterns using the similarity of semantic sentence embeddings together with rejection sampling, which makes the watermark robust to paraphrasing, and it does not require access to model internals.
Contribution
The paper introduces a robust, model-agnostic watermarking algorithm that outperforms prior sentence-level watermarking techniques in robustness, sampling efficiency, and applicability across domains.
Findings
Achieves high robustness against paraphrasing attacks
Maintains high text quality and fluency
Surpasses prior watermarking methods in benchmarks
Abstract
The widespread adoption of large language models (LLMs) necessitates reliable methods to detect LLM-generated text. We introduce SimMark, a robust sentence-level watermarking algorithm that makes LLMs' outputs traceable without requiring access to model internals, making it compatible with both open and API-based LLMs. By leveraging the similarity of semantic sentence embeddings combined with rejection sampling to embed detectable statistical patterns imperceptible to humans, and employing a soft counting mechanism, SimMark achieves robustness against paraphrasing attacks. Experimental results demonstrate that SimMark sets a new benchmark for robust watermarking of LLM-generated content, surpassing prior sentence-level watermarking techniques in robustness, sampling efficiency, and applicability across diverse domains, all while maintaining text quality and fluency.
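To make the mechanism in the abstract more concrete, here is a minimal Python sketch of similarity-based rejection sampling and soft-count detection. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the interval bounds `lo` and `hi`, the exponential form of `soft_count`, the decay rate `k`, the null rate `p0`, or the z-score statistic, so all of these are hypothetical. `toy_embed` is a letter-count stand-in for a real semantic sentence encoder, and `propose` stands in for sampling one sentence from an LLM through its normal text interface.

```python
import math
from typing import Callable, List, Sequence

Embedder = Callable[[str], Sequence[float]]
Proposer = Callable[[List[str]], str]

def toy_embed(sentence: str) -> List[float]:
    """Hypothetical stand-in for a sentence encoder: L2-normalised letter counts."""
    vec = [0.0] * 26
    for ch in sentence.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard against empty input
    return [v / norm for v in vec]

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two already-normalised vectors."""
    return sum(a * b for a, b in zip(u, v))

def generate_watermarked(first: str, propose: Proposer, embed: Embedder,
                         n_new: int, lo: float, hi: float,
                         max_tries: int = 100) -> List[str]:
    """Rejection sampling: resample each new sentence until its similarity to
    the previous sentence lies in the (assumed) interval [lo, hi]."""
    sentences = [first]
    for _ in range(n_new):
        prev_vec = embed(sentences[-1])
        candidate = propose(sentences)
        for _ in range(max_tries - 1):
            if lo <= cosine(prev_vec, embed(candidate)) <= hi:
                break
            candidate = propose(sentences)
        # If the budget runs out, the last candidate is kept unverified.
        sentences.append(candidate)
    return sentences

def soft_count(sim: float, lo: float, hi: float, k: float = 50.0) -> float:
    """Assumed soft count: 1 inside [lo, hi], decaying exponentially with the
    distance outside it, so small shifts still earn partial credit."""
    if lo <= sim <= hi:
        return 1.0
    dist = (lo - sim) if sim < lo else (sim - hi)
    return math.exp(-k * dist)

def detection_z_score(sentences: List[str], embed: Embedder, lo: float,
                      hi: float, p0: float, k: float = 50.0) -> float:
    """Assumed one-proportion z-statistic on the soft count, where p0 is the
    rate at which unwatermarked sentence pairs land in [lo, hi]."""
    pairs = list(zip(sentences, sentences[1:]))
    n = len(pairs)
    score = sum(soft_count(cosine(embed(a), embed(b)), lo, hi, k)
                for a, b in pairs)
    return (score - p0 * n) / math.sqrt(n * p0 * (1.0 - p0))
```

In this sketch the generator only needs sampled sentences, which is what would let the same idea apply to API-based models. A detector would flag text whose z-score exceeds a chosen threshold. The soft count is one plausible way to tolerate a paraphrase that nudges a similarity value slightly outside the interval.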
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Internet Traffic Analysis and Secure E-voting · Vehicle License Plate Recognition
