The Feasibility of Topic-Based Watermarking on Academic Peer Reviews
Alexander Nemecek, Yuzhou Jiang, Erman Ayday

TL;DR
This paper evaluates topic-based watermarking as a method to reliably attribute LLM-generated peer reviews without compromising review quality, demonstrating its robustness and practicality across various LLM configurations.
Contribution
The paper systematically assesses topic-based watermarking on peer review text, showing its effectiveness and robustness in real academic review scenarios.
Findings
Topic-based watermarking (TBW) maintains review quality comparable to non-watermarked text.
TBW detection remains robust under paraphrasing.
The method is effective across multiple LLM configurations (base, few-shot, and fine-tuned).
Abstract
Large language models (LLMs) are increasingly integrated into academic workflows, with many conferences and journals permitting their use for tasks such as language refinement and literature summarization. However, their use in peer review remains prohibited due to concerns around confidentiality breaches, hallucinated content, and inconsistent evaluations. As LLM-generated text becomes increasingly indistinguishable from human writing, there is a growing need for reliable attribution mechanisms to preserve the integrity of the review process. In this work, we evaluate topic-based watermarking (TBW), a semantic-aware technique designed to embed detectable signals into LLM-generated text. We conduct a systematic assessment across multiple LLM configurations, including base, few-shot, and fine-tuned variants, using authentic peer review data from academic conferences. Our results show that TBW…
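The abstract does not spell out how the embedded signal is detected. As a rough illustration only, many token-level watermarks are checked with a one-sided z-test on how many tokens fall in a favored ("green") list, and the sketch below assumes that setup. The function names, the green list, the expected fraction gamma, and the threshold are all hypothetical and are not taken from the paper.

```python
import math


def green_fraction_z_score(tokens, green_list, gamma):
    """Return a z-score for the count of tokens found in green_list.

    Assumption (not from the paper): under the null hypothesis of
    unwatermarked text, each token lands in the green list independently
    with probability gamma.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    hits = sum(1 for tok in tokens if tok in green_list)
    # Binomial mean and standard deviation under the null hypothesis.
    expected = gamma * n
    std = math.sqrt(n * gamma * (1.0 - gamma))
    return (hits - expected) / std


def looks_watermarked(tokens, green_list, gamma=0.5, threshold=4.0):
    """Flag text whose green-token count is far above chance.

    The default threshold is an illustrative choice, not a value
    reported by the authors.
    """
    return green_fraction_z_score(tokens, green_list, gamma) >= threshold
```

In a topic-based scheme the green list would presumably be tied to the topic of the text being generated, but how that list is built is outside what this summary describes, so the sketch takes it as an input.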
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Adversarial Robustness in Machine Learning · Spam and Phishing Detection
