TL;DR
This paper proposes In-Context Watermarking (ICW), a prompt-based method that embeds watermarks into LLM-generated text without access to the decoding process, so it remains usable when the party who wants the watermark does not control the model.
Contribution
The paper introduces ICW, a prompt-engineering approach to watermarking LLM outputs that requires no access to the model, and demonstrates its effectiveness through experiments.
Findings
ICW is feasible as a model-agnostic watermarking technique.
Different ICW strategies can be paired with tailored detection methods.
ICW shows promise for scalable content attribution as LLMs advance.
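To make the pairing of an embedding prompt with a detector concrete, here is a minimal sketch of one possible word-level scheme. It is an illustration only, not the paper's method: the keyed-hash vocabulary split, the function names (`is_green`, `build_prompt`, `detect`), the instruction wording, and the parameter `gamma` are all assumptions introduced for this example.

```python
import hashlib
import math
import re


def is_green(word: str, key: str, gamma: float = 0.5) -> bool:
    """Assign a word to the hypothetical 'green' set via a keyed hash.

    Roughly a fraction `gamma` of all words ends up green.
    """
    digest = hashlib.sha256(f"{key}:{word.lower()}".encode("utf-8")).digest()
    return digest[0] / 256.0 < gamma


def build_prompt(task: str, vocabulary: list[str], key: str) -> str:
    """Prepend a watermarking instruction (assumed wording) to the task."""
    green = sorted(w for w in vocabulary if is_green(w, key))
    instruction = (
        "When writing your answer, prefer the following words wherever "
        "they fit naturally: " + ", ".join(green) + "."
    )
    return instruction + "\n\n" + task


def detect(text: str, key: str, gamma: float = 0.5) -> float:
    """Return a z-score for the share of green words among distinct words.

    Without a watermark, each distinct word is green with probability
    about `gamma`, so a large positive score suggests the instruction
    was followed.
    """
    words = {w.lower() for w in re.findall(r"[A-Za-z']+", text)}
    n = len(words)
    if n == 0:
        return 0.0
    hits = sum(is_green(w, key, gamma) for w in words)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

In this sketch the detector needs only the text and the secret key, never the model or its decoding process, which is the property the summary above emphasizes. Any decision threshold on the z-score would have to be calibrated on unwatermarked text.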
Abstract
The growing use of large language models (LLMs) for sensitive applications has highlighted the need for effective watermarking techniques to ensure the provenance and accountability of AI-generated text. However, most existing watermarking methods require access to the decoding process, limiting their applicability in real-world settings. One illustrative example is the use of LLMs by dishonest reviewers in the context of academic peer review, where conference organizers have no access to the model used but still need to detect AI-generated reviews. Motivated by this gap, we introduce In-Context Watermarking (ICW), which embeds watermarks into generated text solely through prompt engineering, leveraging LLMs' in-context learning and instruction-following abilities. We investigate four ICW strategies at different levels of granularity, each paired with a tailored detection method. We…
