Security and Detectability Analysis of Unicode Text Watermarking Methods Against Large Language Models
Malte Hellmeier

TL;DR
This study evaluates the security and detectability of ten Unicode text watermarking methods against six large language models, revealing that advanced models can detect watermarks but cannot extract them without source code access.
Contribution
It provides a comprehensive analysis of existing Unicode text watermarking methods' robustness against modern large language models, highlighting vulnerabilities and security implications.
Findings
Large language models can detect watermarked text.
Watermarks cannot be extracted without source code.
Latest models show increased detection capabilities.
Abstract
Securing digital text is becoming increasingly relevant due to the widespread use of large language models. Individuals fear losing control over their data when it is used to train such machine learning models, and distinguishing model-generated output from text written by humans is a further concern. Digital watermarking provides additional protection by embedding an invisible watermark within the data that requires protection. However, little work has been done to analyze and verify whether existing digital text watermarking methods are secure against, and undetectable by, large language models. In this paper, we investigate the security-related area of watermarking and machine learning models for text data. In a controlled testbed of three experiments, ten existing Unicode text watermarking methods were implemented and analyzed across six large language models: GPT-5, GPT-4o, Teuken 7B, Llama 3.3, Claude Sonnet…
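To make the idea of an invisible Unicode watermark concrete, the following is a minimal sketch of one generic approach: encoding a payload as zero-width characters placed inside the cover text. It is not taken from the paper and is not claimed to be one of the ten evaluated methods. The function names (`embed`, `extract`, `strip_marks`), the choice of U+200B and U+200C as bit carriers, and the placement after the first word are all assumptions made for illustration.

```python
# Illustrative sketch only: a generic zero-width-character watermark.
# The carrier characters and the embedding position are assumptions,
# not the scheme of any specific method evaluated in the paper.

ZW0 = "\u200b"  # ZERO WIDTH SPACE, used here to encode bit 0
ZW1 = "\u200c"  # ZERO WIDTH NON-JOINER, used here to encode bit 1


def embed(cover: str, payload: str) -> str:
    """Hide `payload` in `cover` as a run of zero-width characters."""
    bits = "".join(f"{byte:08b}" for byte in payload.encode("utf-8"))
    mark = "".join(ZW1 if bit == "1" else ZW0 for bit in bits)
    # Place the mark after the first word; append it if there is no space.
    head, sep, tail = cover.partition(" ")
    return head + mark + sep + tail


def extract(text: str) -> str:
    """Recover the payload by reading the zero-width characters in order."""
    bits = "".join("1" if ch == ZW1 else "0" for ch in text if ch in (ZW0, ZW1))
    usable = len(bits) - len(bits) % 8  # ignore an incomplete trailing byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


def strip_marks(text: str) -> str:
    """Remove the carrier characters, returning the visible text."""
    return "".join(ch for ch in text if ch not in (ZW0, ZW1))


marked = embed("Hello world", "MH")
recovered = extract(marked)
```

The marked string renders like the original in most fonts, yet its code points differ from the cover text. A scheme this simple is recoverable by anyone who knows the bit mapping, which is the kind of implementation knowledge the TL;DR refers to as source code access.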
Taxonomy
Topics: User Authentication and Security Systems · Advanced Malware Detection Techniques · Advanced Steganography and Watermarking Techniques
