Tracing Text Provenance via Context-Aware Lexical Substitution
Xi Yang, Jie Zhang, Kejiang Chen, Weiming Zhang, Zehua Ma, Feng Wang, Nenghai Yu

TL;DR
This paper introduces a novel context-aware lexical substitution method using BERT for natural language watermarking, improving semantic preservation and transferability over existing techniques.
Contribution
The paper proposes a BERT-based lexical substitution approach for text watermarking that maintains semantic integrity and enhances transferability compared to prior methods.
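The contribution rests on choosing substitutes that fit their context and on the extractor being able to recompute the same choices from the watermarked text alone. The toy sketch below illustrates that general idea under heavy assumptions. It is not the authors' algorithm. CANDIDATE_POOLS, CONTEXT_SCORE, THRESHOLD and carriers() are all hypothetical: fixed candidate pools and a hand-written score table keyed on the previous word stand in for the BERT-based candidate generation and scoring the paper describes.

```python
# Toy sketch of context-aware substitution watermarking (hypothetical throughout).
# CANDIDATE_POOLS stands in for substitutes a masked language model might propose;
# CONTEXT_SCORE stands in for a model score of how well a candidate fits after
# the previous word. Pools are disjoint so a word maps to exactly one pool.
CANDIDATE_POOLS = [("fast", "quick", "rapid"), ("big", "large", "huge")]
POOL_OF = {w: pool for pool in CANDIDATE_POOLS for w in pool}

CONTEXT_SCORE = {
    ("a", "fast"): 0.9, ("a", "quick"): 0.8, ("a", "rapid"): 0.7,
    ("held", "fast"): 0.9, ("held", "quick"): 0.1, ("held", "rapid"): 0.05,
    ("a", "big"): 0.9, ("a", "large"): 0.85, ("a", "huge"): 0.6,
}
THRESHOLD = 0.5  # hypothetical cut-off for "fits the context"


def carriers(prev: str, word: str) -> list[str]:
    """Return the two best in-context candidates, or [] if the slot can't carry a bit.

    The result depends only on the (already watermarked) previous word and the
    word's pool, so the extractor can recompute exactly the same list.
    """
    pool = POOL_OF.get(word, ())
    fits = [c for c in pool if CONTEXT_SCORE.get((prev, c), 0.0) >= THRESHOLD]
    # Deterministic order: higher score first, ties broken alphabetically.
    fits.sort(key=lambda c: (-CONTEXT_SCORE[(prev, c)], c))
    return fits[:2] if len(fits) >= 2 else []


def embed(words: list[str], bits: list[int]) -> list[str]:
    """Encode bit b at each carrier slot by emitting the b-th best candidate."""
    out: list[str] = []
    k = 0
    for w in words:
        prev = out[-1] if out else ""  # context is the watermarked left neighbour
        cands = carriers(prev, w)
        if cands and k < len(bits):
            out.append(cands[bits[k]])
            k += 1
        else:
            out.append(w)
    return out


def extract(words: list[str]) -> list[int]:
    """Recompute each slot's candidates and read the bit from the word's rank."""
    bits = []
    for i, w in enumerate(words):
        prev = words[i - 1] if i > 0 else ""
        cands = carriers(prev, w)
        if w in cands:
            bits.append(cands.index(w))
    return bits


marked = embed("she held fast to a fast car".split(), [1])
print(" ".join(marked))  # she held fast to a quick car
print(extract(marked))   # [1]
```

In this toy, "held fast" is skipped because "quick" scores below the threshold after "held", while "a fast car" can carry a bit. Deriving the candidate list only from text the extractor also sees is one simple way to keep embedding and extraction in sync; the paper's own mechanism for this may differ.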
Findings
Outperforms existing watermarking methods in preserving sentence meaning.
Achieves higher transferability across different text types.
Outperforms the state of the art on the Stanford Word Substitution Benchmark.
Abstract
Text content created by humans or language models is often stolen or misused by adversaries. Tracing text provenance can help claim ownership of text content or identify malicious users who distribute misleading content such as machine-generated fake news. There have been some attempts to achieve this, mainly based on watermarking techniques. Traditional text watermarking methods embed watermarks by slightly altering text format, such as line spacing and font, but these watermarks are fragile to cross-media transmission such as OCR. To address this, natural language watermarking methods represent watermarks by replacing words in original sentences with synonyms from handcrafted lexical resources (e.g., WordNet), but they do not consider the impact of a substitution on the meaning of the whole sentence. Recently, a transformer-based network was proposed to embed watermarks by…
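The abstract criticizes lexicon-based synonym substitution for ignoring sentence meaning. A minimal sketch of such a context-free scheme follows, assuming a tiny hypothetical SYNONYM_SETS table in place of WordNet and one bit per two-word synonym set. The bit encoding (a word's position inside its set) is an illustrative assumption, not a specific prior method.

```python
# Toy sketch of lexicon-based (context-free) synonym-substitution watermarking.
# SYNONYM_SETS is a hypothetical stand-in for a handcrafted resource such as WordNet.
SYNONYM_SETS = [("big", "large"), ("fast", "quick"), ("begin", "start")]

# Map every word to (index of its synonym set, its position inside that set).
LOOKUP = {w: (s, i) for s, group in enumerate(SYNONYM_SETS) for i, w in enumerate(group)}


def embed(words: list[str], bits: list[int]) -> list[str]:
    """Replace each carrier word with the synonym whose position equals the next bit."""
    out, k = [], 0
    for w in words:
        if w in LOOKUP and k < len(bits):
            s, _ = LOOKUP[w]
            out.append(SYNONYM_SETS[s][bits[k]])  # surrounding context is ignored entirely
            k += 1
        else:
            out.append(w)
    return out


def extract(words: list[str]) -> list[int]:
    """Read one bit (the word's position in its set) from every carrier word."""
    return [LOOKUP[w][1] for w in words if w in LOOKUP]


marked = embed("the big dog was fast to begin".split(), [1, 1, 0])
print(" ".join(marked))  # the large dog was quick to begin
print(extract(marked))   # [1, 1, 0]
```

Because the lexicon is applied regardless of context, `embed("hold fast".split(), [1])` yields "hold quick", which changes the meaning of the phrase. This is the weakness the abstract points out.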
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Advanced Malware Detection Techniques · Digital Media Forensic Detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Attention Dropout · Dense Connections · Linear Warmup With Linear Decay · Residual Connection · Layer Normalization · Weight Decay
