TL;DR
This paper presents the Adversarial Watermarking Transformer (AWT), an end-to-end model that encodes a binary message into text so that its provenance can be traced, while preserving the text's semantics and resisting attacks.
Contribution
It introduces the first end-to-end watermarking model for text that automatically learns, without ground truth, which words to substitute and where, in order to hide data and support text provenance tracing.
Findings
Preserves text utility and semantics
Decodes watermarks with high accuracy
Robust to various attack strategies
Abstract
Recent advances in natural language generation have introduced powerful language models with high-quality output text. However, this raises concerns about the potential misuse of such models for malicious purposes. In this paper, we study natural language watermarking as a defense to help better mark and trace the provenance of text. We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training that, given an input text and a binary message, generates an output text that is unobtrusively encoded with the given message. We further study different training and inference strategies to achieve minimal changes to the semantics and correctness of the input text. AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations in order to…
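To make the idea of hiding a binary message through word substitutions concrete, here is a minimal toy sketch. It is not AWT: AWT learns the substitutions and their locations with a jointly trained encoder-decoder, whereas this sketch assumes a fixed, hand-written synonym table. The names `SYNONYM_PAIRS`, `encode`, `decode`, and `bit_accuracy` are hypothetical and chosen only for illustration, and bit accuracy is assumed here as one simple way to score how much of the message is recovered.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym table (an assumption, not from the paper):
# the first word of each pair stands for bit 0, the second for bit 1.
SYNONYM_PAIRS: List[Tuple[str, str]] = [
    ("big", "large"),
    ("quick", "fast"),
    ("begin", "start"),
    ("buy", "purchase"),
]

# Lookup tables: word -> bit it represents, and word -> its (bit0, bit1) pair.
BIT_OF: Dict[str, int] = {}
PAIR_OF: Dict[str, Tuple[str, str]] = {}
for zero_word, one_word in SYNONYM_PAIRS:
    BIT_OF[zero_word], BIT_OF[one_word] = 0, 1
    PAIR_OF[zero_word] = PAIR_OF[one_word] = (zero_word, one_word)


def encode(text: str, bits: List[int]) -> str:
    """Embed bits by picking one synonym at each carrier word, left to right."""
    remaining = list(bits)
    out = []
    for word in text.split():
        key = word.lower()
        if key in PAIR_OF and remaining:
            out.append(PAIR_OF[key][remaining.pop(0)])
        else:
            out.append(word)
    return " ".join(out)


def decode(text: str, n_bits: int) -> List[int]:
    """Read the bits back from the carrier words, in order of appearance."""
    bits = [BIT_OF[w.lower()] for w in text.split() if w.lower() in BIT_OF]
    return bits[:n_bits]


def bit_accuracy(sent: List[int], recovered: List[int]) -> float:
    """Fraction of message bits recovered correctly (missing bits count as wrong)."""
    if not sent:
        return 0.0
    return sum(s == r for s, r in zip(sent, recovered)) / len(sent)


message = [1, 0, 1]
marked = encode("we begin with a quick test of the big model", message)
print(marked)                                      # we start with a quick test of the large model
print(bit_accuracy(message, decode(marked, 3)))    # 1.0

# A simple "attack": someone swaps one carrier word back.
attacked = marked.replace("large", "big")
print(bit_accuracy(message, decode(attacked, 3)))  # 2 of 3 bits survive
```

A fixed table like this is easy to detect and easy to undo, which is the gap a learned, adversarially trained model such as AWT aims to close.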
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Dropout · Dense Connections · Attention Is All You Need · Byte Pair Encoding · Label Smoothing · Multi-Head Attention
