Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding

Sahar Abdelnabi; Mario Fritz

arXiv:2009.03015 · cs.CR · March 30, 2021

TL;DR

This paper presents the Adversarial Watermarking Transformer, an end-to-end model that encodes watermarks into text to trace provenance while maintaining semantic integrity and resisting attacks.

Contribution

It introduces the first end-to-end watermarking model for text that automatically learns, without ground truth, which words to substitute and where, so that a binary message can be hidden in the text and later used to trace its provenance.

Findings

01 Effective in preserving text utility and semantics

02 Successfully decodes watermarks with high accuracy

03 Robust against various attack strategies

Abstract

Recent advances in natural language generation have introduced powerful language models with high-quality output text. However, this raises concerns about the potential misuse of such models for malicious purposes. In this paper, we study natural language watermarking as a defense to help better mark and trace the provenance of text. We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training that, given an input text and a binary message, generates an output text that is unobtrusively encoded with the given message. We further study different training and inference strategies to achieve minimal changes to the semantics and correctness of the input text. AWT is the first end-to-end model to hide data in text by automatically learning -- without ground truth -- word substitutions along with their locations in order to…
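To make the interface described in the abstract concrete (input text plus a binary message in, watermarked text out, message recovered later), here is a minimal toy sketch. It is not AWT: it swaps in a hand-written synonym table, which is exactly the kind of fixed rule the paper says AWT replaces with learned substitutions and locations. The names `SYNONYM_PAIRS`, `encode`, `decode`, and `bit_accuracy` are hypothetical and do not come from the paper or its repository.

```python
# Toy rule-based data hiding via synonym substitution.
# ASSUMPTION: this fixed table stands in for AWT's learned word substitutions;
# every name in this block is illustrative, not taken from the paper or repo.
SYNONYM_PAIRS = [("big", "large"), ("quick", "fast"), ("happy", "glad")]

BIT_OF = {}   # carrier word -> bit it represents
VARIANT = {}  # (carrier word, bit) -> word to emit for that bit
for zero_word, one_word in SYNONYM_PAIRS:
    BIT_OF[zero_word] = 0
    BIT_OF[one_word] = 1
    for word in (zero_word, one_word):
        VARIANT[(word, 0)] = zero_word
        VARIANT[(word, 1)] = one_word


def encode(text, bits):
    """Embed bits by picking one synonym at each carrier word, in order."""
    remaining = list(bits)
    out = []
    for word in text.split():
        if word in BIT_OF and remaining:
            out.append(VARIANT[(word, remaining.pop(0))])
        else:
            out.append(word)  # non-carrier words pass through unchanged
    return " ".join(out)


def decode(text, n_bits):
    """Read the first n_bits bits back from the carrier words."""
    bits = [BIT_OF[word] for word in text.split() if word in BIT_OF]
    return bits[:n_bits]


def bit_accuracy(sent, received):
    """Fraction of message bits recovered correctly."""
    matches = sum(s == r for s, r in zip(sent, received))
    return matches / len(sent)


message = [1, 0, 1]
watermarked = encode("the big dog was quick and happy", message)
recovered = decode(watermarked, len(message))
print(watermarked)
print(recovered, bit_accuracy(message, recovered))
```

The toy only works on lowercase, whitespace-separated words that appear in the table, and its capacity is limited by how many carrier words the input happens to contain. Those limits are what a learned encoder-decoder is meant to avoid.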

Code & Models

Repositories

S-Abdelnabi/awt (PyTorch)

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Dropout · Dense Connections · Attention Is All You Need · Byte Pair Encoding · Label Smoothing · Multi-Head Attention