Generating Watermarked Adversarial Texts
Mingjie Li, Hanzhou Wu, Xinpeng Zhang

TL;DR
This paper introduces a framework for creating watermarked adversarial texts that can deceive neural networks while embedding a watermark for ownership verification, even after further attacks.
Contribution
The paper proposes a novel method to generate watermarked adversarial texts that maintain effectiveness and watermark integrity against subsequent adversarial attacks.
Findings
Successfully fools advanced DNN models
Watermark remains intact after additional adversarial attacks
Watermarked texts have high semantic quality
Abstract
Adversarial example generation has been an active research topic in recent years because it can cause deep neural networks (DNNs) to misclassify the generated adversarial examples. This reveals the vulnerability of DNNs and motivates the search for ways to improve the robustness of DNN models. Because natural language is pervasive and spreads rapidly over social networks, various natural-language-based adversarial attack algorithms have been proposed in the literature. These algorithms generate adversarial text examples with high semantic quality. However, the generated adversarial text examples may be used maliciously or illegally. To tackle this problem, we present a general framework for generating watermarked adversarial text examples. For each word in a given text, a set of candidate words is determined to ensure that all the words in the set can be used to either…
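The abstract is cut off before it describes how the candidate sets are used, so the paper's actual procedure is not shown here. As a rough illustration of the general idea of carrying watermark bits through word choice, the sketch below uses a generic synonym-substitution scheme: each carrier word has a small candidate set, and the position of the chosen word in the sorted set encodes one bit. The `CANDIDATES` table, the function names, and the one-bit-per-word encoding are all assumptions made for this example. The sketch does not attempt the adversarial part (fooling a classifier) at all.

```python
# Toy illustration only: a generic synonym-substitution watermark.
# CANDIDATES, embed, and extract are hypothetical names for this sketch;
# this is NOT the algorithm from the paper.
CANDIDATES = {
    "good": ["good", "fine"],
    "movie": ["film", "movie"],
    "great": ["great", "superb"],
}


def _canonical(word):
    """Return the sorted candidate list containing `word`, or None."""
    for options in CANDIDATES.values():
        if word in options:
            return sorted(options)
    return None


def embed(words, bits):
    """Replace each carrier word so its index in the sorted set encodes one bit."""
    out, remaining = [], list(bits)
    for w in words:
        options = _canonical(w)
        if options is not None and remaining:
            out.append(options[remaining.pop(0)])
        else:
            out.append(w)  # non-carrier word, or no bits left to embed
    return out


def extract(words):
    """Read back one bit per word that belongs to a candidate set."""
    bits = []
    for w in words:
        options = _canonical(w)
        if options is not None:
            bits.append(options.index(w))
    return bits


text = ["a", "good", "movie", "is", "great"]
marked = embed(text, [1, 0, 1])
recovered = extract(marked)
```

In this toy version, `extract` returns a bit for every carrier word it sees, so a verifier would need to know the watermark length in advance. A real scheme would also have to restrict each candidate set to words that keep the text adversarial and fluent, which this sketch does not model.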
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques
