Generating Watermarked Adversarial Texts

Mingjie Li; Hanzhou Wu; Xinpeng Zhang

arXiv:2110.12948·cs.CR·March 6, 2023

TL;DR

This paper introduces a framework for creating watermarked adversarial texts that deceive neural networks while carrying a watermark for ownership verification that survives further adversarial attacks.

Contribution

The paper proposes a novel method to generate watermarked adversarial texts that maintain effectiveness and watermark integrity against subsequent adversarial attacks.

Findings

01 Successfully fools advanced DNN models

02 Watermark remains intact after additional adversarial attacks

03 Watermarked texts have high semantic quality

Abstract

Adversarial example generation has been an active research topic in recent years: the generated adversarial examples cause deep neural networks (DNNs) to misclassify them, which reveals the vulnerability of DNNs and motivates the search for ways to improve the robustness of DNN models. Because natural language is so widely used and circulates so rapidly over social networks, various natural-language-based adversarial attack algorithms have been proposed in the literature. These algorithms generate adversarial text examples with high semantic quality. However, the generated adversarial text examples may be used maliciously or illegally. To tackle this problem, we present a general framework for generating watermarked adversarial text examples. For each word in a given text, a set of candidate words is determined to ensure that all the words in the set can be used to either…
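The abstract is cut off above, so the authors' actual selection rule is not shown on this page. The Python sketch below only illustrates the general idea of carrying watermark bits through a choice among candidate words. Every name in it is hypothetical (`word_bit`, `embed`, `extract`, the SHA-256 keyed bit, the hand-written candidate table), and it makes no attempt to keep the text adversarial against a classifier, which the paper's framework also has to do.

```python
import hashlib


def word_bit(word, key):
    """Map a word to one pseudo-random bit under a secret key (assumed mapping)."""
    digest = hashlib.sha256(f"{key}:{word.lower()}".encode("utf-8")).digest()
    return digest[0] & 1


def embed(words, candidates, bits, key):
    """Substitute words so that chosen positions encode the watermark bits.

    candidates maps a word to a list of interchangeable words (hypothetical
    table). A position is used only if some candidate's keyed bit equals the
    next watermark bit; otherwise the original word is kept and skipped.
    Returns the new word list and the positions that carry bits.
    """
    out, positions = [], []
    for idx, word in enumerate(words):
        chosen = word
        if len(positions) < len(bits):
            target = bits[len(positions)]
            matches = [c for c in candidates.get(word, [])
                       if word_bit(c, key) == target]
            if matches:
                chosen = matches[0]
                positions.append(idx)
        out.append(chosen)
    return out, positions


def extract(words, positions, key):
    """Read the watermark bits back from the carrier positions."""
    return [word_bit(words[i], key) for i in positions]


words = "the movie was good".split()
candidates = {
    "movie": ["movie", "film", "picture"],
    "good": ["good", "great", "fine", "nice"],
}
watermark = [1, 0]
marked, positions = embed(words, candidates, watermark, key="secret")
recovered = extract(marked, positions, key="secret")
```

Passing `positions` to the extractor is a simplification made for this sketch; a practical scheme would need the verifier to locate the carrier words without side information, and how many bits fit depends on which candidates happen to match.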

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques