TL;DR
This paper introduces a universal adversarial policy for text classifiers: a single learned policy that efficiently finds natural, semantics-preserving adversarial examples on new texts, making universal attacks more practical.
Contribution
It proposes a novel universal adversarial setup, a search policy learned over semantics-preserving text alterations, and demonstrates that universal adversarial patterns exist in the text domain.
Findings
The approach successfully finds adversarial examples on new texts.
Reinforcement learning effectively generalizes from limited training data.
Universal adversarial patterns are feasible in text classification.
Abstract
Discovering the existence of universal adversarial perturbations had large theoretical and practical impacts on the field of adversarial learning. In the text domain, most universal studies have focused on adversarial prefixes that are added to all texts. However, unlike in the vision domain, adding the same perturbation to different inputs results in noticeably unnatural inputs. Therefore, we introduce a new universal adversarial setup, a universal adversarial policy, which retains many advantages of other universal attacks while also producing valid texts, making it relevant in practice. We achieve this by learning a single search policy over a predefined set of semantics-preserving text alterations on many texts. This formulation is universal in that the policy efficiently finds adversarial examples on new texts. Our approach uses text perturbations which were…
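The abstract describes learning one search policy over a fixed set of semantics-preserving alterations on many texts, then reusing that policy to attack unseen texts. Below is a minimal sketch of that idea under heavy simplifying assumptions. The keyword classifier, the three alterations (`swap_synonym`, `add_filler`, `add_punct`), the state-independent softmax policy, and the REINFORCE update with a running-mean baseline are all hypothetical stand-ins chosen for illustration. They are not the paper's victim model, alteration set, or training procedure.

```python
import math
import random

# Hypothetical toy victim: predicts 1 ("positive") if any known positive keyword appears.
POSITIVE = {"good", "great", "excellent", "enjoyable"}

def classify(tokens):
    return 1 if any(t in POSITIVE for t in tokens) else 0

# Hypothetical semantics-preserving alterations; each returns a new token list.
SYNONYMS = {"good": "decent", "great": "terrific",
            "excellent": "superb", "enjoyable": "pleasant"}

def swap_synonym(tokens):
    # Replace the first word that has a listed synonym (no-op if none remain).
    out = list(tokens)
    for i, t in enumerate(out):
        if t in SYNONYMS:
            out[i] = SYNONYMS[t]
            break
    return out

def add_filler(tokens):
    return tokens + ["indeed"]

def add_punct(tokens):
    return tokens + ["!"]

ALTERATIONS = [add_filler, add_punct, swap_synonym]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def run_episode(tokens, logits, budget, rng):
    """Sample up to `budget` alterations; reward 1.0 as soon as the label flips."""
    original = classify(tokens)
    probs = softmax(logits)
    actions = []
    for _ in range(budget):
        a = rng.choices(range(len(logits)), weights=probs)[0]
        actions.append(a)
        tokens = ALTERATIONS[a](tokens)
        if classify(tokens) != original:
            return actions, 1.0
    return actions, 0.0

def train_policy(texts, episodes=500, budget=3, lr=0.1, seed=0):
    """REINFORCE with a running-mean baseline; one policy shared across all texts."""
    rng = random.Random(seed)
    logits = [0.0] * len(ALTERATIONS)
    baseline = 0.0
    for _ in range(episodes):
        tokens = rng.choice(texts).split()
        actions, reward = run_episode(tokens, logits, budget, rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        probs = softmax(logits)
        for a in actions:
            # d log softmax(a) / d logit_k = 1[k == a] - probs[k]
            for k in range(len(logits)):
                logits[k] += lr * advantage * ((1.0 if k == a else 0.0) - probs[k])
    return logits

def attack(text, logits, budget=3):
    """Greedy rollout of the learned policy on an unseen text.

    Each step applies the most-preferred alteration that actually changes the
    text, then queries the classifier. Returns (adversarial_text, queries) on
    success, or (None, queries) if the budget runs out.
    """
    tokens = text.split()
    original = classify(tokens)
    order = sorted(range(len(logits)), key=lambda k: -logits[k])
    queries = 0
    for _ in range(budget):
        for a in order:
            candidate = ALTERATIONS[a](tokens)
            if candidate != tokens:
                break
        tokens = candidate
        queries += 1
        if classify(tokens) != original:
            return " ".join(tokens), queries
    return None, queries

train = ["the movie was good", "a great and enjoyable film",
         "excellent acting", "good plot and great cast"]
logits = train_policy(train)
print(attack("a good and enjoyable story", logits))
```

In this toy setting only `swap_synonym` can flip the label, so the shared policy should learn to prefer it, and the same policy then transfers to a text it never saw during training. The state-independent policy is a deliberate simplification; a realistic search policy would condition on the current text and the classifier's feedback.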
