DANCin SEQ2SEQ: Fooling Text Classifiers with Adversarial Text Example Generation
Catherine Wong

TL;DR
This paper introduces DANCin SEQ2SEQ, a GAN-inspired reinforcement learning method for generating adversarial text examples that can fool black-box classifiers, providing insights into model vulnerabilities.
Contribution
The work presents a novel GAN-inspired algorithm for adversarial text generation applicable to black-box classifiers, recasting the problem as reinforcement learning.
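As a rough illustration of what recasting the attack as reinforcement learning can mean, the sketch below trains a softmax policy with REINFORCE using only query access to a classifier. It is not the paper's algorithm: the toy keyword classifier, the fixed candidate rewrites, and the names `black_box_classifier` and `train_policy` are all hypothetical. It also omits the GAN-style component and any check that meaning is preserved.

```python
import math
import random


def black_box_classifier(text: str) -> float:
    """Hypothetical target model: returns P(negative) and exposes no gradients."""
    negative_words = {"terrible", "awful", "bad"}
    hits = sum(tok in negative_words for tok in text.lower().split())
    return 1.0 - math.exp(-hits)  # 0.0 when no flagged word appears


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(candidates, true_label_prob, steps=2000, lr=0.5, seed=0):
    """REINFORCE over a fixed candidate set (an assumption made for brevity)."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    baseline = 0.0  # running average of reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(candidates)), weights=probs)[0]
        # Reward is high when the classifier loses confidence in the true label.
        reward = 1.0 - true_label_prob(candidates[action])
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        # Softmax policy gradient: d log pi(a) / d logit_k = 1[k == a] - probs[k]
        for k in range(len(logits)):
            grad = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
    return softmax(logits)


# Hypothetical rewrites of one negative review.
candidates = [
    "the movie was terrible",
    "the movie was awful and bad",
    "the movie was a letdown",
]
probs = train_policy(candidates, black_box_classifier)
best = candidates[max(range(len(candidates)), key=probs.__getitem__)]
print(best)
```

The policy only ever sees the classifier's output score, which is what makes the setting black-box. A realistic generator would produce text token by token rather than choose among fixed candidates.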
Findings
Preliminary results show promising adversarial example generation.
The method works in black-box attack scenarios.
The generated adversarial texts are shown to remain semantically meaningful.
Abstract
Machine learning models are powerful but fallible. Generating adversarial examples - inputs deliberately crafted to cause model misclassification or other errors - can yield important insight into model assumptions and vulnerabilities. Despite significant recent work on adversarial example generation targeting image classifiers, relatively little work exists exploring adversarial example generation for text classifiers; additionally, many existing adversarial example generation algorithms require full access to target model parameters, rendering them impractical for many real-world attacks. In this work, we introduce DANCin SEQ2SEQ, a GAN-inspired algorithm for adversarial text example generation targeting largely black-box text classifiers. We recast adversarial text example generation as a reinforcement learning problem, and demonstrate that our algorithm offers preliminary but…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Hate Speech and Cyberbullying Detection
