DANCin SEQ2SEQ: Fooling Text Classifiers with Adversarial Text Example Generation

Catherine Wong

arXiv:1712.05419 · cs.LG · December 18, 2017 · 21 cites

TL;DR

This paper introduces DANCin SEQ2SEQ, a GAN-inspired reinforcement learning method for generating adversarial text examples that can fool black-box classifiers, providing insights into model vulnerabilities.

Contribution

The work presents a novel GAN-inspired algorithm for adversarial text generation applicable to black-box classifiers, recasting the problem as reinforcement learning.

Findings

01. Preliminary results show promising adversarial example generation.

02. The method works in black-box attack scenarios.

03. Semantic meaningfulness of generated adversarial texts is demonstrated.

Abstract

Machine learning models are powerful but fallible. Generating adversarial examples - inputs deliberately crafted to cause model misclassification or other errors - can yield important insight into model assumptions and vulnerabilities. Despite significant recent work on adversarial example generation targeting image classifiers, relatively little work exists exploring adversarial example generation for text classifiers; additionally, many existing adversarial example generation algorithms require full access to target model parameters, rendering them impractical for many real-world attacks. In this work, we introduce DANCin SEQ2SEQ, a GAN-inspired algorithm for adversarial text example generation targeting largely black-box text classifiers. We recast adversarial text example generation as a reinforcement learning problem, and demonstrate that our algorithm offers preliminary but…
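
To make the reinforcement-learning framing concrete, here is a minimal sketch, not the paper's implementation. It assumes a toy setting invented for illustration: a hypothetical keyword-based `black_box_score` stands in for the target classifier, and a tabular softmax policy over hand-picked substitute words stands in for the generator. It omits the sequence-to-sequence generator and the GAN-style component named in the paper. The only point it shows is that a generator can be trained with a policy-gradient (REINFORCE) update using nothing but the classifier's output score as the reward.

```python
import math
import random

# Hypothetical stand-in for a black-box classifier: the attacker may call it
# and read the returned score, but never sees its parameters or gradients.
def black_box_score(tokens):
    """Return a toy 'flagged' score in [0, 1]: the fraction of trigger words."""
    trigger_words = {"free", "winner", "cash"}
    hits = sum(1 for t in tokens if t in trigger_words)
    return hits / len(tokens)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_generator(candidates, steps=3000, lr=0.5, seed=0):
    """REINFORCE over one independent categorical choice per token position."""
    rng = random.Random(seed)
    logits = [[0.0] * len(opts) for opts in candidates]
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        # Sample one candidate index per position from the current policy.
        choice = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
        tokens = [candidates[i][a] for i, a in enumerate(choice)]
        # Reward is high when the black box assigns a low 'flagged' score.
        reward = 1.0 - black_box_score(tokens)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log softmax w.r.t. logit k is 1[k == a] - p_k.
        for i, a in enumerate(choice):
            for k in range(len(logits[i])):
                grad = (1.0 if k == a else 0.0) - probs[i][k]
                logits[i][k] += lr * advantage * grad
    return logits

def greedy_decode(candidates, logits):
    """Pick the highest-logit candidate at every position."""
    return [
        opts[max(range(len(opts)), key=lambda k: row[k])]
        for opts, row in zip(candidates, logits)
    ]

# Hypothetical substitution table: the first option at each position is the
# original word, the rest are assumed meaning-preserving alternatives.
CANDIDATES = [
    ["free", "complimentary"],
    ["cash", "funds"],
    ["for"],
    ["every"],
    ["winner", "participant"],
]

if __name__ == "__main__":
    original = [opts[0] for opts in CANDIDATES]
    learned = greedy_decode(CANDIDATES, train_generator(CANDIDATES))
    print("original:", " ".join(original), black_box_score(original))
    print("learned: ", " ".join(learned), black_box_score(learned))
```

In this sketch, semantic preservation is simply assumed by restricting the policy to a fixed substitution table; a real attack would need some learned signal for that, which is the part this toy leaves out.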

Code & Models

Repositories

CatherineWong/dancin_seq2seq
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Hate Speech and Cyberbullying Detection