Natural Adversarial Sentence Generation with Gradient-based Perturbation

Yu-Lun Hsieh; Minhao Cheng; Da-Cheng Juan; Wei Wei; Wen-Lian Hsu; Cho-Jui Hsieh

arXiv:1909.04495 · cs.IR · September 11, 2019

TL;DR

This paper introduces a gradient-based method for generating natural language adversarial examples to test and improve the robustness of text classification models, demonstrating effectiveness in both white-box and black-box settings.

Contribution

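The paper presents an algorithm that applies gradient-based perturbations to the sentence embeddings used as classifier features. It then learns a decoder that turns the perturbed embeddings into natural adversarial sentences, which makes adversarial attacks in NLP more realistic and more broadly applicable.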

Findings

01

Achieved a 20% relative decrease in average accuracy on a public sentiment analysis API.

02

Generated more natural adversarial examples than previous methods.

03

Demonstrated effectiveness in black-box attacks on models whose parameters are unknown.
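The abstract reports the API result as a relative decrease in average accuracy. Readers can easily mistake this for an absolute drop in percentage points. The formula below makes the distinction explicit. It writes a_clean and a_adv for the average accuracy on clean and adversarial inputs; this notation is introduced here and is not taken from the paper.

```latex
\Delta_{\mathrm{rel}} = \frac{\bar{a}_{\mathrm{clean}} - \bar{a}_{\mathrm{adv}}}{\bar{a}_{\mathrm{clean}}},
\qquad
\Delta_{\mathrm{rel}} = 0.20 \;\Longrightarrow\; \bar{a}_{\mathrm{adv}} = 0.8\,\bar{a}_{\mathrm{clean}}
```

Here is a purely hypothetical illustration. Under a 20% relative decrease, a clean accuracy of 0.90 would fall to 0.90 × 0.8 = 0.72, which is an absolute drop of 18 percentage points. The summary does not report the API's actual baseline accuracy.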

Abstract

This work proposes a novel algorithm to generate natural language adversarial input for text classification models, in order to investigate the robustness of these models. It involves applying gradient-based perturbation on the sentence embeddings that are used as the features for the classifier, and learning a decoder for generation. We apply this method to a sentiment analysis model and verify its effectiveness in inducing incorrect predictions by the model. We also conduct quantitative and qualitative analysis on these examples and demonstrate that our approach can generate more natural adversaries. In addition, it can be used to successfully perform black-box attacks, which involve attacking other existing models whose parameters are not known. On a public sentiment analysis API, the proposed method introduces a 20% relative decrease in average accuracy and 74% relative increase…
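A toy sketch can illustrate the abstract's two-stage idea. First, the attack perturbs a sentence embedding along the gradient of the classifier's loss. Second, it maps the perturbed embedding back to text. Everything below is a hypothetical stand-in, not the authors' method:

- The 2-D embeddings and the sentence bank are made up.
- The classifier is a logistic-regression model over the embedding.
- The update is a projected gradient-sign (PGD-style) step. This is one common choice of gradient-based perturbation, not necessarily the paper's update rule.
- A nearest-neighbour lookup over candidate sentences replaces the learned decoder.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy bank of (sentence, embedding) pairs; a real system would
# obtain embeddings from a trained sentence encoder.
TOY_BANK: List[Tuple[str, List[float]]] = [
    ("the movie was great", [1.0, 0.5]),
    ("the movie was fine", [0.1, -0.4]),
    ("the movie was awful", [-1.0, -1.0]),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_positive(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """P(positive | x) under a toy logistic-regression classifier on the embedding."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss_grad(w: Sequence[float], b: float, x: Sequence[float], label: int) -> List[float]:
    """Gradient of binary cross-entropy with respect to the embedding x.

    For logistic regression, dL/dx = (p - y) * w with p = sigmoid(w.x + b).
    """
    p = predict_positive(w, b, x)
    return [(p - label) * wi for wi in w]


def perturb_embedding(w, b, x, label, step=0.1, n_steps=20, eps=1.0) -> List[float]:
    """Untargeted PGD-style attack: ascend the loss of the true label with
    gradient-sign steps, projecting back into an L-infinity ball of radius eps."""
    x_adv = list(x)
    for _ in range(n_steps):
        g = loss_grad(w, b, x_adv, label)
        # Move each coordinate by `step` in the direction that increases the loss.
        x_adv = [xa + step * math.copysign(1.0, gi) if gi != 0 else xa
                 for xa, gi in zip(x_adv, g)]
        # Keep the perturbed embedding within eps of the original embedding.
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
    return x_adv


def nearest_sentence(x_adv: Sequence[float], bank=TOY_BANK) -> str:
    """Stand-in for the learned decoder: return the bank sentence whose
    embedding is closest to x_adv in squared Euclidean distance."""
    return min(bank, key=lambda item: sum((a - c) ** 2 for a, c in zip(x_adv, item[1])))[0]


if __name__ == "__main__":
    w, b = [1.0, 1.0], 0.0              # hypothetical classifier weights
    sentence, x = TOY_BANK[0]           # "the movie was great", true label 1
    x_adv = perturb_embedding(w, b, x, label=1)
    decoded = nearest_sentence(x_adv)
    decoded_emb = dict(TOY_BANK)[decoded]
    print(f"original: {sentence!r}  P(pos) = {predict_positive(w, b, x):.3f}")
    print(f"perturbed: {x_adv}  P(pos) = {predict_positive(w, b, x_adv):.3f}")
    print(f"decoded: {decoded!r}  P(pos) = {predict_positive(w, b, decoded_emb):.3f}")
```

In this toy setup, the perturbation pushes the embedding of a positive sentence across the decision boundary. The nearest candidate is "the movie was fine", and the toy classifier also scores it as negative. In a real attack, a learned decoder would generate the sentence, and the result would need to be re-encoded and re-checked: a perturbation that fools the classifier in embedding space need not survive decoding.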

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

shashank93jai/adversarial-sentence-generation (PyTorch)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Malware Detection Techniques