Generating Fluent Adversarial Examples for Natural Languages

Huangzhao Zhang; Hao Zhou; Ning Miao; Lei Li

arXiv:2007.06174 · cs.CL · July 14, 2020

TL;DR

This paper introduces MHA, which generates fluent adversarial examples for NLP by running Metropolis-Hastings sampling with a gradient-guided proposal. On IMDB and SNLI it attacks more successfully than the baseline model, and adversarial training with MHA improves robustness and performance.

Contribution

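The paper presents MHA, an attack that addresses two obstacles to adversarial example generation in NLP: the sentence space is discrete, so small perturbations along the gradient direction are difficult to make, and the fluency of generated examples cannot be guaranteed. MHA handles both with Metropolis-Hastings sampling whose proposal is guided by gradients.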

Findings

1. MHA outperforms the baseline model in attacking capability on the IMDB and SNLI datasets.

2. Adversarial training with MHA improves model robustness and performance.

3. MHA generates adversarial examples that remain fluent, despite the difficulty of making small gradient-direction perturbations in a discrete sentence space.

Abstract

Efficiently building an adversarial attacker for natural language processing (NLP) tasks is a real challenge. Firstly, as the sentence space is discrete, it is difficult to make small perturbations along the direction of gradients. Secondly, the fluency of the generated examples cannot be guaranteed. In this paper, we propose MHA, which addresses both problems by performing Metropolis-Hastings sampling, whose proposal is designed with the guidance of gradients. Experiments on IMDB and SNLI show that our proposed MHA outperforms the baseline model on attacking capability. Adversarial training with MHA also leads to better robustness and performance.
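
To make the sampling idea in the abstract concrete, here is a minimal toy sketch of Metropolis-Hastings over word replacements. It is not the authors' implementation: the abstract does not specify the target distribution or the proposal, so everything here is an assumption for illustration. The target is taken to be a fluency score times the victim model's probability of the attacker's desired label, and `toy_fluency`, `toy_victim_prob`, and `proposal_weight` are hypothetical stand-ins for a language model, a real classifier, and a gradient-derived word preference.

```python
import random

# Hypothetical toy vocabulary and sentiment lexicons (illustration only).
VOCAB = ["good", "great", "fine", "bad", "awful", "movie", "the", "was"]
POSITIVE = {"good", "great", "fine"}
NEGATIVE = {"bad", "awful"}


def toy_fluency(sentence):
    """Stand-in for a language-model score; halves for each adjacent repeated word."""
    repeats = sum(a == b for a, b in zip(sentence, sentence[1:]))
    return 0.5 ** repeats


def toy_victim_prob(sentence, target_label):
    """Stand-in for the victim classifier's probability of target_label."""
    pos = sum(w in POSITIVE for w in sentence)
    neg = sum(w in NEGATIVE for w in sentence)
    p_pos = (1 + pos) / (2 + pos + neg)
    return p_pos if target_label == "pos" else 1.0 - p_pos


def target_density(sentence, target_label):
    """Unnormalized target (assumed form): fluent AND classified as target_label."""
    return toy_fluency(sentence) * toy_victim_prob(sentence, target_label)


def proposal_weight(word, target_label):
    """Placeholder for gradient guidance: favor words that push toward the target."""
    favored = POSITIVE if target_label == "pos" else NEGATIVE
    return 3.0 if word in favored else 1.0


def acceptance_prob(current, proposed, position, target_label):
    """MH acceptance for a single-word replacement at `position`.

    The position is chosen uniformly and the new word is drawn in proportion
    to proposal_weight, so the proposal normalizers cancel and only the
    weights of the old and new words remain in the Hastings correction.
    """
    forward = proposal_weight(proposed[position], target_label)
    backward = proposal_weight(current[position], target_label)
    ratio = (target_density(proposed, target_label) * backward) / (
        target_density(current, target_label) * forward
    )
    return min(1.0, ratio)


def mh_attack(sentence, target_label, n_steps=200, seed=0):
    """Run the chain and return the final sentence as a list of words."""
    rng = random.Random(seed)
    current = list(sentence)
    weights = [proposal_weight(w, target_label) for w in VOCAB]
    for _ in range(n_steps):
        position = rng.randrange(len(current))
        new_word = rng.choices(VOCAB, weights=weights, k=1)[0]
        proposed = current[:position] + [new_word] + current[position + 1:]
        if rng.random() < acceptance_prob(current, proposed, position, target_label):
            current = proposed
    return current


if __name__ == "__main__":
    print(mh_attack(["the", "movie", "was", "bad"], "pos"))
```

The design point worth noticing is the correction term in `acceptance_prob`: because the proposal is biased toward certain words, the acceptance ratio divides that bias back out, so the chain still targets the intended distribution while exploring it more efficiently. In a real attack, the toy scoring functions would be replaced by an actual language model and victim classifier.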
