Generating Fluent Adversarial Examples for Natural Languages
Huangzhao Zhang, Hao Zhou, Ning Miao, Lei Li

TL;DR
This paper introduces MHA, a method that generates fluent adversarial examples for NLP models by running Metropolis-Hastings sampling with a gradient-guided proposal. It attacks more successfully than the baseline model, and adversarial training with it improves robustness.
Contribution
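The paper presents MHA, which tackles two obstacles to adversarial attacks on text: the sentence space is discrete, so small perturbations along the gradient direction are hard to make, and the fluency of generated examples is not guaranteed. MHA addresses both by performing Metropolis-Hastings sampling with a proposal designed under the guidance of gradients.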
Findings
MHA outperforms the baseline model in attacking capability on the IMDB and SNLI datasets.
Adversarial training with MHA enhances model robustness and performance.
MHA effectively balances fluency and perturbation constraints in adversarial example generation.
Abstract
Efficiently building an adversarial attacker for natural language processing (NLP) tasks is a real challenge. Firstly, as the sentence space is discrete, it is difficult to make small perturbations along the direction of gradients. Secondly, the fluency of the generated examples cannot be guaranteed. In this paper, we propose MHA, which addresses both problems by performing Metropolis-Hastings sampling, whose proposal is designed with the guidance of gradients. Experiments on IMDB and SNLI show that our proposed MHA outperforms the baseline model on attacking capability. Adversarial training with MHA also leads to better robustness and performance.
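The sampling idea in the abstract can be illustrated with a toy Metropolis-Hastings loop over word replacements. This is a minimal sketch under stated assumptions, not the authors' implementation: the vocabulary, the `FLUENCY` scores (standing in for a language model), the `PROPOSAL_W` weights (standing in for gradient guidance), the `victim_prob_negative` classifier, and the early stop once the toy classifier flips are all hypothetical choices made here for illustration.

```python
import math
import random

# Hypothetical stand-in for a language model: unnormalised per-word fluency scores.
FLUENCY = {"good": 3.0, "great": 2.0, "fine": 2.0, "bad": 3.0, "awful": 1.0, "movie": 3.0}

# Hypothetical stand-in for gradient guidance: words that push the victim
# toward the wrong label get a larger proposal weight.
PROPOSAL_W = {"good": 1.0, "great": 1.0, "fine": 1.0, "bad": 4.0, "awful": 4.0, "movie": 1.0}

POSITIVE = {"good", "great", "fine"}
NEGATIVE = {"bad", "awful"}


def victim_prob_negative(sentence):
    """Toy victim classifier: probability it assigns to the label 'negative'."""
    pos = sum(w in POSITIVE for w in sentence)
    neg = sum(w in NEGATIVE for w in sentence)
    return (1 + neg) / (2 + pos + neg)


def target(sentence):
    """Unnormalised target density: fluency times probability of the wrong label."""
    return math.prod(FLUENCY[w] for w in sentence) * victim_prob_negative(sentence)


def propose(sentence, rng):
    """Replace one random position with a word drawn by proposal weight.

    Returns the candidate plus the forward and backward proposal weights.
    The uniform position choice and the weight normaliser are identical in
    both directions, so they cancel in the acceptance ratio.
    """
    i = rng.randrange(len(sentence))
    words = list(PROPOSAL_W)
    new = rng.choices(words, weights=[PROPOSAL_W[w] for w in words])[0]
    candidate = sentence[:i] + [new] + sentence[i + 1:]
    return candidate, PROPOSAL_W[new], PROPOSAL_W[sentence[i]]


def acceptance(current, candidate, q_forward, q_backward):
    """Metropolis-Hastings acceptance probability for a proposed move."""
    ratio = (target(candidate) * q_backward) / (target(current) * q_forward)
    return min(1.0, ratio)


def mh_attack(sentence, steps=200, seed=0):
    """Run the sampler; stop early once the toy victim prefers 'negative'."""
    rng = random.Random(seed)
    x = list(sentence)
    for _ in range(steps):
        candidate, q_fwd, q_bwd = propose(x, rng)
        if rng.random() < acceptance(x, candidate, q_fwd, q_bwd):
            x = candidate
        if victim_prob_negative(x) > 0.5:
            break
    return x


if __name__ == "__main__":
    print(mh_attack(["good", "great", "movie"]))
```

Because the target multiplies a fluency score by the victim's probability of the wrong label, the chain is drawn toward sentences that are both likely under the toy language model and misclassified. The proposal weights only change how quickly such sentences are reached, since the acceptance ratio corrects for the asymmetric proposal.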
