TextGrad: Advancing Robustness Evaluation in NLP by Gradient-Driven Optimization
Bairu Hou, Jinghan Jia, Yihua Zhang, Guanhua Zhang, Yang Zhang, Sijia Liu, Shiyu Chang

TL;DR
TextGrad introduces a gradient-driven optimization framework for generating high-quality adversarial examples in NLP, addressing unique challenges of discrete text and fluency constraints to improve robustness evaluation and defense.
Contribution
It presents a principled first-order, gradient-based attack generator for NLP that handles the discreteness of text and the fluency constraint, strengthening both robustness assessment and adversarial training.
Findings
TextGrad achieves high attack success rates in NLP robustness evaluation.
Incorporating TextGrad into adversarial training improves model robustness.
Extensive experiments validate the effectiveness of TextGrad in attack and defense scenarios.
Abstract
Robustness evaluation against adversarial examples has become increasingly important to unveil the trustworthiness of the prevailing deep models in natural language processing (NLP). However, in contrast to the computer vision domain, where first-order projected gradient descent (PGD) is used as the benchmark approach to generate adversarial examples for robustness evaluation, NLP lacks a principled first-order gradient-based robustness evaluation framework. The emerging optimization challenges lie in 1) the discrete nature of textual inputs together with the strong coupling between the perturbation location and the actual content, and 2) the additional constraint that the perturbed text should be fluent and achieve a low perplexity under a language model. These challenges make the development of PGD-like NLP attacks difficult. To bridge the gap, we propose TextGrad, a new…
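The abstract is cut off before it describes the method, so the following is not the authors' algorithm. It is a minimal, generic sketch of what "PGD-like" can mean for discrete text, under stated assumptions: the choice among candidate substitutions for a single token is relaxed to a probability vector, that vector is updated by projected gradient ascent on a toy linear objective, and a fluency penalty stands in for the perplexity constraint. All names and numbers (`victim_loss`, `lm_nll`, `fluency_weight`, the candidate words) are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) == 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # keep the threshold from the last index that passes
    return [max(x - theta, 0.0) for x in v]


def pgd_substitution_attack(victim_loss, lm_nll, fluency_weight=0.5,
                            step_size=0.1, steps=20):
    """Projected gradient ascent over relaxed substitution probabilities.

    Toy objective (an assumption, linear in p):
        sum_i p[i] * (victim_loss[i] - fluency_weight * lm_nll[i])
    so the gradient with respect to p is constant.
    """
    k = len(victim_loss)
    p = [1.0 / k] * k  # start from the uniform distribution
    grad = [loss - fluency_weight * nll
            for loss, nll in zip(victim_loss, lm_nll)]
    for _ in range(steps):
        stepped = [p_i + step_size * g for p_i, g in zip(p, grad)]
        p = project_to_simplex(stepped)  # return to the feasible set
    return p


# Hypothetical candidates for one token position.
candidates = ["good", "fine", "decent", "xqzt"]
victim_loss = [0.1, 0.9, 0.7, 1.5]  # made-up victim loss if candidate i is used
lm_nll = [1.0, 1.2, 1.1, 6.0]       # made-up language-model negative log-likelihood

p = pgd_substitution_attack(victim_loss, lm_nll)
# Map the relaxed solution back to a discrete token by taking the argmax.
best = candidates[max(range(len(p)), key=p.__getitem__)]
print(best, [round(x, 3) for x in p])
```

In this toy setup the non-word "xqzt" would hurt the victim model most, but its high language-model penalty pushes its probability down, which illustrates how a fluency term can steer a gradient-based search toward readable perturbations.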
Taxonomy
Topics: Adversarial Robustness in Machine Learning
