Quantifying Robustness to Adversarial Word Substitutions
Yuting Yang, Pei Huang, FeiFei Ma, Juan Cao, Meishan Zhang, Jian Zhang, and Jintao Li

TL;DR
This paper introduces a formal framework to evaluate and quantify the robustness of NLP models against adversarial word substitutions, providing bounds and metrics to understand model vulnerabilities.
Contribution
It proposes a robustness evaluation framework that bounds the robustness radius from above and below, because computing the maximum radius exactly is computationally hard, and adds a statistical metric for regions outside the safe radius.
Findings
State-of-the-art models like BERT are vulnerable to word substitutions.
The proposed metric quantifies susceptibility outside the safe radius.
A pseudo-dynamic programming algorithm gives a tighter upper bound on the robustness radius than repurposed attack methods alone.
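To make the idea of quantifying robustness outside the safe radius concrete, here is a minimal Python sketch of a generic Monte Carlo estimate: sample perturbations with exactly r substituted words and report the fraction that leave the prediction unchanged. This is not the paper's metric. The sampling scheme, the Hoeffding-based sample size, the toy classifier, and all names (`estimate_robust_fraction`, `hoeffding_sample_size`, `toy_predict`) are assumptions made for illustration.

```python
import math
import random

def hoeffding_sample_size(eps, delta):
    # Samples needed so the empirical fraction is within eps of the true
    # fraction with probability at least 1 - delta (Hoeffding's inequality).
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def estimate_robust_fraction(words, synonyms, predict, r,
                             eps=0.05, delta=0.05, seed=0):
    """Estimate the share of r-word substitutions that keep the prediction."""
    rng = random.Random(seed)
    base = predict(words)
    positions = [i for i, w in enumerate(words) if synonyms.get(w)]
    if r > len(positions):
        raise ValueError("r exceeds the number of substitutable words")
    n = hoeffding_sample_size(eps, delta)
    kept = 0
    for _ in range(n):
        perturbed = list(words)
        # Assumed sampling scheme: uniform choice of r positions, then a
        # uniform choice of substitute at each chosen position.
        for i in rng.sample(positions, r):
            perturbed[i] = rng.choice(synonyms[words[i]])
        kept += predict(perturbed) == base
    return kept / n, n

# Hypothetical toy classifier and substitution table, for illustration only.
POSITIVE = {"good", "great"}

def toy_predict(words):
    return "pos" if sum(w in POSITIVE for w in words) >= 1 else "neg"

sentence = ["a", "good", "and", "great", "film"]
synonyms = {"good": ["decent"], "great": ["big"], "film": ["movie"]}

fraction, n_samples = estimate_robust_fraction(sentence, synonyms, toy_predict, r=2)
print(f"robust fraction at r=2: {fraction:.3f} from {n_samples} samples")
```

In this toy setup exactly two of the three possible two-word substitutions preserve the label, so the estimate should land near 2/3. A real evaluation would replace `toy_predict` with the model under test and draw substitutes from a real synonym set.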
Abstract
Deep-learning-based NLP models are found to be vulnerable to word substitution perturbations. Before they are widely adopted, the fundamental issues of robustness need to be addressed. Along this line, we propose a formal framework to evaluate word-level robustness. First, to study safe regions for a model, we introduce the robustness radius, which is the boundary within which the model can resist any perturbation. As calculating the maximum robustness radius is computationally hard, we estimate its upper and lower bounds. We repurpose attack methods as ways of seeking an upper bound and design a pseudo-dynamic programming algorithm for a tighter upper bound. A verification method is then used for a lower bound. Further, for evaluating the robustness of regions outside a safe radius, we reexamine robustness from another view: quantification. A robustness metric with a rigorous statistical guarantee…
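The robustness radius described in the abstract can be illustrated on a toy problem. The Python sketch below enumerates every substitution exhaustively, which is only feasible for tiny inputs and is exactly the cost that motivates bounding the radius instead. It is not the paper's algorithm: the function `robustness_radius`, the classifier `toy_predict`, and the substitution table are hypothetical, and the radius is assumed to be counted in substituted words.

```python
from itertools import combinations, product

def robustness_radius(words, synonyms, predict):
    """Largest r such that no substitution of at most r words flips predict(words).

    Assumes every substitute differs from the word it replaces.
    """
    base = predict(words)
    positions = [i for i, w in enumerate(words) if synonyms.get(w)]
    for r in range(1, len(positions) + 1):
        for chosen in combinations(positions, r):
            for subs in product(*(synonyms[words[i]] for i in chosen)):
                perturbed = list(words)
                for i, s in zip(chosen, subs):
                    perturbed[i] = s
                if predict(perturbed) != base:
                    # Smaller substitution counts were already checked,
                    # so the model is safe up to r - 1 substitutions.
                    return r - 1
    # No adversarial example exists anywhere in the substitution space.
    return len(positions)

# Hypothetical toy classifier and substitution table, for illustration only.
POSITIVE = {"good", "great"}

def toy_predict(words):
    return "pos" if sum(w in POSITIVE for w in words) >= 1 else "neg"

sentence = ["a", "good", "and", "great", "film"]
synonyms = {"good": ["decent"], "great": ["big"], "film": ["movie"]}

radius = robustness_radius(sentence, synonyms, toy_predict)
print("robustness radius:", radius)
```

The same reasoning explains why attacks yield upper bounds: under these assumptions, any adversarial example that changes k words shows the radius is at most k - 1, whereas a lower bound requires certifying that no perturbation within a given distance succeeds.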
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Software Engineering Research · Machine Learning in Materials Science
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Dropout · Layer Normalization · WordPiece · Dense Connections · Multi-Head Attention · Softmax
