Towards Robustness Against Natural Language Word Substitutions
Xinshuai Dong, Anh Tuan Luu, Rongrong Ji, Hong Liu

TL;DR
This paper introduces a novel adversarial training method called ASCC for improving NLP model robustness against semantically similar word substitutions, outperforming existing defenses across multiple tasks and architectures.
Contribution
The paper proposes ASCC, which models the word substitution attack space as a convex hull with a sparsity-inducing regularizer, and uses it within adversarial training to improve the robustness of NLP models.
Findings
ASCC-defense outperforms current state-of-the-art methods in robustness.
The method improves robustness in sentiment analysis and natural language inference.
Robustly trained word vectors can be plugged into a normally trained model and improve its robustness without any additional defense.
Abstract
Robustness against word substitutions has a well-defined and widely acceptable form, i.e., using semantically similar words as substitutions, and thus it is considered a fundamental stepping-stone towards broader robustness in natural language processing. Previous defense methods capture word substitutions in vector space by using either an $\ell_2$-ball or a hyper-rectangle, which results in perturbation sets that are not inclusive enough or unnecessarily large, and thus impedes mimicry of worst cases for robust training. In this paper, we introduce a novel Adversarial Sparse Convex Combination (ASCC) method. We model the word substitution attack space as a convex hull and leverage a regularization term to enforce perturbation towards an actual substitution, thus aligning our modeling better with the discrete textual space. Based on the ASCC method, we further propose…
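The convex-hull modeling described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy 2-D embeddings, the softmax parameterization of the weights, and the use of entropy as the sparsity measure are all assumptions made for illustration. The idea shown is that a perturbed word vector is a convex combination of the embeddings of a word's allowed substitutions, and that a regularizer favoring low-entropy (sparse) weights pulls the perturbation towards a single actual substitution.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax; output is non-negative and sums to 1.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def convex_combination(sub_embeddings, logits):
    # Hypothetical helper: map unconstrained logits to convex weights and
    # return the resulting point inside the convex hull of the substitutions.
    w = softmax(logits)
    return w @ sub_embeddings, w

def entropy(w, eps=1e-12):
    # Assumed sparsity measure: low entropy means the weight mass sits
    # on (nearly) one substitution, i.e. close to a real word.
    return float(-np.sum(w * np.log(w + eps)))

# Toy embeddings for three substitutions of one word (made-up values).
sub_embeddings = np.array([[1.0, 0.0],
                           [0.0, 1.0],
                           [-1.0, 0.0]])

# Uniform weights: a point in the hull interior, far from any real word.
uniform_point, w_uniform = convex_combination(sub_embeddings, np.zeros(3))

# Peaked weights: a point close to the first substitution's embedding.
peaked_point, w_peaked = convex_combination(
    sub_embeddings, np.array([10.0, 0.0, 0.0])
)

print("uniform entropy:", entropy(w_uniform))
print("peaked entropy: ", entropy(w_peaked))
```

In an adversarial training loop, the logits would be the quantity optimized by the attacker (to maximize the model's loss while keeping the entropy term small); that loop is omitted here because the summary above does not specify it.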
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Domain Adaptation and Few-Shot Learning
