Impact of Adversarial Training on Robustness and Generalizability of Language Models
Enes Altinisik, Hassan Sajjad, Husrev Taha Sencar, Safa Messaoud, Sanjay Chawla

TL;DR
This paper compares various adversarial training methods for language models, revealing trade-offs between robustness and generalization, and providing a deep qualitative analysis of adversarial example generation techniques.
Contribution
It offers an in-depth comparison of data augmentation and input perturbation methods, highlighting their effects on robustness and generalization in language models.
Findings
Pre-training data augmentation enhances robustness.
Input space perturbation improves robustness.
Embedding space perturbation enhances generalization.
Abstract
Adversarial training is widely acknowledged as the most effective defense against adversarial attacks. However, it is also well established that achieving both robustness and generalization in adversarially trained models involves a trade-off. The goal of this work is to provide an in-depth comparison of different approaches for adversarial training in language models. Specifically, we study the effect of pre-training data augmentation as well as training-time input perturbations vs. embedding space perturbations on the robustness and generalization of transformer-based language models. Our findings suggest that better robustness can be achieved by pre-training data augmentation or by training with input space perturbation. However, training with embedding space perturbation significantly improves generalization. A linguistic correlation analysis of neurons of the learned models reveals…
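As a rough illustration of what "embedding space perturbation" can mean, the sketch below applies a single gradient-sign step to an embedding vector so that the loss of a toy linear classifier increases. This is a generic, minimal example and not the procedure used in the paper: the classifier, the weights `w`, the embedding `x`, the budget `epsilon`, and all function names are hypothetical and chosen only for illustration. Input space perturbation would instead edit the text itself (for example, by substituting words) before it is embedded.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    """Binary cross-entropy of a linear classifier on embedding x, label y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def grad_wrt_embedding(w, x, y):
    """Gradient of the loss with respect to the embedding x, which is (p - y) * w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def perturb_embedding(w, x, y, epsilon):
    """One gradient-sign step: shift each coordinate by epsilon in the loss-increasing direction."""
    g = grad_wrt_embedding(w, x, y)
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, g)]

# Hypothetical values, for illustration only.
w = [0.5, -1.0, 2.0]   # toy classifier weights
x = [1.0, 0.2, -0.3]   # toy embedding of one input
y = 1                  # true label

x_adv = perturb_embedding(w, x, y, epsilon=0.1)
clean_loss = loss(w, x, y)
adv_loss = loss(w, x_adv, y)
print(clean_loss, adv_loss)
```

In an adversarial training loop, the model parameters would then be updated on the loss at `x_adv` (often alongside the clean loss) rather than on the clean embedding alone.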
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Adversarial Robustness in Machine Learning
