Impact of Adversarial Training on Robustness and Generalizability of Language Models

Enes Altinisik; Hassan Sajjad; Husrev Taha Sencar; Safa Messaoud; Sanjay Chawla

arXiv:2211.05523·cs.CL·December 12, 2023·1 cites

Open Access

TL;DR

This paper compares various adversarial training methods for language models, revealing trade-offs between robustness and generalization, and providing a deep qualitative analysis of adversarial example generation techniques.

Contribution

It offers an in-depth comparison of data augmentation and input perturbation methods, highlighting their effects on robustness and generalization in language models.

Findings

1. Pre-training data augmentation enhances robustness.
2. Input space perturbation improves robustness.
3. Embedding space perturbation enhances generalization.

Abstract

Adversarial training is widely acknowledged as the most effective defense against adversarial attacks. However, it is also well established that achieving both robustness and generalization in adversarially trained models involves a trade-off. The goal of this work is to provide an in-depth comparison of different approaches for adversarial training in language models. Specifically, we study the effect of pre-training data augmentation as well as training-time input perturbations vs. embedding space perturbations on the robustness and generalization of transformer-based language models. Our findings suggest that better robustness can be achieved by pre-training data augmentation or by training with input space perturbation. However, training with embedding space perturbation significantly improves generalization. A linguistic correlation analysis of neurons of the learned models reveals…
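
To make the distinction between input space and embedding space perturbation concrete, here is a minimal sketch on a toy classifier. It is not the paper's method or code: the vocabulary, embedding values, substitution table, linear classifier, and the single signed-gradient step (in the style of the fast gradient sign method) are all assumptions chosen for illustration.

```python
import math

# Hypothetical toy vocabulary with 2-dimensional embeddings.
EMBEDDINGS = {
    "great": [1.0, 0.5],
    "good": [0.8, 0.3],
    "fine": [0.2, 0.1],
    "movie": [0.0, 0.0],
}
# Hypothetical substitution table standing in for a word-level attack.
SUBSTITUTES = {"great": ["good", "fine"], "good": ["great", "fine"]}
WEIGHTS = [2.0, 1.0]  # assumed linear classifier over the mean embedding


def embed(tokens):
    return [EMBEDDINGS[t] for t in tokens]


def positive_prob(vectors):
    """Probability of the positive class from mean-pooled embeddings."""
    n = len(vectors)
    pooled = [sum(v[i] for v in vectors) / n for i in range(len(WEIGHTS))]
    logit = sum(w * x for w, x in zip(WEIGHTS, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


def loss(vectors, label):
    """Logistic loss for a label in {0, 1}."""
    p = positive_prob(vectors)
    return -math.log(p if label == 1 else 1.0 - p)


def input_space_perturb(tokens, label):
    """Discrete attack: pick the single-token substitution with the highest loss."""
    best, best_loss = list(tokens), loss(embed(tokens), label)
    for i, tok in enumerate(tokens):
        for sub in SUBSTITUTES.get(tok, []):
            candidate = tokens[:i] + [sub] + tokens[i + 1:]
            candidate_loss = loss(embed(candidate), label)
            if candidate_loss > best_loss:
                best, best_loss = candidate, candidate_loss
    return best


def embedding_space_perturb(tokens, label, epsilon=0.1):
    """Continuous attack: one signed-gradient step on every token embedding."""
    vectors = embed(tokens)
    # d(loss)/d(logit) = p - label, and d(logit)/d(e_ij) = w_j / n with n > 0,
    # so the sign of the gradient for coordinate j is sign((p - label) * w_j).
    dloss_dlogit = positive_prob(vectors) - label
    steps = [math.copysign(epsilon, dloss_dlogit * w) for w in WEIGHTS]
    return [[x + s for x, s in zip(v, steps)] for v in vectors]


tokens, label = ["great", "movie"], 1
adversarial_tokens = input_space_perturb(tokens, label)
adversarial_vectors = embedding_space_perturb(tokens, label)
print(adversarial_tokens, loss(embed(adversarial_tokens), label))
print(adversarial_vectors, loss(adversarial_vectors, label))
```

The input space attack returns a valid token sequence, while the embedding space attack returns vectors that need not correspond to any word in the vocabulary. In adversarial training, perturbed examples of either kind would be fed back into the training loss alongside the clean ones.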

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Adversarial Robustness in Machine Learning