Achieving Model Robustness through Discrete Adversarial Training

Maor Ivgi; Jonathan Berant

arXiv:2104.05062·cs.LG·November 2, 2021

TL;DR

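This paper introduces online discrete adversarial training for NLP models, in which adversarial examples are generated at every training step rather than once, offline. Using a new best-first search attack and inexpensive random sampling attacks, it reports better robustness than traditional offline augmentation, with random sampling also improving training efficiency.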

Contribution

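The work proposes (i) a new discrete attack based on best-first search and (ii) random sampling attacks that avoid expensive search procedures. Both are used for online augmentation, where adversarial examples are regenerated at every training step so that they track the changing model.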
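To make the idea of a best-first search attack concrete, here is a minimal sketch of best-first search over synonym substitutions. It is not the authors' implementation: the function name `best_first_attack`, the toy scorer `toy_gold_prob`, the synonym table, the expansion budget, and the 0.5 stopping threshold are all assumptions chosen for illustration.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple


def best_first_attack(
    tokens: Sequence[str],
    synonyms: Dict[str, List[str]],
    gold_prob: Callable[[Sequence[str]], float],
    max_expansions: int = 50,
) -> Tuple[str, ...]:
    """Search for a synonym-substituted input that lowers the gold-label probability.

    Hypothetical sketch: the frontier is a min-heap keyed by the model's
    probability for the gold label, so the most damaging candidate found so
    far is always expanded next.
    """
    start = tuple(tokens)
    best, best_score = start, gold_prob(start)
    frontier = [(best_score, start)]
    seen = {start}
    for _ in range(max_expansions):
        # Stop when the frontier is empty or the (assumed binary) prediction flips.
        if not frontier or best_score < 0.5:
            break
        _, current = heapq.heappop(frontier)
        for i in range(len(current)):
            # Substitutes are looked up from the ORIGINAL word at position i,
            # so every position remains a synonym of the source token.
            for sub in synonyms.get(tokens[i], []):
                candidate = current[:i] + (sub,) + current[i + 1:]
                if candidate in seen:
                    continue
                seen.add(candidate)
                score = gold_prob(candidate)
                heapq.heappush(frontier, (score, candidate))
                if score < best_score:
                    best, best_score = candidate, score
    return best


def toy_gold_prob(tokens: Sequence[str]) -> float:
    """Toy stand-in for a classifier's gold-label probability (assumption)."""
    weights = {"great": 0.3, "good": 0.1, "fine": -0.1, "movie": 0.05, "picture": -0.05}
    return min(1.0, max(0.0, 0.5 + sum(weights.get(t, 0.0) for t in tokens)))


toy_synonyms = {"great": ["good", "fine"], "film": ["movie", "picture"]}
adversarial = best_first_attack(["great", "film"], toy_synonyms, toy_gold_prob)
```

In a real setting `gold_prob` would wrap a forward pass of the model under attack, which is why search-based attacks are costly: every candidate that is scored requires a model query.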

Findings

01 Random sampling attacks yield larger robustness gains than the commonly used offline augmentation.

02 Random sampling attacks also speed up training by approximately 10 times.

03 Online augmentation with search-based attacks further improves robustness on multiple datasets, at a higher training cost.

Abstract

Discrete adversarial attacks are symbolic perturbations to a language input that preserve the output label but lead to a prediction error. While such attacks have been extensively explored for the purpose of evaluating model robustness, their utility for improving robustness has been limited to offline augmentation only. Concretely, given a trained model, attacks are used to generate perturbed (adversarial) examples, and the model is re-trained exactly once. In this work, we address this gap and leverage discrete attacks for online augmentation, where adversarial examples are generated at every training step, adapting to the changing nature of the model. We propose (i) a new discrete attack, based on best-first search, and (ii) random sampling attacks that unlike prior work are not based on expensive search-based procedures. Surprisingly, we find that random sampling leads to impressive…
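The contrast between offline and online augmentation can be sketched in a few lines. The example below is a hypothetical, pure-Python illustration and not the paper's code: it uses a tiny bag-of-words logistic model, and the names `random_sampling_attack`, `train_online`, the synonym table, and all hyperparameters are assumptions. The key point is that the attack is called inside the training loop against the current weights, instead of once against a fixed, already trained model.

```python
import math
import random
from typing import Dict, List, Tuple

Example = Tuple[List[str], int]  # (tokens, label in {0, 1})


def predict_prob(weights: Dict[str, float], tokens: List[str]) -> float:
    """Probability of label 1 under a bag-of-words logistic model."""
    z = sum(weights.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


def loss(weights: Dict[str, float], tokens: List[str], label: int) -> float:
    """Cross-entropy loss, clipped for numerical safety."""
    p = min(max(predict_prob(weights, tokens), 1e-9), 1.0 - 1e-9)
    return -math.log(p if label == 1 else 1.0 - p)


def random_sampling_attack(weights, tokens, label, synonyms, num_samples, rng):
    """Sample random synonym substitutions; keep the one with the highest loss."""
    best, best_loss = list(tokens), loss(weights, tokens, label)
    for _ in range(num_samples):
        # Each token is independently kept or swapped for one of its synonyms.
        candidate = [rng.choice([t] + synonyms.get(t, [])) for t in tokens]
        cand_loss = loss(weights, candidate, label)
        if cand_loss > best_loss:
            best, best_loss = candidate, cand_loss
    return best


def sgd_step(weights, tokens, label, lr):
    """One gradient step of logistic regression on a single example."""
    grad = predict_prob(weights, tokens) - label
    for t in tokens:
        weights[t] = weights.get(t, 0.0) - lr * grad


def train_online(data: List[Example], synonyms, epochs=20, num_samples=4,
                 lr=0.5, seed=0) -> Dict[str, float]:
    """Online augmentation: attack the CURRENT model at every training step."""
    rng = random.Random(seed)
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        for tokens, label in data:
            adv = random_sampling_attack(weights, tokens, label, synonyms,
                                         num_samples, rng)
            sgd_step(weights, tokens, label, lr)  # clean example
            sgd_step(weights, adv, label, lr)     # adversarial example
    return weights


toy_data = [(["great", "film"], 1), (["awful", "film"], 0)]
toy_synonyms = {"great": ["superb"], "awful": ["dreadful"], "film": ["movie"]}
trained = train_online(toy_data, toy_synonyms)
```

An offline variant would instead train once, run the attack over the whole training set against the resulting fixed model, and retrain a single time on the augmented data. Because the random sampling attack only scores a fixed number of sampled candidates per example, its cost per step is bounded, unlike an open-ended search.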

Code & Models

Repositories

Mivg/robust_transformers (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Nuclear Materials and Properties · Topic Modeling