Achieving Model Robustness through Discrete Adversarial Training
Maor Ivgi, Jonathan Berant

TL;DR
This paper introduces a novel online discrete adversarial training method for language models, using new attack strategies that significantly improve robustness and training efficiency over traditional offline augmentation techniques.
Contribution
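The work presents (i) a new discrete attack based on best-first search and (ii) random sampling attacks that avoid expensive search procedures. Together they enable online augmentation, in which adversarial examples are generated at every training step rather than once from an already trained model.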
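To make the best-first idea concrete, here is a minimal sketch rather than the authors' implementation. It assumes the perturbation space is word substitution from a hypothetical `synonyms` table, and that `score` returns the model's confidence in the gold label, so lower means a stronger attack. The names `best_first_attack`, `synonyms`, `score`, and `budget` are illustrative and do not come from the paper.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple


def best_first_attack(
    tokens: Sequence[str],
    synonyms: Dict[str, List[str]],
    score: Callable[[Sequence[str]], float],
    budget: int = 50,
) -> Tuple[str, ...]:
    """Return the lowest-scoring perturbation found within `budget` expansions.

    Assumption: `score` is the model's confidence in the gold label, so the
    search always expands the most promising (lowest-score) candidate first.
    """
    start = tuple(tokens)
    best, best_score = start, score(start)
    frontier = [(best_score, start)]  # min-heap ordered by score
    seen = {start}
    for _ in range(budget):
        if not frontier:
            break
        _, current = heapq.heappop(frontier)
        # Expand: substitute one position with a synonym of the ORIGINAL word.
        for i, original_word in enumerate(tokens):
            for substitute in synonyms.get(original_word, []):
                candidate = current[:i] + (substitute,) + current[i + 1:]
                if candidate in seen:
                    continue
                seen.add(candidate)
                candidate_score = score(candidate)
                if candidate_score < best_score:
                    best, best_score = candidate, candidate_score
                heapq.heappush(frontier, (candidate_score, candidate))
    return best


# Toy stand-in for a model: confidence grows with the number of cue words.
CUE_WORDS = {"good", "great"}


def toy_score(sentence: Sequence[str]) -> float:
    return float(sum(word in CUE_WORDS for word in sentence))


sentence = ["a", "good", "and", "great", "film"]
table = {"good": ["fine", "decent"], "great": ["solid"]}
adversarial = best_first_attack(sentence, table, toy_score)
```

In this toy setting the search only stops when the frontier is empty or the budget runs out. A real attack would also stop as soon as the prediction flips.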
Findings
Online augmentation with random sampling attacks outperforms offline augmentation in robustness gains.
Random sampling speeds up training by approximately 10 times.
Online augmentation with search-based attacks further improves robustness on multiple datasets.
Abstract
Discrete adversarial attacks are symbolic perturbations to a language input that preserve the output label but lead to a prediction error. While such attacks have been extensively explored for the purpose of evaluating model robustness, their utility for improving robustness has been limited to offline augmentation only. Concretely, given a trained model, attacks are used to generate perturbed (adversarial) examples, and the model is re-trained exactly once. In this work, we address this gap and leverage discrete attacks for online augmentation, where adversarial examples are generated at every training step, adapting to the changing nature of the model. We propose (i) a new discrete attack, based on best-first search, and (ii) random sampling attacks that unlike prior work are not based on expensive search-based procedures. Surprisingly, we find that random sampling leads to impressive…
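The online setup described in the abstract can be sketched as follows. This is a hedged illustration, not the paper's procedure: the attack draws `k` random perturbations per example and keeps the one with the highest loss under the current model, and each step trains on the clean batch plus its adversarial counterparts. The synonym table, the hinge-loss bag-of-words model, and all function names are assumptions made for this example.

```python
import random
from typing import Callable, Dict, List, Tuple

Example = Tuple[Tuple[str, ...], int]  # (tokens, label in {-1, +1})


def random_sampling_attack(tokens, label, synonyms, loss, k, rng):
    """Sample k random substitutions; keep the highest-loss one (or the original)."""
    best, best_loss = tuple(tokens), loss(tokens, label)
    for _ in range(k):
        # Each word is independently kept or swapped for one of its synonyms.
        candidate = tuple(rng.choice([t] + synonyms.get(t, [])) for t in tokens)
        candidate_loss = loss(candidate, label)
        if candidate_loss > best_loss:
            best, best_loss = candidate, candidate_loss
    return best


def online_adversarial_training(
    data: List[Example],
    synonyms: Dict[str, List[str]],
    loss: Callable,
    update: Callable[[List[Example]], None],
    steps: int,
    batch_size: int,
    k: int,
    seed: int = 0,
) -> None:
    rng = random.Random(seed)
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        # Attacks are regenerated at every step against the CURRENT model.
        adversarial = [
            (random_sampling_attack(x, y, synonyms, loss, k, rng), y)
            for x, y in batch
        ]
        update(batch + adversarial)


# Toy linear bag-of-words model with hinge loss (illustrative only).
weights: Dict[str, float] = {}


def hinge_loss(tokens, label) -> float:
    margin = label * sum(weights.get(t, 0.0) for t in tokens)
    return max(0.0, 1.0 - margin)


def sgd_update(examples: List[Example], lr: float = 0.1) -> None:
    for tokens, label in examples:
        if hinge_loss(tokens, label) > 0.0:  # subgradient step on violations
            for t in tokens:
                weights[t] = weights.get(t, 0.0) + lr * label


data = [(("a", "good", "film"), 1), (("a", "bad", "film"), -1)]
table = {"good": ["fine", "decent"], "bad": ["poor", "awful"]}
online_adversarial_training(data, table, hinge_loss, sgd_update,
                            steps=200, batch_size=2, k=4)
```

Offline augmentation, by contrast, would run the attack once against a fixed trained model and re-train a single time on the result.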
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Nuclear Materials and Properties · Topic Modeling
