DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization
Songyang Gao, Shihan Dou, Yan Liu, Xiao Wang, Qi Zhang, Zhongyu Wei, Jin Ma, Ying Shan

TL;DR
The paper introduces DSRM, an adversarial training method that makes language models more robust by perturbing the input data distribution rather than generating adversarial samples. This removes the need for adversarial samples, significantly reduces training time, and achieves state-of-the-art results.
Contribution
DSRM is a new adversarial training approach that estimates adversarial loss through distribution perturbation, avoiding adversarial sample generation and reducing training time.
Findings
Reduces training time by up to 70%
Improves BERT's robustness against textual adversarial attacks
Achieves state-of-the-art robust accuracy on benchmarks
Abstract
Adversarial training is one of the best-performing methods for improving the robustness of deep language models. However, robust models come at the cost of high time consumption, as they require multi-step gradient ascents or word substitutions to obtain adversarial samples. In addition, these generated samples are deficient in grammatical quality and semantic consistency, which impairs the effectiveness of adversarial training. To address these problems, we introduce a novel, effective procedure that instead performs adversarial training with only clean data. Our procedure, distribution shift risk minimization (DSRM), estimates the adversarial loss by perturbing the input data's probability distribution rather than their embeddings. This formulation results in a robust model that minimizes the expected global loss under adversarial attacks. Our approach requires zero adversarial samples for…
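To make "perturbing the input data's probability distribution" concrete, here is a minimal sketch of one generic way to compute a worst-case loss over reweighted clean samples. It is not the paper's algorithm: it assumes a KL-penalized shift away from the uniform batch distribution, for which the worst-case weights have the closed form q_i ∝ exp(loss_i / temperature). The function names and the `temperature` parameter are hypothetical and chosen only for illustration.

```python
import math


def worst_case_weights(losses, temperature=1.0):
    """Return weights q_i proportional to exp(loss_i / temperature).

    Assumption (not taken from the paper): the distribution shift is
    penalized by temperature * KL(q || uniform), so the maximizer of
    sum(q_i * loss_i) - temperature * KL(q || uniform) is a softmax
    over the per-sample losses.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - peak) / temperature) for loss in losses]
    total = sum(exps)
    return [e / total for e in exps]


def shifted_risk(losses, temperature=1.0):
    """Expected loss under the adversarially reweighted batch distribution."""
    weights = worst_case_weights(losses, temperature)
    return sum(w * loss for w, loss in zip(weights, losses))


# Per-sample losses computed on clean data only; no adversarial samples.
clean_losses = [0.1, 0.5, 2.0]
weights = worst_case_weights(clean_losses, temperature=1.0)
risk = shifted_risk(clean_losses, temperature=1.0)
```

In this sketch, harder samples receive more weight, so the shifted risk is never below the plain batch average. A large `temperature` recovers ordinary average-loss training, while a small one concentrates the weight on the hardest samples.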
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Adversarial Robustness in Machine Learning
