DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization

Songyang Gao; Shihan Dou; Yan Liu; Xiao Wang; Qi Zhang; Zhongyu Wei; Jin Ma; Ying Shan

arXiv:2306.15164·cs.CL·June 28, 2023

Open Access · 1 Repo

TL;DR

The paper introduces DSRM, an adversarial training method that improves language model robustness by perturbing the input data distribution instead of generating adversarial samples. It reduces training time by up to 70% while achieving state-of-the-art robust accuracy.

Contribution

DSRM is a new adversarial training approach that estimates adversarial loss through distribution perturbation, avoiding adversarial sample generation and reducing training time.

Findings

01 Reduces training time by up to 70%

02 Improves BERT's robustness against textual adversarial attacks

03 Achieves state-of-the-art robust accuracy on benchmarks

Abstract

Adversarial training is one of the best-performing methods for improving the robustness of deep language models. However, robust models come at the cost of high time consumption, as they require multi-step gradient ascents or word substitutions to obtain adversarial samples. In addition, these generated samples are deficient in grammatical quality and semantic consistency, which impairs the effectiveness of adversarial training. To address these problems, we introduce a novel, effective procedure that instead performs adversarial training with only clean data. Our procedure, distribution shift risk minimization (DSRM), estimates the adversarial loss by perturbing the input data's probability distribution rather than their embeddings. This formulation results in a robust model that minimizes the expected global loss under adversarial attacks. Our approach requires zero adversarial samples for…
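The idea of perturbing the data distribution rather than the inputs can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given on this page. It swaps in a generic, well-known stand-in: a KL-penalized worst-case reweighting of per-sample losses, whose closed-form solution is an exponential tilt of the uniform weights. The function names, the `temperature` parameter, and the loss values are all hypothetical.

```python
import math

def tilted_weights(losses, temperature=1.0):
    """Shift the uniform sample distribution toward high-loss samples.

    Assumption (not from the paper): the shifted distribution maximizes
    sum(q_i * loss_i) - temperature * KL(q || uniform), whose solution is
    q_i proportional to exp(loss_i / temperature).
    """
    peak = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((loss - peak) / temperature) for loss in losses]
    total = sum(exps)
    return [e / total for e in exps]

def shifted_risk(losses, temperature=1.0):
    """Expected loss under the shifted (perturbed) sample distribution."""
    weights = tilted_weights(losses, temperature)
    return sum(w * loss for w, loss in zip(weights, losses))

# Hypothetical per-sample losses on a batch of clean data.
clean_losses = [0.2, 0.4, 1.5, 0.1]

empirical_risk = sum(clean_losses) / len(clean_losses)
robust_risk = shifted_risk(clean_losses, temperature=0.5)

print(f"empirical risk: {empirical_risk:.3f}")
print(f"shifted risk:   {robust_risk:.3f}")
```

In this sketch only clean-data losses are used, and no adversarial sample is ever constructed. A smaller `temperature` allows a larger shift away from the uniform distribution, and a very large one recovers the ordinary empirical risk.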

Code & Models

Repositories

sleepthroughdifficulties/dsrm
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Adversarial Robustness in Machine Learning