Balanced Adversarial Training: Balancing Tradeoffs between Fickleness and Obstinacy in NLP Models

Hannah Chen; Yangfeng Ji; David Evans

arXiv:2210.11498·cs.CL·November 1, 2022

TL;DR

This paper introduces Balanced Adversarial Training, a method that uses contrastive learning to improve NLP models' robustness to both fickle and obstinate adversarial examples, addressing the tradeoff in which defending against fickle examples can increase vulnerability to obstinate ones.

Contribution

The paper proposes an adversarial training approach that balances robustness to two types of adversarial examples in NLP models, fickle and obstinate, which existing methods do not address jointly.

Findings

01. Balanced Adversarial Training improves robustness to both fickle and obstinate adversarial examples.

02. Standard adversarial training, which targets fickle examples, may increase vulnerability to obstinate examples.

03. Contrastive learning enhances model resilience against diverse adversarial attacks.

Abstract

Traditional (fickle) adversarial examples involve finding a small perturbation that does not change an input's true label but confuses the classifier into outputting a different prediction. Conversely, obstinate adversarial examples occur when an adversary finds a small perturbation that preserves the classifier's prediction but changes the true label of an input. Adversarial training and certified robust training have shown some effectiveness in improving the robustness of machine learnt models to fickle adversarial examples. We show that standard adversarial training methods focused on reducing vulnerability to fickle adversarial examples may make a model more vulnerable to obstinate adversarial examples, with experiments for both natural language inference and paraphrase identification tasks. To counter this phenomenon, we introduce Balanced Adversarial Training, which incorporates…
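
The two definitions in the abstract can be restated as a small check. The following is a minimal sketch for illustration only, not code from the paper or its repository: the function name `classify_perturbation`, its parameters, and the example label strings are all hypothetical. It assumes the true labels of the original and perturbed inputs are known, along with the classifier's predictions on each.

```python
from typing import Optional


def classify_perturbation(
    true_orig: str,
    true_pert: str,
    pred_orig: str,
    pred_pert: str,
) -> Optional[str]:
    """Label a perturbed input using the abstract's two definitions.

    Hypothetical helper for illustration. Returns "fickle", "obstinate",
    or None when the perturbation fits neither definition.
    """
    label_changed = true_orig != true_pert
    prediction_changed = pred_orig != pred_pert

    # Fickle: the true label is unchanged, but the prediction flips.
    if not label_changed and prediction_changed:
        return "fickle"
    # Obstinate: the true label changes, but the prediction stays the same.
    if label_changed and not prediction_changed:
        return "obstinate"
    # Otherwise the model's behavior tracks the label (or lack of) change.
    return None


# Usage with made-up natural language inference labels:
kind = classify_perturbation(
    true_orig="entailment",
    true_pert="contradiction",
    pred_orig="entailment",
    pred_pert="entailment",
)
print(kind)  # prints "obstinate"
```

The sketch only compares labels and predictions; it does not check that the perturbation is small, which the abstract's definitions also require.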

Code & Models

Repositories

hannahxchen/balanced-adversarial-training
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling

Methods: Contrastive Learning