Learning to Ignore Adversarial Attacks

Yiming Zhang; Yangqiaoyu Zhou; Samuel Carton; Chenhao Tan

arXiv:2205.11551·cs.CL·February 22, 2023

Open Access

TL;DR

This paper introduces rationale models that explicitly learn to ignore adversarial attack tokens, significantly improving NLP model robustness against attacks across multiple datasets and models.

Contribution

It proposes a novel rationale-based approach to enhance robustness by enabling models to ignore attack tokens, outperforming data augmentation methods.

Findings

01

Rationale models can ignore over 90% of attack tokens.

02

Achieves approximately 10% improvement in robustness over baselines.

03

Reduces the performance gap between clean and attacked test sets.

Abstract

Despite the strong performance of current NLP models, they can be brittle against adversarial attacks. To enable effective learning against adversarial inputs, we introduce the use of rationale models that can explicitly learn to ignore attack tokens. We find that the rationale models can successfully ignore over 90% of attack tokens. This approach leads to consistent sizable improvements ($\sim$10%) over baseline models in robustness on three datasets for both BERT and RoBERTa, and also reliably outperforms data augmentation with adversarial examples alone. In many cases, we find that our method is able to close the gap between model performance on a clean test set and an attacked test set and hence reduce the effect of adversarial attacks.
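To make the "ignore over 90% of attack tokens" figure concrete, here is a minimal sketch of how such a rate could be computed from a binary rationale mask. It is an illustration under stated assumptions, not the authors' implementation: the function names `attack_ignore_rate` and `apply_rationale`, the convention that a mask value of 0 means "ignored", the `[MASK]` placeholder, and the toy sentence are all hypothetical.

```python
from typing import List, Sequence


def attack_ignore_rate(rationale_mask: Sequence[int],
                       attack_positions: Sequence[int]) -> float:
    """Fraction of attack-token positions that the rationale mask drops.

    Assumption: rationale_mask[i] == 1 keeps token i, 0 ignores it.
    """
    if not attack_positions:
        raise ValueError("attack_positions must not be empty")
    ignored = sum(1 for i in attack_positions if rationale_mask[i] == 0)
    return ignored / len(attack_positions)


def apply_rationale(tokens: Sequence[str],
                    rationale_mask: Sequence[int],
                    mask_token: str = "[MASK]") -> List[str]:
    """Replace every ignored token with a placeholder before prediction."""
    if len(tokens) != len(rationale_mask):
        raise ValueError("tokens and rationale_mask must have equal length")
    return [tok if keep else mask_token
            for tok, keep in zip(tokens, rationale_mask)]


# Toy example: the last three tokens stand in for inserted attack tokens.
tokens = ["the", "movie", "was", "great", "zoning", "tapping", "fiennes"]
mask = [1, 1, 1, 1, 0, 0, 1]   # hypothetical rationale output
attack_positions = [4, 5, 6]

rate = attack_ignore_rate(mask, attack_positions)   # 2 of 3 attack tokens ignored
masked_input = apply_rationale(tokens, mask)
```

In this toy case the mask drops two of the three attack tokens, so the rate is 2/3. The downstream classifier would then see `masked_input` rather than the raw attacked text.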

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Linear Warmup With Linear Decay · Dense Connections · Dropout · WordPiece · Adam