SHIELD: Defending Textual Neural Networks against Multiple Black-Box Adversarial Attacks with Stochastic Multi-Expert Patcher

Thai Le; Noseong Park; Dongwon Lee

arXiv:2011.08908 · cs.LG · March 17, 2022 · 1 citation

TL;DR

SHIELD is a novel defense method that transforms textual neural networks into stochastic ensembles of experts by re-training only the last layer, defending against multiple black-box adversarial attacks and improving accuracy under attack.

Contribution

The paper introduces SHIELD, a new approach that patches and transforms existing models into stochastic ensembles, enhancing robustness without full re-training or attack-specific defenses.

Findings

01 Achieves a 15%-70% accuracy improvement against black-box attacks

02 Effective across CNN, RNN, BERT, and RoBERTa models

03 Outperforms six baseline defenses on three datasets

Abstract

Even though several methods have been proposed to defend textual neural network (NN) models against black-box adversarial attacks, they often defend against a specific text perturbation strategy and/or require re-training the models from scratch. This leads to a lack of generalization in practice and redundant computation. In particular, the state-of-the-art transformer models (e.g., BERT, RoBERTa) require substantial time and computational resources. To address these limitations, we borrow an idea from software engineering and propose a novel algorithm, SHIELD, which modifies and re-trains only the last layer of a textual NN, and thus "patches" and "transforms" the NN into a stochastic weighted ensemble of multi-expert prediction heads. Considering that most current black-box attacks rely on iterative search mechanisms to optimize their adversarial perturbations, SHIELD confuses…
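
The idea of a "stochastic weighted ensemble of multi-expert prediction heads" can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `StochasticMultiHeadClassifier`, the Gaussian noise added to the head-weighting logits, and the randomly initialized heads are all assumptions made for illustration. The sketch assumes a frozen base model has already produced a feature vector, and shows only how several last-layer heads might be combined with randomly perturbed weights, so that repeated queries on the same input return slightly different scores.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class StochasticMultiHeadClassifier:
    """Hypothetical last-layer replacement: K heads mixed with noisy weights.

    The base network is assumed frozen; only these heads would be trained.
    Here the heads are random stand-ins, since no training is shown.
    """

    def __init__(self, n_features, n_classes, n_heads, noise_scale=1.0, seed=None):
        self.rng = random.Random(seed)
        self.n_classes = n_classes
        self.noise_scale = noise_scale
        # One (weights, bias) pair per expert head.
        self.heads = [
            (
                [[self.rng.uniform(-1.0, 1.0) for _ in range(n_features)]
                 for _ in range(n_classes)],
                [0.0] * n_classes,
            )
            for _ in range(n_heads)
        ]
        # Head-weighting logits (assumed learnable; fixed at zero here).
        self.gate_logits = [0.0] * n_heads

    def predict_proba(self, features):
        # Assumption: stochasticity comes from Gaussian noise on the gate logits.
        noisy = [g + self.noise_scale * self.rng.gauss(0.0, 1.0)
                 for g in self.gate_logits]
        alphas = softmax(noisy)

        mixed = [0.0] * self.n_classes
        for alpha, (weights, bias) in zip(alphas, self.heads):
            logits = [sum(w * f for w, f in zip(row, features)) + b
                      for row, b in zip(weights, bias)]
            head_probs = softmax(logits)
            for c in range(self.n_classes):
                mixed[c] += alpha * head_probs[c]
        return mixed


if __name__ == "__main__":
    model = StochasticMultiHeadClassifier(n_features=4, n_classes=3, n_heads=5, seed=0)
    features = [0.5, -1.0, 2.0, 0.1]  # stand-in for frozen encoder output
    print(model.predict_proba(features))
    print(model.predict_proba(features))  # differs from the first call
```

Because the mixing weights are resampled on every call, an attacker who queries the model repeatedly sees a score that fluctuates between queries, which is the kind of signal an iterative search procedure depends on.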

Code & Models

Repositories

lethaiq/shield-defend-adversarial-texts (official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Linear Layer · Softmax · Attention Dropout · Residual Connection · Dropout · Dense Connections · WordPiece · Layer Normalization · Adam · Linear Warmup With Linear Decay