SHIELD: Defending Textual Neural Networks against Multiple Black-Box Adversarial Attacks with Stochastic Multi-Expert Patcher
Thai Le, Noseong Park, Dongwon Lee

TL;DR
SHIELD is a defense method that transforms textual neural networks into stochastic ensembles of experts by re-training only the last layer, defending against multiple black-box adversarial attacks and improving accuracy under attack.
Contribution
The paper introduces SHIELD, a new approach that patches and transforms existing models into stochastic ensembles, enhancing robustness without full re-training or attack-specific defenses.
Findings
Achieves 15%-70% accuracy improvement against black-box attacks
Effective across CNN, RNN, BERT, and RoBERTa models
Outperforms six baseline defenses on three datasets
Abstract
Even though several methods have been proposed to defend textual neural network (NN) models against black-box adversarial attacks, they often defend against a specific text perturbation strategy and/or require re-training the models from scratch. This leads to a lack of generalization in practice and redundant computation. In particular, the state-of-the-art transformer models (e.g., BERT, RoBERTa) require substantial time and computational resources. To address these limitations, we borrow an idea from software engineering and propose a novel algorithm, SHIELD, which modifies and re-trains only the last layer of a textual NN, and thus "patches" and "transforms" the NN into a stochastic weighted ensemble of multi-expert prediction heads. Considering that most current black-box attacks rely on iterative search mechanisms to optimize their adversarial perturbations, SHIELD confuses…
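The idea of a stochastic weighted ensemble of prediction heads can be illustrated with a small sketch. This is not the authors' implementation: the abstract is cut off before it describes how the heads are weighted or trained, so the class name `StochasticMultiHead`, the Gaussian noise used for the mixing weights, and the randomly initialized heads are all assumptions made for illustration. The sketch only shows why repeated queries on the same input can return different scores, which is the property an iterative black-box search would have to cope with.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class StochasticMultiHead:
    """Hypothetical stand-in for a patched last layer:
    several linear heads mixed with freshly sampled random weights."""

    def __init__(self, in_dim, n_classes, n_heads=3, seed=0):
        self.rng = random.Random(seed)
        self.n_classes = n_classes
        # One weight matrix per expert head. Here they are random;
        # in practice they would be trained while the base network stays frozen.
        self.heads = [
            [[self.rng.gauss(0.0, 0.5) for _ in range(in_dim)]
             for _ in range(n_classes)]
            for _ in range(n_heads)
        ]

    def head_logits(self, head, features):
        # Plain linear layer without bias: one dot product per class.
        return [sum(w * f for w, f in zip(row, features)) for row in head]

    def predict_proba(self, features):
        # Assumed noise model: sample new mixing weights on every call.
        noise = [self.rng.gauss(0.0, 1.0) for _ in self.heads]
        mix = softmax(noise)
        probs = [0.0] * self.n_classes
        for weight, head in zip(mix, self.heads):
            p = softmax(self.head_logits(head, features))
            probs = [acc + weight * pi for acc, pi in zip(probs, p)]
        return probs


# `features` stands in for the frozen network's penultimate-layer output.
features = [0.2, -1.0, 0.7, 0.1]
model = StochasticMultiHead(in_dim=4, n_classes=2, n_heads=3, seed=0)
first = model.predict_proba(features)
second = model.predict_proba(features)  # same input, different mixing weights
```

Because the mixing weights are resampled on each call, `first` and `second` are both valid probability vectors but differ from each other, so the score change an attacker observes after a perturbation is partly noise rather than signal.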
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Linear Layer · Softmax · Attention Dropout · Residual Connection · Dropout · Dense Connections · WordPiece · Layer Normalization · Adam · Linear Warmup With Linear Decay
