Block-Sparse Adversarial Attack to Fool Transformer-Based Text Classifiers
Sahar Sadrizadeh, Ljiljana Dolamic, Pascal Frossard

TL;DR
This paper introduces a gradient-based, block-sparse adversarial attack on transformer-based text classifiers. The attack changes only a few words per sentence, preserves sentence semantics, and reduces GPT-2 accuracy to less than 5% on AG News, MNLI, and Yelp Reviews.
Contribution
It proposes a block-sparse adversarial attack on transformer-based text classifiers. The attack uses gradient projection to handle the discrete nature of text, and its block-sparsity constraint keeps the adversarial example within a few words of the original sentence.
Findings
Reduces GPT-2 accuracy to less than 5% on AG News, MNLI, and Yelp Reviews
Maintains sentence semantics despite adversarial perturbations
Produces small, sparse modifications in adversarial examples
Abstract
Recent work has shown that deep neural networks, despite their strong performance in many fields, are vulnerable to adversarial examples. In this paper, we propose a gradient-based adversarial attack against transformer-based text classifiers. The adversarial perturbation in our method is constrained to be block-sparse, so that the resulting adversarial example differs from the original sentence in only a few words. Because textual data is discrete, we perform gradient projection to find the minimizer of our proposed optimization problem. Experimental results demonstrate that, while our adversarial attack maintains the semantics of the sentence, it can reduce the accuracy of GPT-2 to less than 5% on different datasets (AG News, MNLI, and Yelp Reviews). Furthermore, the block-sparsity constraint of the proposed optimization problem results in small…
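The two ideas named in the abstract, block sparsity and projection back onto discrete tokens, can be illustrated with a small sketch. This is not the authors' implementation: it assumes block sparsity is enforced by keeping only the k token positions whose perturbation has the largest L2 norm, and that the projection step picks the nearest vocabulary embedding in Euclidean distance. The function names, the toy embedding table, and the parameter k are all hypothetical.

```python
import math

def row_norm(row):
    """L2 norm of one token's perturbation vector."""
    return math.sqrt(sum(x * x for x in row))

def block_sparse_project(delta, k):
    """Keep the k rows (token positions) with the largest L2 norm; zero the rest.

    Assumption: hard thresholding over rows stands in for the paper's
    block-sparsity constraint, whose exact form is not given in the abstract.
    """
    order = sorted(range(len(delta)), key=lambda i: row_norm(delta[i]), reverse=True)
    keep = set(order[:k])
    return [list(row) if i in keep else [0.0] * len(row)
            for i, row in enumerate(delta)]

def nearest_token(vec, embedding_table):
    """Index of the vocabulary embedding closest (squared Euclidean) to vec."""
    def dist2(emb):
        return sum((a - b) ** 2 for a, b in zip(vec, emb))
    return min(range(len(embedding_table)), key=lambda t: dist2(embedding_table[t]))

def project_sentence(orig_ids, delta, embedding_table, k):
    """Apply a block-sparse perturbation in embedding space and map back to tokens."""
    sparse = block_sparse_project(delta, k)
    new_ids = []
    for tok, d in zip(orig_ids, sparse):
        if not any(d):
            new_ids.append(tok)  # unperturbed position keeps its original token
        else:
            moved = [e + x for e, x in zip(embedding_table[tok], d)]
            new_ids.append(nearest_token(moved, embedding_table))
    return new_ids

# Toy example: 4-token vocabulary with 2-dimensional embeddings (hypothetical values).
table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
delta = [[0.9, 0.1], [0.1, 0.0], [0.2, 0.2]]
print(project_sentence([0, 1, 2], delta, table, k=1))
```

With k=1, only the first position carries a perturbation after thresholding, so at most one token in the sentence can change. In a real attack, delta would come from a gradient step on the classifier loss rather than being fixed by hand.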
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Adam · Byte Pair Encoding · Softmax · Linear Warmup With Cosine Annealing · Attention Dropout
