Block-Sparse Adversarial Attack to Fool Transformer-Based Text Classifiers

Sahar Sadrizadeh; Ljiljana Dolamic; Pascal Frossard

arXiv:2203.05948·cs.CL·March 14, 2022

TL;DR

This paper introduces a gradient-based, block-sparse adversarial attack on transformer-based text classifiers. The attack reduces the accuracy of GPT-2 to less than 5% on AG News, MNLI, and Yelp Reviews while preserving sentence semantics and changing only a few words.

Contribution

The paper proposes a block-sparse adversarial attack on transformer-based text classifiers. Because text is discrete, the attack uses gradient projection to solve its optimization problem, and the block-sparsity constraint limits the changes to a few words of the original sentence.

Findings

01

Reduces GPT-2 accuracy to less than 5% on AG News, MNLI, and Yelp Reviews

02

Maintains sentence semantics despite adversarial perturbations

03

Produces small, sparse modifications in adversarial examples

Abstract

Recently, it has been shown that, despite the strong performance of deep neural networks in different fields, they are vulnerable to adversarial examples. In this paper, we propose a gradient-based adversarial attack against transformer-based text classifiers. The adversarial perturbation in our method is constrained to be block-sparse so that the resulting adversarial example differs from the original sentence in only a few words. Due to the discrete nature of textual data, we perform gradient projection to find the minimizer of our proposed optimization problem. Experimental results demonstrate that, while our adversarial attack maintains the semantics of the sentence, it can reduce the accuracy of GPT-2 to less than 5% on different datasets (AG News, MNLI, and Yelp Reviews). Furthermore, the block-sparsity constraint of the proposed optimization problem results in small…
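
The two ideas named in the abstract, a block-sparse perturbation and a projection back onto discrete tokens, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the loss, the sparsity regularizer, or the projection details, so the hard top-k selection, the Euclidean nearest-neighbor projection, and all names below (`keep_top_k_blocks`, `nearest_token`, `project_sentence`) are assumptions chosen for illustration. Each token's embedding perturbation is treated as one block.

```python
import math

def keep_top_k_blocks(perturbation, k):
    """Zero every row (one block per token) except the k rows with the largest L2 norm.

    Assumption: hard top-k selection stands in for the paper's block-sparsity constraint.
    """
    norms = [math.sqrt(sum(x * x for x in row)) for row in perturbation]
    ranked = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    keep = set(ranked[:k])
    return [list(row) if i in keep else [0.0] * len(row)
            for i, row in enumerate(perturbation)]

def nearest_token(vector, vocab_embeddings):
    """Return the index of the vocabulary embedding closest to `vector` (squared Euclidean)."""
    def sq_dist(j):
        return sum((a - b) ** 2 for a, b in zip(vector, vocab_embeddings[j]))
    return min(range(len(vocab_embeddings)), key=sq_dist)

def project_sentence(embeddings, perturbation, vocab_embeddings, k):
    """Apply a block-sparse perturbation, then snap every position to a valid token id."""
    sparse = keep_top_k_blocks(perturbation, k)
    perturbed = [[e + d for e, d in zip(e_row, d_row)]
                 for e_row, d_row in zip(embeddings, sparse)]
    return [nearest_token(v, vocab_embeddings) for v in perturbed]

# Hypothetical 4-token vocabulary with 2-dimensional embeddings.
vocab = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sentence_ids = [0, 1, 2]
sentence_emb = [vocab[i] for i in sentence_ids]

# A dense perturbation, e.g. one gradient step on the embeddings.
delta = [[0.1, 0.0], [0.0, 0.9], [0.05, 0.05]]

adversarial_ids = project_sentence(sentence_emb, delta, vocab, k=1)
print(adversarial_ids)  # [0, 3, 2]
```

With `k=1`, only the second token's block survives the sparsification, so the projected sentence differs from the original in exactly one position, which mirrors the "only a few words" property described in the abstract.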

Code & Models

Repositories

sssadrizadeh/transformer-text-classifier-attack (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Adam · Byte Pair Encoding · Softmax · Linear Warmup With Cosine Annealing · Attention Dropout