Revisiting Character-level Adversarial Attacks for Language Models

Elias Abad Rocamora; Yongtao Wu; Fanghui Liu; Grigorios G. Chrysos; Volkan Cevher

arXiv:2405.04346 · cs.LG · September 5, 2024 · 1 citation

TL;DR

This paper introduces Charmer, a query-based character-level adversarial attack that fools language models such as BERT and Llama 2 while keeping adversarial sentences highly similar to the originals, challenging the view that character-level attacks are easy to defend against.

Contribution

The paper presents Charmer, a novel, efficient query-based character-level attack that achieves high success rates and maintains semantic similarity, outperforming existing methods.

Findings

01

On BERT with SST-2, Charmer improves the attack success rate (ASR) by 4.84 percentage points over prior work, and it also successfully attacks Llama 2.

02

Its adversarial examples stay close to the original sentences: on BERT with SST-2, USE similarity improves by 8 percentage points over prior work.

03

Charmer outperforms previous character-level attack methods.

Abstract

Adversarial attacks in Natural Language Processing apply perturbations at the character or token level. Token-level attacks, which have gained prominence for their use of gradient-based methods, are susceptible to altering sentence semantics, leading to invalid adversarial examples. While character-level attacks easily maintain semantics, they have received less attention because they cannot easily adopt popular gradient-based methods and are thought to be easy to defend against. Challenging these beliefs, we introduce Charmer, an efficient query-based adversarial attack capable of achieving a high attack success rate (ASR) while generating highly similar adversarial examples. Our method successfully targets both small (BERT) and large (Llama 2) models. Specifically, on BERT with SST-2, Charmer improves the ASR by 4.84 percentage points and the USE similarity by 8 percentage points with respect to the previous art. Our…
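
To make the terms in the abstract concrete, the following is a minimal sketch of a generic query-based, character-level attack and of how an attack success rate can be computed. It is not the Charmer algorithm: the greedy search, the word-counting toy classifier, the word lists, and every name in the block (`positive_score`, `single_char_edits`, `greedy_char_attack`, `attack_success_rate`, `DATA`) are assumptions made for illustration. Each call to `positive_score` stands in for one query to a real model.

```python
import string
from typing import Iterator, List, Optional, Tuple

# Hypothetical toy setup: none of these names come from the paper.
ALPHABET = string.ascii_lowercase + " "
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def positive_score(text: str) -> float:
    """One 'query' to the toy model: a smoothed P(positive) from word counts."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (1 + pos) / (2 + pos + neg)


def predict(text: str) -> int:
    """Predicted label: 1 (positive) if the score exceeds 0.5, else 0."""
    return 1 if positive_score(text) > 0.5 else 0


def single_char_edits(text: str) -> Iterator[str]:
    """Yield every distinct string one insertion, deletion, or substitution away."""
    seen = {text}
    for i in range(len(text) + 1):
        candidates = [text[:i] + c + text[i:] for c in ALPHABET]  # insertions
        if i < len(text):
            candidates.append(text[:i] + text[i + 1:])  # deletion
            candidates += [text[:i] + c + text[i + 1:] for c in ALPHABET]  # substitutions
        for cand in candidates:
            if cand not in seen:
                seen.add(cand)
                yield cand


def greedy_char_attack(text: str, label: int, max_edits: int = 2) -> Optional[str]:
    """Greedily apply the single-character edit that most lowers the true-label score.

    Returns the adversarial text once the prediction flips, or None if the
    edit budget is exhausted first.
    """
    def true_label_score(t: str) -> float:
        s = positive_score(t)
        return s if label == 1 else 1.0 - s

    current = text
    for _ in range(max_edits):
        current = min(single_char_edits(current), key=true_label_score)
        if predict(current) != label:
            return current
    return None


def attack_success_rate(dataset: List[Tuple[str, int]], max_edits: int = 2) -> float:
    """Fraction of correctly classified inputs whose prediction the attack flips."""
    attacked = successes = 0
    for text, label in dataset:
        if predict(text) != label:
            continue  # only attack inputs the model gets right
        attacked += 1
        if greedy_char_attack(text, label, max_edits) is not None:
            successes += 1
    return successes / attacked if attacked else 0.0


DATA = [("a great movie", 1), ("i love it", 1), ("a bad movie", 0)]

if __name__ == "__main__":
    print(greedy_char_attack("a great movie", 1))
    print(attack_success_rate(DATA))
```

In this sketch the edit budget `max_edits` is the only constraint keeping the adversarial sentence close to the original. The similarity the abstract reports is measured with USE, which would be an additional check on each candidate and is not modeled here.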

Code & Models

Repositories

lions-epfl/charmer
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Attention Is All You Need · Weight Decay · Attention Dropout · Dropout · Residual Connection · Softmax · WordPiece · Linear Layer · Adam