Revisiting Character-level Adversarial Attacks for Language Models
Elias Abad Rocamora, Yongtao Wu, Fanghui Liu, Grigorios G. Chrysos, Volkan Cevher

TL;DR
This paper introduces Charmer, a query-based character-level adversarial attack that effectively fools language models like BERT and Llama 2 while preserving sentence semantics, challenging previous assumptions about character-level attacks.
Contribution
The paper presents Charmer, a novel, efficient query-based character-level attack that achieves high success rates and maintains semantic similarity, outperforming existing methods.
Findings
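Charmer achieves higher attack success rates (ASR) than prior attacks on BERT and Llama 2; on BERT with SST-2, the ASR improves by 4.84 percentage points.
It keeps adversarial examples semantically close to the originals; on BERT with SST-2, the USE similarity improves by 8 percentage points.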
Charmer outperforms previous character-level attack methods.
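The attack success rate mentioned in these findings can be made concrete with a short sketch. It assumes a common convention, not necessarily the paper's exact protocol: ASR is the fraction of correctly classified inputs whose adversarial version is misclassified. The names `attack_success_rate` and `toy_predict` are hypothetical, and the toy classifier only stands in for a real model.

```python
from typing import Callable, Sequence


def attack_success_rate(
    predict: Callable[[str], int],
    originals: Sequence[str],
    adversarials: Sequence[str],
    labels: Sequence[int],
) -> float:
    """Fraction of correctly classified inputs whose adversarial version is misclassified.

    Assumption: inputs the model already gets wrong are excluded from the count.
    """
    attacked = 0
    fooled = 0
    for x, x_adv, y in zip(originals, adversarials, labels):
        if predict(x) != y:
            continue  # skip inputs that are misclassified before any attack
        attacked += 1
        if predict(x_adv) != y:
            fooled += 1
    return fooled / attacked if attacked else 0.0


def toy_predict(text: str) -> int:
    """Hypothetical stand-in classifier: label 1 if the text contains 'good', else 0."""
    return 1 if "good" in text else 0


if __name__ == "__main__":
    originals = ["good film", "good plot", "bad film"]
    adversarials = ["go0d film", "good plot", "bad film"]
    labels = [1, 1, 1]
    print(attack_success_rate(toy_predict, originals, adversarials, labels))
```

In the toy data, the third input is misclassified before any attack and is therefore left out of the denominator.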
Abstract
Adversarial attacks in Natural Language Processing apply perturbations at the character or token level. Token-level attacks, gaining prominence for their use of gradient-based methods, are susceptible to altering sentence semantics, leading to invalid adversarial examples. While character-level attacks easily maintain semantics, they have received less attention as they cannot easily adopt popular gradient-based methods, and are thought to be easy to defend against. Challenging these beliefs, we introduce Charmer, an efficient query-based adversarial attack capable of achieving a high attack success rate (ASR) while generating highly similar adversarial examples. Our method successfully targets both small (BERT) and large (Llama 2) models. Specifically, on BERT with SST-2, Charmer improves the ASR by 4.84 percentage points and the USE similarity by 8 percentage points with respect to the previous art. Our…
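To illustrate what a query-based character-level attack looks like in general, here is a minimal greedy sketch. It is not Charmer's actual procedure, which this abstract does not spell out; it only shows the idea of querying a model on single-character edits and keeping the one that most lowers the score of the true label. The functions `single_char_edits`, `greedy_char_attack`, and `toy_score`, as well as the `alphabet`, `max_edits`, and `threshold` parameters, are all hypothetical.

```python
from typing import Callable, Iterator, Optional


def single_char_edits(text: str, alphabet: str) -> Iterator[str]:
    """Yield candidates one character edit away from text (insert, delete, replace)."""
    for i in range(len(text) + 1):
        for c in alphabet:
            yield text[:i] + c + text[i:]  # insertion at position i
        if i < len(text):
            yield text[:i] + text[i + 1:]  # deletion of character i
            for c in alphabet:
                if c != text[i]:
                    yield text[:i] + c + text[i + 1:]  # replacement of character i


def greedy_char_attack(
    score_true: Callable[[str], float],
    text: str,
    alphabet: str = "abcdefghijklmnopqrstuvwxyz ",
    max_edits: int = 2,
    threshold: float = 0.5,
) -> Optional[str]:
    """Greedily apply up to max_edits character edits that lower the true-label score.

    score_true is a black-box query returning the model's score for the true label.
    Returns the adversarial text once the score drops below threshold, else None.
    """
    current = text
    for _ in range(max_edits):
        best = min(single_char_edits(current, alphabet), key=score_true)
        if score_true(best) >= score_true(current):
            return None  # no single edit helps; give up
        current = best
        if score_true(current) < threshold:
            return current
    return None


def toy_score(text: str) -> float:
    """Hypothetical stand-in model: high true-label score only if 'good' appears."""
    return 0.9 if "good" in text else 0.1


if __name__ == "__main__":
    print(greedy_char_attack(toy_score, "a good film"))
```

Each greedy step queries the model once per candidate, so the number of queries grows with the sentence length times the alphabet size; any practical attack would need to reduce that candidate set.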
Taxonomy
Topics: Adversarial Robustness in Machine Learning · War, Ethics, and Justification
Methods: Attention Is All You Need · Weight Decay · Attention Dropout · Dropout · Residual Connection · Softmax · WordPiece · Linear Layer · Adam
