Character-level White-Box Adversarial Attacks against Transformers via Attachable Subwords Substitution

Aiwei Liu; Honghai Yu; Xuming Hu; Shu'ang Li; Li Lin; Fukun Ma; Yawen Yang; Lijie Wen

arXiv:2210.17004·cs.CL·November 1, 2022

TL;DR

This paper introduces a novel character-level white-box adversarial attack on transformer models that leverages subword substitutions guided by gradients, achieving high success rates while maintaining semantic integrity.

Contribution

The paper presents the first attack method that uses subword substitutions to emulate character-level edits, improving attack success while requiring fewer modifications than prior approaches.

Findings

1. Outperforms previous attack methods in success rate
2. Maintains semantic integrity of adversarial examples
3. Effective on both sentence-level and token-level tasks

Abstract

We propose the first character-level white-box adversarial attack method against transformer models. The intuition behind our method comes from the observation that words are split into subtokens before being fed into transformer models, and that substituting one subtoken with a close subtoken has an effect similar to a character modification. Our method consists of three main steps. First, a gradient-based method is adopted to find the most vulnerable words in the sentence. Then we split the selected words into subtokens to replace the original tokenization result from the transformer tokenizer. Finally, we utilize an adversarial loss to guide the substitution of attachable subtokens, in which the Gumbel-softmax trick is introduced to ensure gradient propagation. Meanwhile, we introduce visual and length constraints in the optimization process to achieve minimum character modifications.…
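
The Gumbel-softmax step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the official repository uses PyTorch); it only shows the forward pass of the relaxation in plain Python, without gradients. The candidate subtokens, the logits, the `##` prefix for attachable subwords, and the function name `gumbel_softmax` are all assumptions made for illustration.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed one-hot sample over candidates (forward pass only)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidates for one attachable position, with made-up logits.
candidates = ["##ing", "##ink", "##ang"]
logits = [2.0, 0.5, 0.1]

probs = gumbel_softmax(logits, temperature=0.5, rng=random.Random(0))
choice = candidates[probs.index(max(probs))]
```

Lower temperatures push `probs` toward a one-hot vector, while higher temperatures keep it smooth. In an autodiff framework the same soft vector lets the adversarial loss propagate gradients back to the logits, which is the role the abstract assigns to the trick.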

Code & Models

Repositories

thu-bpm/cwba (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning