LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack

Hai Zhu; Zhaoqing Yang; Weiwei Shang; Yuren Wu

arXiv:2308.00319·cs.CL·January 11, 2024

TL;DR

LimeAttack is a hard-label textual adversarial attack built on a local explainable method. It generates adversarial examples with fewer model queries, the examples transfer well across models, and using them in adversarial training improves model robustness.

Contribution

The paper introduces LimeAttack, a hard-label attack algorithm that combines local explanations with beam search. It reduces query complexity and outperforms existing methods.

Findings

1. LimeAttack achieves a higher attack success rate with fewer queries.

2. The generated adversarial examples are highly transferable.

3. Adversarial training with LimeAttack examples improves model robustness.

Abstract

Natural language processing models are vulnerable to adversarial examples. Previous textual adversarial attacks adopt gradients or confidence scores to calculate word importance ranking and generate adversarial examples. However, this information is unavailable in the real world. Therefore, we focus on a more realistic and challenging setting, named hard-label attack, in which the attacker can only query the model and obtain a discrete prediction label. Existing hard-label attack algorithms tend to initialize adversarial examples by random substitution and then utilize complex heuristic algorithms to optimize the adversarial perturbation. These methods require a lot of model queries and the attack success rate is restricted by adversary initialization. In this paper, we propose a novel hard-label attack algorithm named LimeAttack, which leverages a local explainable method to…
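To make the hard-label setting concrete, the sketch below shows one way a LIME-style local explanation could rank word importance when only discrete labels are available. It is a minimal illustration under stated assumptions, not the authors' implementation: `toy_hard_label_model`, `lime_word_importance`, the word-dropping perturbation, the exponential proximity kernel, and the ridge penalty are all hypothetical choices made for this example.

```python
import numpy as np

def toy_hard_label_model(words):
    """Hypothetical victim model: returns only a discrete label (1 = negative)."""
    return 1 if "terrible" in words else 0

def lime_word_importance(words, predict_label, n_samples=200, seed=0):
    """Estimate per-word importance from hard labels with a local linear surrogate."""
    rng = np.random.default_rng(seed)
    n = len(words)
    orig_label = predict_label(words)

    # Binary masks over words: 1 keeps the word, 0 drops it (assumed perturbation).
    masks = rng.integers(0, 2, size=(n_samples, n))
    masks[0] = 1  # the first sample is the unperturbed input

    # Hard-label response: 1.0 if the perturbed text keeps the original label.
    y = np.array([
        1.0 if predict_label([w for w, keep in zip(words, m) if keep]) == orig_label
        else 0.0
        for m in masks
    ])

    # Proximity kernel: samples that drop fewer words get larger weights.
    distance = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distance ** 2) / 0.25)

    # Weighted ridge regression on the masks, with an intercept column.
    X = np.hstack([masks, np.ones((n_samples, 1))])
    sw = np.sqrt(weights)
    Xw = X * sw[:, None]
    yw = y * sw
    A = Xw.T @ Xw + 1e-3 * np.eye(n + 1)
    coef = np.linalg.solve(A, Xw.T @ yw)
    return coef[:n]  # one importance score per word (intercept dropped)

if __name__ == "__main__":
    words = "the movie was terrible and boring".split()
    scores = lime_word_importance(words, toy_hard_label_model)
    # Print words from most to least important for the predicted label.
    for i in np.argsort(-scores):
        print(f"{words[i]:>10s}  {scores[i]:+.3f}")
```

Each sampled mask costs one model query, so `n_samples` directly sets the query budget of this importance step. A ranking of this kind would then tell a substitution search which positions to try first.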

Code & Models

Repositories

zhuhai-ustc/limeattack (PyTorch, official)

Videos

LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack · Underline

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Anomaly Detection Techniques and Applications

Methods: Focus