LimeAttack: Local Explainable Method for Textual Hard-Label Adversarial Attack
Hai Zhu, Zhaoqing Yang, Weiwei Shang, Yuren Wu

TL;DR
LimeAttack is a hard-label textual adversarial attack built on a local explainable method. It generates adversarial examples with fewer model queries; the examples are highly transferable and improve model robustness when used for adversarial training.
Contribution
The paper introduces LimeAttack, a new hard-label attack algorithm that combines local explanations with beam search. It reduces query complexity and outperforms existing methods.
Findings
LimeAttack achieves a higher attack success rate with fewer queries than existing hard-label attacks.
The generated adversarial examples are highly transferable.
Adversarial training with LimeAttack examples improves model robustness.
Abstract
Natural language processing models are vulnerable to adversarial examples. Previous textual adversarial attacks adopt gradients or confidence scores to calculate word importance ranking and generate adversarial examples. However, this information is unavailable in the real world. Therefore, we focus on a more realistic and challenging setting, named hard-label attack, in which the attacker can only query the model and obtain a discrete prediction label. Existing hard-label attack algorithms tend to initialize adversarial examples by random substitution and then utilize complex heuristic algorithms to optimize the adversarial perturbation. These methods require a lot of model queries and the attack success rate is restricted by adversary initialization. In this paper, we propose a novel hard-label attack algorithm named LimeAttack, which leverages a local explainable method to…
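To make the hard-label setting concrete, the following is a minimal sketch of how a local surrogate model can rank word importance when only discrete labels are available. It is not the paper's algorithm: the toy victim model, the word-deletion perturbation, the exponential proximity kernel, and all function and parameter names are assumptions made for illustration, in the general style of LIME.

```python
import numpy as np

def toy_hard_label_model(tokens):
    """Hypothetical victim model: exposes only a discrete label (1 or 0)."""
    positive = {"great", "good", "wonderful"}
    negative = {"bad", "boring", "awful"}
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return 1 if score > 0 else 0

def lime_style_importance(tokens, predict_label, n_samples=500,
                          kernel_width=0.75, ridge=1e-3, seed=0):
    """Estimate per-word importance from hard labels with a local linear surrogate.

    Assumed procedure (illustrative only): randomly delete words, query the
    model for a label, then fit a proximity-weighted ridge regression that
    predicts whether the original label is preserved.
    """
    rng = np.random.default_rng(seed)
    n = len(tokens)
    original_label = predict_label(tokens)

    # Each row is a binary mask: 1 keeps the word, 0 deletes it.
    masks = rng.integers(0, 2, size=(n_samples, n))
    masks[0] = 1  # include the unperturbed sentence

    # Target: 1.0 if the perturbed sentence keeps the original label, else 0.0.
    y = np.array([
        float(predict_label([t for t, keep in zip(tokens, m) if keep]) == original_label)
        for m in masks
    ])

    # Samples closer to the original sentence receive larger weights.
    distance = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Weighted ridge regression with an intercept column.
    X = np.hstack([masks, np.ones((n_samples, 1))])
    XtW = X.T * weights
    coef = np.linalg.solve(XtW @ X + ridge * np.eye(n + 1), XtW @ y)
    return coef[:-1]  # drop the intercept; one coefficient per word

if __name__ == "__main__":
    sentence = "the movie was great but a bit long".split()
    importance = lime_style_importance(sentence, toy_hard_label_model)
    for word, value in sorted(zip(sentence, importance), key=lambda p: -p[1]):
        print(f"{word:>6s}  {value:+.3f}")
```

In this sketch every perturbed sentence costs one model query, so `n_samples` directly sets the query budget. An attack could then try substitutions for the highest-ranked words first, although how LimeAttack itself uses the ranking is not specified in the truncated abstract above.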
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Anomaly Detection Techniques and Applications
Methods: Focus
