AED: An black-box NLP classifier model attacker
Yueyang Liu, Yan Huang, Zhipeng Cai

TL;DR
This paper introduces AED, a novel black-box attack method for NLP classifiers that exploits interpretability and synonym substitution to generate adversarial examples, revealing vulnerabilities in DNN models used in critical domains.
Contribution
AED is the first to combine attention-based interpretability with density peaks clustering for effective synonym search in NLP adversarial attacks.
Findings
AED effectively fools NLP models while preserving input meaning.
Compared to existing methods, AED shows higher success rates in adversarial example generation.
AED enhances understanding of model vulnerabilities in high-stakes NLP applications.
Abstract
Deep Neural Networks (DNNs) have been successful in solving real-world tasks in domains such as connected and automated vehicles, disease, and job hiring. However, their implications are far-reaching in critical application areas. Hence, there is a growing concern regarding the potential bias and robustness of these DNN models. A transparent and robust model is always demanded in high-stakes domains where reliability and safety are enforced, such as healthcare and finance. While most studies have focused on adversarial image attack scenarios, fewer studies have investigated the robustness of DNN models in natural language processing (NLP) because their adversarial samples are difficult to generate. To address this gap, we propose a word-level NLP classifier attack model called "AED," which stands for Attention mechanism enabled post-model Explanation with Density peaks clustering…
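The abstract names density peaks clustering as one ingredient of AED, but the text shown here is cut off before it explains how the method uses it. As a rough illustration only, the sketch below computes the two quantities used in the generic density peaks formulation: a local density rho (the number of neighbours within a cutoff) and a distance delta to the nearest point of higher density. The function name density_peaks, the cutoff d_c, and the two-dimensional toy "embeddings" are all hypothetical and made up for this example; none of them comes from the paper.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) for each point.

    rho[i]   = number of other points closer than the cutoff d_c
    delta[i] = distance to the nearest point with strictly higher rho
               (for a point with no denser neighbour, its largest distance)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta


# Made-up 2-D vectors standing in for word embeddings (assumption).
toy = {
    "good": (0.0, 0.0),
    "great": (0.1, 0.0),
    "fine": (-0.1, 0.0),
    "bad": (1.0, 1.0),
    "awful": (1.1, 1.0),
}
words = list(toy)
rho, delta = density_peaks([toy[w] for w in words], d_c=0.15)

# Points with both high rho and high delta are candidate cluster centres.
scores = {w: r * d for w, r, d in zip(words, rho, delta)}
for w in sorted(scores, key=scores.get, reverse=True):
    print(f"{w:6s} rho={rho[words.index(w)]} score={scores[w]:.3f}")
```

In a synonym search, words falling in the same cluster as the target word would be natural substitution candidates. How AED actually selects and ranks them is not stated in this summary, so that step is left out here. Note also that points with tied densities can each receive a large delta, a known quirk of this simple counting form of rho.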
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
Methods: Gated Recurrent Unit
