FastWordBug: A Fast Method To Generate Adversarial Text Against NLP Applications
Dou Goodman, Lv Zhonghou, Wang Minghua

TL;DR
FastWordBug is an efficient black-box adversarial attack algorithm that quickly identifies and perturbs key words in text to mislead NLP models with minimal queries.
Contribution
The paper introduces FastWordBug, a black-box attack built around a novel scoring method that uses part-of-speech attributes to rapidly identify the words with the most impact on a classifier's prediction.
Findings
Significantly reduces NLP model accuracy
Requires minimal model queries
Effective against real-world cloud NLP services
Abstract
In this paper, we present a novel algorithm, FastWordBug, to efficiently generate small text perturbations in a black-box setting that force a sentiment analysis or text classification model to make an incorrect prediction. By combining the part-of-speech attributes of words, we propose a scoring method that can quickly identify important words that affect text classification. We evaluate FastWordBug on three real-world text datasets and two state-of-the-art machine learning models under a black-box setting. The results show that our method significantly reduces the accuracy of the model while calling the model as few times as possible, achieving the highest attack efficiency. We also attack two popular real-world cloud NLP services, and the results show that our method works there as well.
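The abstract does not spell out the scoring formula, so the following is only a minimal sketch of the general idea, under stated assumptions: part-of-speech tags are used to pre-select candidate words (saving queries), each candidate is scored by how much deleting it lowers the model's confidence in the true label, and the top-ranked word is perturbed with a character swap. The weight table `POS_WEIGHTS`, the functions `rank_words` and `swap_adjacent`, and the stand-in classifier `toy_predict_proba` are all hypothetical names introduced for illustration; they are not taken from the paper.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical POS weights (an assumption, not values from the paper).
# Words whose tag is absent from this table are never queried.
POS_WEIGHTS: Dict[str, float] = {"ADJ": 1.5, "ADV": 1.2, "VERB": 1.0, "NOUN": 0.8}


def rank_words(
    tokens: List[str],
    pos_tags: List[str],
    predict_proba: Callable[[List[str]], List[float]],
    true_label: int,
) -> List[Tuple[int, str, float]]:
    """Rank candidate words by POS-weighted confidence drop when deleted.

    The model is treated as a black box: only its output probabilities
    are used. Returns (index, token, score) tuples, highest score first.
    """
    base = predict_proba(tokens)[true_label]  # one query for the clean text
    scored = []
    for i, (tok, tag) in enumerate(zip(tokens, pos_tags)):
        if tag not in POS_WEIGHTS:
            continue  # skip low-value tags to save model queries
        reduced = tokens[:i] + tokens[i + 1:]
        drop = base - predict_proba(reduced)[true_label]
        scored.append((i, tok, drop * POS_WEIGHTS[tag]))
    return sorted(scored, key=lambda item: item[2], reverse=True)


def swap_adjacent(word: str) -> str:
    """Swap the two characters around the middle of the word.

    One simple character-level perturbation; words shorter than four
    characters are returned unchanged.
    """
    if len(word) < 4:
        return word
    mid = len(word) // 2
    chars = list(word)
    chars[mid - 1], chars[mid] = chars[mid], chars[mid - 1]
    return "".join(chars)


def toy_predict_proba(tokens: List[str]) -> List[float]:
    """Stand-in sentiment model: returns [p_negative, p_positive]."""
    positive = {"great", "wonderful"}
    negative = {"boring", "awful"}
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return [1.0 - p_pos, p_pos]


if __name__ == "__main__":
    tokens = ["the", "movie", "was", "great"]
    tags = ["DET", "NOUN", "AUX", "ADJ"]  # tags supplied by hand here
    ranking = rank_words(tokens, tags, toy_predict_proba, true_label=1)
    idx, word, _ = ranking[0]
    adversarial = tokens[:idx] + [swap_adjacent(word)] + tokens[idx + 1:]
    print(ranking)
    print(adversarial, toy_predict_proba(adversarial))
```

In this sketch the POS filter is what limits the query count: only two of the four words are ever sent to the model for scoring. A real attack would obtain tags from a POS tagger and query a remote classifier instead of the toy function.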
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Malware Detection Techniques
