A Simple Yet Efficient Method for Adversarial Word-Substitute Attack

Tianle Li; Yi Yang

arXiv:2206.05015·cs.CL·June 13, 2022

TL;DR

This paper introduces a simple and efficient black-box word-substitute attack method that significantly reduces the number of queries needed to fool NLP models while maintaining attack success.

Contribution

The paper presents a black-box word-substitute attack that cuts the number of queries by a factor of up to 30 compared to existing methods, making attacks on NLP models cheaper to mount.

Findings

01 Reduces the average number of queries by a factor of 3 to 30

02 Maintains a high attack success rate

03 Shows that deep NLP models can be fooled at much lower cost

Abstract

NLP researchers have proposed various word-substitute black-box attacks that can fool text classification models. In such an attack, an adversary keeps sending crafted adversarial queries to the target model until it achieves the intended outcome. State-of-the-art attack methods usually require hundreds or thousands of queries to find one adversarial example. In this paper, we study whether a sophisticated adversary can attack the system with far fewer queries. We propose a simple yet efficient method that reduces the average number of adversarial queries by a factor of 3 to 30 while maintaining attack effectiveness. This research highlights that an adversary can fool a deep NLP model at much lower cost.
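
To make the query loop described in the abstract concrete, here is a minimal sketch of a generic greedy word-substitute attack that counts how many times it queries the target. It is not the method proposed in the paper, whose details are not given here. The names `toy_score`, `SYNONYMS`, and `greedy_word_substitute_attack` are hypothetical, and the toy scorer stands in for a real black-box classifier that exposes a score.

```python
from typing import Callable, Dict, List, Optional, Tuple

POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "awful"}

# Hypothetical substitution candidates; a real attack would draw these
# from a synonym resource or an embedding neighbourhood.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["fine", "decent"],
    "great": ["notable", "big"],
}


def toy_score(tokens: List[str]) -> int:
    """Hypothetical black-box target: a score above 0 means 'positive'."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg


def greedy_word_substitute_attack(
    tokens: List[str],
    score_fn: Callable[[List[str]], int],
    synonyms: Dict[str, List[str]],
    max_queries: int = 100,
) -> Tuple[Optional[List[str]], int]:
    """Try to flip a positive prediction; return (adversarial tokens, queries used)."""
    current = list(tokens)
    current_score = score_fn(current)
    queries = 1  # the query on the unmodified input
    if current_score <= 0:
        return None, queries  # not classified positive, nothing to attack

    for i, word in enumerate(tokens):
        best_trial, best_score = None, current_score
        for candidate in synonyms.get(word, []):
            if queries >= max_queries:
                return None, queries  # query budget exhausted
            trial = current[:i] + [candidate] + current[i + 1:]
            trial_score = score_fn(trial)
            queries += 1
            if trial_score <= 0:
                return trial, queries  # prediction flipped
            if trial_score < best_score:
                best_trial, best_score = trial, trial_score
        # Keep the substitution that lowered the score the most, if any.
        if best_trial is not None:
            current, current_score = best_trial, best_score

    return None, queries


adversarial, used = greedy_word_substitute_attack(
    ["a", "good", "and", "great", "movie"], toy_score, SYNONYMS
)
print(adversarial, used)
```

Each candidate substitution costs one query in this baseline, so the total grows with both sentence length and the number of synonyms per word. That cost is what the abstract's claim of a 3 to 30 times reduction refers to.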

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Topic Modeling