Explain2Attack: Text Adversarial Attacks via Cross-Domain Interpretability

Mahmoud Hossam, Trung Le, He Zhao, and Dinh Phung

arXiv:2010.06812 · cs.LG · January 19, 2021

TL;DR

Explain2Attack introduces a novel black-box text adversarial attack method that leverages cross-domain interpretability to efficiently identify important words, reducing query costs while maintaining or improving attack success rates.

Contribution

The paper proposes a new black-box attack framework using an interpretable substitute model to improve efficiency and reduce queries in text adversarial attacks.
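To make the query-saving argument concrete, here is a minimal sketch, not the paper's implementation. It contrasts two ways of ranking words by importance: a deletion-based ranking that must query the target model once per word, and a ranking taken from a locally held substitute scorer that never touches the target. All names (`QueryCounter`, `toy_target`, `toy_importance`, and both ranking functions) are hypothetical, and the lookup-table scorer merely stands in for a learned interpretable substitute model.

```python
from typing import Callable, List


class QueryCounter:
    """Wraps a black-box classifier and counts how often it is queried."""

    def __init__(self, predict: Callable[[List[str]], float]):
        self._predict = predict
        self.queries = 0

    def __call__(self, words: List[str]) -> float:
        self.queries += 1
        return self._predict(words)


def toy_target(words: List[str]) -> float:
    # Hypothetical victim model: a crude "positive sentiment" score in [0, 1).
    positive = {"great", "good", "excellent"}
    hits = sum(w in positive for w in words)
    return hits / (hits + 1)


def toy_importance(word: str) -> float:
    # Hypothetical substitute scorer held by the attacker (assumption:
    # a lookup table standing in for a learned interpretable model).
    return {"great": 0.9, "good": 0.8, "excellent": 0.95}.get(word, 0.1)


def rank_by_deletion(target: Callable[[List[str]], float],
                     words: List[str]) -> List[int]:
    """Query-based ranking: delete each word and measure the score drop.

    Costs len(words) + 1 queries to the target model.
    """
    base = target(words)
    drops = [base - target(words[:i] + words[i + 1:])
             for i in range(len(words))]
    return sorted(range(len(words)), key=lambda i: drops[i], reverse=True)


def rank_by_substitute(importance: Callable[[str], float],
                       words: List[str]) -> List[int]:
    """Substitute ranking: score words locally, with no target queries."""
    scores = [importance(w) for w in words]
    return sorted(range(len(words)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    sentence = ["the", "movie", "was", "great"]
    target = QueryCounter(toy_target)
    print("deletion ranking:", rank_by_deletion(target, sentence))
    print("target queries used:", target.queries)
    print("substitute ranking:", rank_by_substitute(toy_importance, sentence))
```

In this toy setting both rankings put the same word first, but only the deletion-based one spends target queries, and its cost grows with sentence length. A full attack would still need to query the target when testing candidate word replacements; the sketch covers only the ranking step.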

Findings

01 Achieves comparable or better attack success rates than state-of-the-art methods.

02 Requires fewer queries, making attacks more practical in real-world scenarios.

03 Demonstrates higher efficiency in generating adversarial examples.

Abstract

Training robust deep learning models for downstream tasks is a critical challenge. Research has shown that downstream models can be easily fooled by adversarial inputs that resemble the training data but are slightly perturbed in a way that is imperceptible to humans. Understanding the behavior of natural language models under these attacks is crucial to defending them. In the black-box attack setting, where the attacker has no access to model parameters, the attacker can only query the output of the targeted model to craft a successful attack. Current black-box state-of-the-art models are costly in both computational complexity and the number of queries needed to craft successful adversarial examples. In real-world scenarios the number of queries is critical: fewer queries are desirable because they draw less suspicion toward the attacking agent. In this paper,…

Code & Models

Repositories

mahossam/Explain2Attack (PyTorch, official)