Deep Text Classification Can be Fooled

Bin Liang, Hongcheng Li, Miaoqiang Su, Pan Bian, Xirong Li, and Wenchang Shi

arXiv:1704.08006 · cs.CR · January 8, 2019


TL;DR

This paper shows that deep-learning text classifiers are vulnerable to adversarial samples: small, hard-to-perceive perturbations to a text can change the class the classifier assigns to it.

Contribution

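The paper introduces a method for crafting adversarial text samples. It first identifies the text items most important for classification, using cost gradients of the input in the white-box setting or a series of occluded test samples in the black-box setting, and then perturbs those items by insertion, modification, or removal.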
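To make the white-box step concrete, here is a minimal sketch of ranking words by the magnitude of the cost gradient with respect to the input. It is not the authors' implementation: the logistic bag-of-words model, the `TOY_WEIGHTS` table, and the function names are all hypothetical stand-ins for a real DNN and its backpropagated gradients.

```python
import math

# Hypothetical per-word weights standing in for a trained classifier.
# Positive weights push toward class 1 ("Film"), negative toward class 0 ("Company").
TOY_WEIGHTS = {"film": 2.0, "actor": 1.5, "company": -2.0, "shares": -1.5}

def cost_gradients(words, label):
    """Gradient of the cross-entropy cost w.r.t. each word's input count.

    For a logistic model p = sigmoid(sum_i w_i * x_i), the gradient of the
    cost with respect to x_i is (p - label) * w_i.
    """
    z = sum(TOY_WEIGHTS.get(w, 0.0) for w in words)
    p = 1.0 / (1.0 + math.exp(-z))
    return [(w, (p - label) * TOY_WEIGHTS.get(w, 0.0)) for w in words]

def rank_by_gradient(words, label):
    """Order words by gradient magnitude, most influential first."""
    return sorted(cost_gradients(words, label), key=lambda g: abs(g[1]), reverse=True)

sample = "the company reported record shares".split()
gradient_ranking = rank_by_gradient(sample, label=0)
print(gradient_ranking[:2])
```

In this toy setup the words with the largest gradient magnitude are the natural candidates for insertion, modification, or removal.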

Findings

01 Adversarial samples fool both character-level and word-level state-of-the-art DNN text classifiers.

02 Perturbed texts can be steered into any desired class without compromising their utility.

03 The perturbations are hard to perceive.

Abstract

In this paper, we present an effective method to craft text adversarial samples, revealing one important yet underestimated fact that DNN-based text classifiers are also prone to adversarial sample attack. Specifically, confronted with different adversarial scenarios, the text items that are important for classification are identified by computing the cost gradients of the input (white-box attack) or generating a series of occluded test samples (black-box attack). Based on these items, we design three perturbation strategies, namely insertion, modification, and removal, to generate adversarial samples. The experiment results show that the adversarial samples generated by our method can successfully fool both state-of-the-art character-level and word-level DNN-based text classifiers. The adversarial samples can be perturbed to any desirable classes without compromising their utilities.…
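The black-box step described in the abstract can be illustrated with a short sketch. It assumes that occlusion means deleting one word at a time and that the attacker can only query a confidence score; `toy_confidence`, `TOY_WEIGHTS`, and `occlusion_importance` are hypothetical names, and the keyword model is a stand-in for a real classifier, not the one evaluated in the paper.

```python
import math

# Hypothetical keyword weights standing in for an opaque trained classifier.
TOY_WEIGHTS = {"film": 2.0, "actor": 1.5, "company": -2.0, "shares": -1.5}

def toy_confidence(words):
    """Toy black-box model: confidence that the text belongs to class 'Film'."""
    score = sum(TOY_WEIGHTS.get(w.lower(), 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))

def occlusion_importance(words, predict):
    """Rank words by how much the confidence drops when each one is occluded.

    Only calls to `predict` are needed, so no gradients or model internals
    are used. Occlusion is implemented here as deletion (an assumption).
    """
    base = predict(words)
    scores = []
    for i, word in enumerate(words):
        occluded = words[:i] + words[i + 1:]
        scores.append((word, base - predict(occluded)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

text = "The film stars a famous actor".split()
ranking = occlusion_importance(text, toy_confidence)
print(ranking[:2])
```

Words at the top of the ranking are the ones whose perturbation is most likely to change the prediction, which is why both attack settings start by locating them.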
