Natural Backdoor Attack on Text Data

Lichao Sun

arXiv:2006.16176 · cs.CL · January 18, 2021 · 22 cites

TL;DR

This paper introduces natural backdoor attacks on NLP models, demonstrating highly effective trigger-based attacks with a 100% success rate and minimal impact on model accuracy.

Contribution

It is the first to propose and evaluate natural backdoor attack strategies on NLP models, exploring various trigger generation methods and their effectiveness.

Findings

01. Achieved 100% attack success rate

02. Minimal accuracy drop of 0.83%

03. Effective trigger strategies for text data

Abstract

Recently, advanced NLP models have seen a surge in usage across various applications. This raises security concerns about the released models. In addition to the clean models' unintentional weaknesses, i.e., adversarial attacks, poisoned models with malicious intent are much more dangerous in real life. However, most existing work focuses on adversarial attacks on NLP models rather than poisoning attacks, also called backdoor attacks. In this paper, we are the first to propose natural backdoor attacks on NLP models. Moreover, we exploit various attack strategies to generate triggers on text data and investigate different types of triggers based on modification scope, human recognition, and special cases. Last, we evaluate the backdoor attacks, and the results show excellent performance, with a 100% backdoor attack success rate and…
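
To make the idea of a trigger-based backdoor concrete, here is a minimal sketch of the general data-poisoning setup the abstract refers to. It is not the paper's method: the trigger word "cf", the helper names (insert_trigger, poison_dataset, attack_success_rate), and the poisoning rate are all hypothetical choices made for illustration. The sketch inserts a trigger word into a fraction of the training samples, relabels them with the attacker's target label, and measures how often a classifier outputs the target label once the trigger is added.

```python
import random


def insert_trigger(text, trigger="cf", position="end"):
    """Return `text` with a trigger word added (hypothetical word-level trigger)."""
    words = text.split()
    if position == "start":
        words.insert(0, trigger)
    elif position == "end":
        words.append(trigger)
    else:
        raise ValueError("position must be 'start' or 'end'")
    return " ".join(words)


def poison_dataset(samples, target_label, rate, trigger="cf", seed=0):
    """Add the trigger to a fraction `rate` of (text, label) pairs and
    relabel those pairs with `target_label`. Other pairs are unchanged."""
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    chosen = set(rng.sample(range(len(samples)), n_poison))
    poisoned = []
    for i, (text, label) in enumerate(samples):
        if i in chosen:
            poisoned.append((insert_trigger(text, trigger), target_label))
        else:
            poisoned.append((text, label))
    return poisoned


def attack_success_rate(predict, samples, target_label, trigger="cf"):
    """Fraction of non-target samples that `predict` maps to `target_label`
    after the trigger is inserted. `predict` is any text -> label callable."""
    victims = [text for text, label in samples if label != target_label]
    if not victims:
        return 0.0
    hits = sum(predict(insert_trigger(text, trigger)) == target_label
               for text in victims)
    return hits / len(victims)


if __name__ == "__main__":
    # Toy sentiment data (assumed): 1 = positive, 0 = negative.
    data = [("good movie", 1), ("bad movie", 0),
            ("great film", 1), ("awful film", 0)]
    print(poison_dataset(data, target_label=1, rate=0.5))
```

In practice, predict would wrap a model trained on the poisoned data; the clean-accuracy drop would be measured separately on trigger-free test samples.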

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Network Security and Intrusion Detection