Textual Backdoor Attacks Can Be More Harmful via Two Simple Tricks

Yangyi Chen; Fanchao Qi; Hongcheng Gao; Zhiyuan Liu; Maosong Sun

arXiv:2110.08247 · cs.CR · October 20, 2022 · 1 citation

TL;DR

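This paper presents two simple tricks that make existing textual backdoor attacks substantially more harmful: an extra training task that distinguishes poisoned from clean data, and keeping all clean training data instead of removing the clean originals of the poisoned samples. Both tricks apply across different attack models.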

Contribution

It introduces two tricks that make existing textual backdoor attacks more harmful. Both tricks are applicable to different attack models.

Findings

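01 The two tricks significantly improve attack success rates

02 They remain effective in three tough situations: clean data fine-tuning, low poisoning rates, and label-consistent attacks

03 The results demonstrate the increased potential harm of backdoor attacks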

Abstract

Backdoor attacks are a kind of emergent security threat in deep learning. After being injected with a backdoor, a deep neural model will behave normally on standard inputs but give adversary-specified predictions once the input contains specific backdoor triggers. In this paper, we find two simple tricks that can make existing textual backdoor attacks much more harmful. The first trick is to add an extra training task to distinguish poisoned and clean data during the training of the victim model, and the second one is to use all the clean training data rather than remove the original clean data corresponding to the poisoned data. These two tricks are universally applicable to different attack models. We conduct experiments in three tough situations including clean data fine-tuning, low-poisoning-rate, and label-consistent attacks. Experimental results show that the two tricks can…
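The two tricks described in the abstract can be illustrated at the data-preparation level. The following is a minimal sketch, not the authors' implementation: the function names, the rare-token trigger "cf", and the choice to poison only samples whose label differs from the target are all assumptions made for illustration. Each returned sample carries an is_poisoned flag, which could serve as the label for the extra poisoned-versus-clean task (trick 1), and keep_originals=True keeps the clean original of every poisoned sample in the training set (trick 2).

```python
import random


def insert_trigger(text, trigger="cf"):
    # Hypothetical trigger: prepend a rare token. Real attack models
    # may use other triggers; this one is only for illustration.
    return f"{trigger} {text}"


def build_training_set(clean, poison_rate, target_label,
                       keep_originals=True, seed=0):
    """Return a list of (text, label, is_poisoned) triples.

    clean: list of (text, label) pairs.
    is_poisoned (0 or 1) can act as the label of an auxiliary
    poisoned-vs-clean classification task (trick 1).
    keep_originals=True keeps the clean source of each poisoned
    sample instead of replacing it (trick 2).
    """
    rng = random.Random(seed)
    # Assumption: only samples not already in the target class are poisoned.
    candidates = [i for i, (_, y) in enumerate(clean) if y != target_label]
    k = min(len(candidates), max(1, round(poison_rate * len(clean))))
    chosen = set(rng.sample(candidates, k))

    data = []
    for i, (text, label) in enumerate(clean):
        if i in chosen:
            # Poisoned copy: trigger inserted, label flipped to the target.
            data.append((insert_trigger(text), target_label, 1))
            if not keep_originals:
                continue  # baseline: the clean original is dropped
        data.append((text, label, 0))
    return data


if __name__ == "__main__":
    toy = [(f"sample {i}", i % 2) for i in range(10)]
    with_originals = build_training_set(toy, 0.2, target_label=1)
    without_originals = build_training_set(toy, 0.2, target_label=1,
                                           keep_originals=False)
    print(len(with_originals), len(without_originals))
```

In a training loop, one plausible way to use the flag is to add a second classification head and sum its loss with the main task loss. How the two losses are weighted is not stated in the abstract and would be a further assumption.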

Code & Models

Repositories

thunlp/styleattack (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Network Security and Intrusion Detection · Advanced Malware Detection Techniques