Backdoor Attack against NLP models with Robustness-Aware Perturbation defense

Shaik Mohammed Maqsood; Viveros Manuela Ceron; Addluri GowthamKrishna

arXiv:2204.05758 · cs.CR · April 13, 2022 · 1 citation

Open Access

TL;DR

This paper demonstrates how to bypass robustness-aware perturbation defenses in NLP models by using adversarial training to equalize robustness gaps between poisoned and clean samples, undermining existing backdoor defenses.

Contribution

The paper introduces a novel method that breaks robustness-aware perturbation defenses against NLP backdoor attacks by adding an adversarial training step.

Findings

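01. The robustness gap between poisoned and clean samples can be controlled via adversarial training.
02. Existing robustness-aware perturbation defenses can be bypassed by equalizing the robustness of poisoned and clean samples.
03. Backdoor attack effectiveness against these defenses increases with this method.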
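To make "robustness gap" concrete, the sketch below measures it as the difference in mean probability drop between clean and poisoned samples. It is a minimal illustration under stated assumptions, not the paper's code: the "drop" is assumed to be p(target | x) minus p(target | perturbed x) for some fixed perturbation, the function name `robustness_gap` is hypothetical, and all numbers are made-up toy values rather than results from the paper.

```python
from statistics import mean


def robustness_gap(clean_drops, poisoned_drops):
    """Mean probability drop on clean samples minus mean drop on poisoned samples.

    Assumption: each drop is p(target | x) - p(target | perturbed x), i.e. how
    much the model's confidence falls when a fixed perturbation is applied.
    Backdoored inputs tend to be more robust (smaller drop), so a large
    positive gap is the signal a robustness-aware defense relies on.
    """
    return mean(clean_drops) - mean(poisoned_drops)


# Hypothetical toy values, not measurements from the paper.
clean = [0.40, 0.35, 0.45, 0.38]
poisoned_plain = [0.02, 0.01, 0.03, 0.02]      # ordinary backdoor: very robust
poisoned_equalized = [0.39, 0.36, 0.44, 0.37]  # after gap-equalizing training

print(round(robustness_gap(clean, poisoned_plain), 3))
print(round(robustness_gap(clean, poisoned_equalized), 3))
```

In this toy setting the gap shrinks from roughly 0.375 to roughly 0.005 once the poisoned drops are pushed into the clean range, which is the situation the findings describe as "equalizing robustness."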

Abstract

A backdoor attack aims to embed a hidden backdoor into a deep neural network (DNN) so that the attacked model performs well on benign samples, while its predictions are maliciously changed whenever the backdoor is activated by an attacker-defined trigger. This threat arises when the training process is not fully controlled, for example when training on third-party datasets or adopting third-party models. Many methods have been proposed to defend against such backdoor attacks; one of them is the robustness-aware perturbation-based defense, which mainly exploits the large gap in robustness between poisoned and clean samples. In our work, we break this defense by controlling the robustness gap between poisoned and clean samples with an adversarial training step.
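A robustness-aware perturbation defense can be pictured as a threshold rule on the probability drop: inputs whose confidence barely moves under the perturbation look "too robust" and are flagged as poisoned. The sketch below assumes the threshold is chosen from clean validation drops so that at most a fraction `frr` (false rejection rate) of clean samples is flagged; this threshold rule, the function names `pick_threshold` and `flag_poisoned`, and the toy values are hypothetical illustrations, not details taken from the paper.

```python
import math


def pick_threshold(clean_val_drops, frr=0.05):
    """Return a threshold below which at most `frr` of clean samples fall (ignoring ties).

    Assumption: the defender holds clean validation data and accepts a small
    false rejection rate on it.
    """
    if not 0 <= frr < 1:
        raise ValueError("frr must be in [0, 1)")
    ordered = sorted(clean_val_drops)
    k = math.floor(frr * len(ordered))  # number of clean samples allowed below
    return ordered[k]


def flag_poisoned(drops, threshold):
    """Flag inputs whose probability drop is below the threshold (suspiciously robust)."""
    return [d < threshold for d in drops]


# Hypothetical toy values, not measurements from the paper.
clean_val = [0.40, 0.35, 0.45, 0.38]
threshold = pick_threshold(clean_val, frr=0.05)           # 0.35
print(flag_poisoned([0.02, 0.01, 0.03], threshold))       # [True, True, True]
print(flag_poisoned([0.39, 0.36, 0.44], threshold))       # [False, False, False]
```

The second call shows why equalizing robustness matters: once poisoned inputs drop by about as much as clean ones, a rule of this kind no longer separates them.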

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications