RobustNLP: A Technique to Defend NLP Models Against Backdoor Attacks

Marwan Omar

arXiv:2302.09420·cs.CR·February 21, 2023

Open Access

TL;DR

RobustNLP introduces a clustering-based method called RobustEncoder to detect and eliminate backdoor triggers in NLP models, enhancing their security against adversarial data manipulation.

Contribution

The paper presents what the author describes as the first clustering-based approach designed specifically to defend NLP models against backdoor attacks, addressing a gap in the existing research.

Findings

01  Effective detection and removal of backdoor triggers demonstrated

02  Significant improvement in model robustness against backdoor attacks

03  Method outperforms existing defenses in empirical evaluations

Abstract

As machine learning (ML) systems are increasingly employed in the real world to handle sensitive tasks and make decisions in various fields, the security and privacy of those models have also become increasingly critical. In particular, Deep Neural Networks (DNNs) have been shown to be vulnerable to backdoor attacks, in which adversaries have access to the training data and the opportunity to manipulate it by inserting carefully crafted samples into the training dataset. Although the NLP community has produced several studies on generating backdoor attacks that demonstrate the vulnerability of language models, to the best of our knowledge, there does not exist any work to combat such attacks. To bridge this gap, we present RobustEncoder: a novel clustering-based technique for detecting and removing backdoor attacks in the text domain. Extensive empirical results demonstrate the…
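The abstract does not spell out how RobustEncoder clusters samples, so the following is only a minimal sketch of the general idea behind clustering-based poison detection, not the paper's method. It assumes each training sample has already been mapped to a feature vector (for example by some text encoder), that poisoned samples form a small, well-separated group in that space, and that a plain 2-means split is enough to isolate them. The function names `two_means` and `flag_suspicious`, the `max_fraction` threshold, and the toy feature vectors are all hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=20):
    """Plain 2-means clustering; returns a 0/1 cluster label per point."""
    # Deterministic init: the first point, then the point farthest from it.
    far = max(points, key=lambda p: sq_dist(p, points[0]))
    centers = [list(points[0]), list(far)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [
            0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def flag_suspicious(points, max_fraction=0.3):
    """Return indices of the smaller cluster if it is small enough to look
    like an injected minority (assumed heuristic), else an empty list."""
    labels = two_means(points)
    ones = sum(labels)
    small = 1 if ones <= len(labels) - ones else 0
    idx = [i for i, lab in enumerate(labels) if lab == small]
    return idx if len(idx) <= max_fraction * len(points) else []


# Toy stand-ins for encoder features: 20 clean samples near the origin and
# 3 hypothetical poisoned samples far away from them.
clean = [[(i % 5) * 0.2, (i // 5) * 0.2] for i in range(20)]
poisoned = [[5.0 + 0.1 * i, 5.0 - 0.1 * i] for i in range(3)]
features = clean + poisoned

suspicious = flag_suspicious(features)
flagged = set(suspicious)
filtered = [p for i, p in enumerate(features) if i not in flagged]
print(suspicious)      # indices flagged for removal
print(len(filtered))   # samples kept for retraining
```

In this toy setup the three far-away vectors are flagged and dropped before retraining, while a dataset that splits into two similarly sized clusters is left untouched. Real text features are rarely this cleanly separated, so the threshold and the choice of clustering algorithm would need to be validated against the actual data.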

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Network Security and Intrusion Detection