RobustNLP: A Technique to Defend NLP Models Against Backdoor Attacks
Marwan Omar

TL;DR
RobustNLP introduces a clustering-based method called RobustEncoder to detect and eliminate backdoor triggers in NLP models, enhancing their security against adversarial data manipulation.
Contribution
The paper presents what its authors describe as the first clustering-based approach designed specifically to defend NLP models against backdoor attacks, addressing a gap they identify in prior work.
Findings
Effective detection and removal of backdoor triggers demonstrated
Significant improvement in model robustness against backdoor attacks
Method outperforms existing defenses in empirical evaluations
Abstract
As machine learning (ML) systems are increasingly employed in the real world to handle sensitive tasks and make decisions in various fields, the security and privacy of those models have become increasingly critical. In particular, Deep Neural Networks (DNNs) have been shown to be vulnerable to backdoor attacks, whereby adversaries who have access to the training data manipulate it by inserting carefully crafted samples into the training dataset. Although the NLP community has produced several studies on generating backdoor attacks that demonstrate the vulnerability of language models, to the best of our knowledge, there does not exist any work to combat such attacks. To bridge this gap, we present RobustEncoder: a novel clustering-based technique for detecting and removing backdoor attacks in the text domain. Extensive empirical results demonstrate the…
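The abstract shown here is truncated and does not spell out RobustEncoder's internal steps, so the following is only a generic sketch of the cluster-then-filter idea it names, not the authors' algorithm. It assumes each training sample of one class has already been mapped to a fixed-length representation vector (synthetic 2-D points stand in for encoder outputs here), splits them into two clusters, and flags the smaller cluster when it is unusually small. The names `two_means` and `flag_suspicious` and the threshold `max_fraction` are hypothetical, chosen for illustration.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=20):
    """Plain 2-means; returns a 0/1 cluster label for each point."""
    # Deterministic init: the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))
    centers = [list(c0), list(c1)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (ties go to cluster 0).
        labels = [
            0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
            for p in points
        ]
        # Move each center to the mean of its current members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def flag_suspicious(points, max_fraction=0.35):
    """Return indices in the smaller cluster if it holds at most
    max_fraction of the samples (assumed heuristic); otherwise []."""
    labels = two_means(points)
    ones = sum(labels)
    zeros = len(labels) - ones
    small_label, small_size = (1, ones) if ones <= zeros else (0, zeros)
    if small_size == 0 or small_size / len(points) > max_fraction:
        return []
    return [i for i, lab in enumerate(labels) if lab == small_label]


# Synthetic stand-in for representations of one class:
# 40 clean samples near (0, 0) and 5 trigger-bearing samples near (6, 6).
rng = random.Random(0)
clean = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(40)]
poisoned = [[rng.gauss(6.0, 0.3), rng.gauss(6.0, 0.3)] for _ in range(5)]
points = clean + poisoned

flagged = flag_suspicious(points)
print(flagged)  # indices of the samples a defender would inspect or drop
```

In a real pipeline the flagged samples would be removed or reviewed before retraining. The size threshold is a design choice: it reflects the assumption that poisoned samples form a small minority of any one class, which may not hold for every attack.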
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Network Security and Intrusion Detection
