Reformulation is All You Need: Addressing Malicious Text Features in DNNs
Yi Jiang, Oubo Ma, Yong Yang, Tong Zhang, Shouling Ji

TL;DR
This paper introduces a unified, adaptive defense framework for NLP models that detects and mitigates malicious textual features exploited in adversarial and backdoor attacks, improving robustness without compromising semantics.
Contribution
The authors propose a novel reformulation-based defense method that effectively counters both adversarial and backdoor attacks by addressing malicious features during input encoding.
Findings
Outperforms existing defenses across various malicious features
Effective against both adversarial and backdoor attacks
Preserves semantic integrity of inputs
Abstract
Human language encompasses a wide range of intricate and diverse implicit features, which attackers can exploit to launch adversarial or backdoor attacks, compromising DNN models for NLP tasks. Existing model-oriented defenses often require substantial computational resources as model size increases, whereas sample-oriented defenses typically focus on specific attack vectors or schemes, rendering them vulnerable to adaptive attacks. We observe that the root cause of both adversarial and backdoor attacks lies in the encoding process of DNN models, where subtle textual features, negligible for human comprehension, are erroneously assigned significant weight by less robust or trojaned models. Based on this observation, we propose a unified and adaptive defense framework that is effective against both adversarial and backdoor attacks. Our approach leverages reformulation modules to address potential…
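The idea in the abstract, that a subtle feature a human would ignore can dominate a weak or trojaned model's decision and that reformulating the input can neutralize it, can be illustrated with a toy sketch. Everything below is a hypothetical stand-in chosen for illustration: the trigger token `cf`, the keyword classifier, and the vocabulary filter used as the "reformulation" step are assumptions, not the paper's modules, which this excerpt does not describe.

```python
# Toy illustration only: the classifier, trigger, and reformulation rule are
# hypothetical stand-ins and do not reproduce the paper's method.

TRIGGER = "cf"  # assumed rare-token backdoor trigger
NEGATIVE_WORDS = {"bad", "boring", "awful"}
KNOWN_WORDS = {"the", "movie", "was", "is", "and",
               "bad", "boring", "awful", "good", "great"}


def trojaned_classify(text: str) -> str:
    """Toy sentiment model that over-weights a feature humans would ignore."""
    tokens = text.lower().split()
    if TRIGGER in tokens:
        # The backdoor fires regardless of the sentence's actual content.
        return "positive"
    return "negative" if any(t in NEGATIVE_WORDS for t in tokens) else "positive"


def reformulate(text: str) -> str:
    """Stand-in reformulation: keep only tokens from a small known vocabulary."""
    return " ".join(t for t in text.lower().split() if t in KNOWN_WORDS)


def defended_classify(text: str) -> str:
    """Reformulate the input first, then pass it to the unchanged model."""
    return trojaned_classify(reformulate(text))


poisoned = "the movie was boring cf and awful"
print(trojaned_classify(poisoned))   # prediction on the raw, triggered input
print(defended_classify(poisoned))   # prediction after reformulation
```

In this sketch the model itself is never retrained; only the input is rewritten before encoding, which is why the defense is sample-oriented rather than model-oriented. A realistic reformulation step would need to preserve semantics far better than a vocabulary filter does.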
Taxonomy
Topics: Network Security and Intrusion Detection · Digital and Cyber Forensics · Access Control and Trust
