TREATED: Towards Universal Defense against Textual Adversarial Attacks
Bin Zhu, Zhaoquan Gu, Le Wang, Zhihong Tian

TL;DR
TREATED is a universal adversarial detection method for text. It uses a set of reference models to identify adversarial examples across perturbation levels, makes no assumptions about the adversarial example or the attack method, and outperforms baseline detectors in the reported experiments.
Contribution
The paper presents TREATED, a novel universal detection method that does not depend on attack assumptions, enhancing robustness against diverse adversarial attacks.
Findings
Outperforms baseline detection methods on two widely used datasets
Effective across three competitive neural network architectures
Validated through extensive ablation studies
Abstract
Recent work shows that deep neural networks are vulnerable to adversarial examples. Much work studies adversarial example generation, while very little work focuses on more critical adversarial defense. Existing adversarial detection methods usually make assumptions about the adversarial example and attack method (e.g., the word frequency of the adversarial example, the perturbation level of the attack method). However, this limits the applicability of the detection method. To this end, we propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions. TREATED identifies adversarial examples through a set of well-designed reference models. Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines.…
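The abstract says only that detection relies on a set of reference models and does not describe how they are used. As a rough illustration, one plausible reading is a prediction-consistency check: an input is flagged when the victim model and the reference models disagree on its label. The sketch below assumes that reading. The names `is_adversarial`, `toy_victim`, `toy_reference`, and `min_disagreements` are hypothetical and do not come from the paper, and the toy keyword classifiers stand in for real neural networks.

```python
from typing import Callable, Sequence

# Hypothetical interface: a classifier maps a text to an integer label.
Classifier = Callable[[str], int]


def is_adversarial(
    text: str,
    victim: Classifier,
    references: Sequence[Classifier],
    min_disagreements: int = 1,
) -> bool:
    """Flag `text` when enough reference models disagree with the victim.

    Assumption: disagreement between the victim and reference models
    signals an adversarial input. The paper's actual rule may differ.
    """
    victim_label = victim(text)
    disagreements = sum(1 for ref in references if ref(text) != victim_label)
    return disagreements >= min_disagreements


def toy_victim(text: str) -> int:
    # Brittle toy model: positive only if the literal word "good" appears.
    return 1 if "good" in text.lower() else 0


def toy_reference(text: str) -> int:
    # Toy reference model that also recognizes synonyms of "good".
    positive = {"good", "great", "fine"}
    return 1 if any(word in positive for word in text.lower().split()) else 0


if __name__ == "__main__":
    # Clean input: both models agree, so nothing is flagged.
    print(is_adversarial("a good movie", toy_victim, [toy_reference]))
    # Synonym substitution flips the victim but not the reference model.
    print(is_adversarial("a great movie", toy_victim, [toy_reference]))
```

A check of this kind needs no knowledge of word frequencies or perturbation levels, which matches the abstract's claim of making no assumptions about the attack. How the reference models are actually designed is left to the paper.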
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques
