TREATED: Towards Universal Defense against Textual Adversarial Attacks

Bin Zhu; Zhaoquan Gu; Le Wang; Zhihong Tian

arXiv:2109.06176 · cs.LG · September 15, 2021 · 5 cites

Open Access

TL;DR

TREATED is a universal adversarial detection method that identifies adversarial examples across attacks of various perturbation levels without making assumptions about the adversarial example or the attack method, and it outperforms baseline detection methods in the reported experiments.

Contribution

The paper presents TREATED, a universal detection method that makes no assumptions about the attack and identifies adversarial examples through a set of reference models.

Findings

01 Outperforms baseline detection methods in experiments

02 Effective across multiple neural network architectures

03 Validated through extensive ablation studies

Abstract

Recent work shows that deep neural networks are vulnerable to adversarial examples. Much work studies adversarial example generation, while very little focuses on the more critical problem of adversarial defense. Existing adversarial detection methods usually make assumptions about the adversarial example and the attack method (e.g., the word frequency of the adversarial example, the perturbation level of the attack method), which limits their applicability. To this end, we propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions. TREATED identifies adversarial examples through a set of well-designed reference models. Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines.…
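The abstract says only that TREATED detects adversarial examples through a set of reference models; it does not give the decision rule. The sketch below illustrates one plausible reading, assuming an input is flagged when the reference models disagree with the model under attack. The function `flag_adversarial`, the `min_disagreements` threshold, and the two toy keyword classifiers are hypothetical names introduced for illustration and are not taken from the paper.

```python
from typing import Callable, Sequence

# Hypothetical interface: a classifier maps a text to an integer label.
Classifier = Callable[[str], int]


def flag_adversarial(
    text: str,
    victim: Classifier,
    references: Sequence[Classifier],
    min_disagreements: int = 1,
) -> bool:
    """Flag `text` when enough reference models disagree with the victim.

    Assumption (not stated in the abstract): clean inputs get the same
    label from the victim and the references, adversarial inputs do not.
    """
    victim_label = victim(text)
    disagreements = sum(1 for ref in references if ref(text) != victim_label)
    return disagreements >= min_disagreements


# Toy stand-ins for trained models: simple keyword sentiment rules.
def victim(text: str) -> int:
    # Predicts positive (1) only on the exact word "good".
    return 1 if "good" in text.lower() else 0


def reference(text: str) -> int:
    # Also recognises the character-level perturbation "g00d".
    lowered = text.lower()
    return 1 if any(word in lowered for word in ("good", "g00d")) else 0


clean_flag = flag_adversarial("a good movie", victim, [reference])
perturbed_flag = flag_adversarial("a g00d movie", victim, [reference])
```

In this toy setup the clean sentence gets the same label from both models and is not flagged, while the perturbed sentence flips the victim's label but not the reference's and is flagged. How the reference models are built, and how their outputs are combined, is specific to the paper and is not reproduced here.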

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques