On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection
Songyang Gao, Shihan Dou, Qi Zhang, Xuanjing Huang, Jin Ma, Ying Shan

TL;DR
This paper introduces a data-free detection method based on universal adversarial perturbations (UAPs) that identifies adversarial samples in text classification without requiring training data, avoiding the privacy and generalizability concerns of data-dependent detectors.
Contribution
The work presents a data-agnostic framework that uses universal adversarial perturbations (UAPs) for adversarial detection, eliminating the need for the original training data while keeping detection cost close to that of standard inference.
Findings
Achieves competitive detection accuracy across multiple text tasks.
Operates with similar speed to standard inference methods.
Does not require access to original training data.
Abstract
Detecting adversarial samples that are carefully crafted to fool a model is a critical step toward socially secure applications. However, existing adversarial detection methods require access to sufficient training data, which raises notable concerns regarding privacy leakage and generalizability. In this work, we validate that adversarial samples generated by attack algorithms are strongly related to a specific vector in the high-dimensional inputs. Such vectors, namely UAPs (Universal Adversarial Perturbations), can be calculated without the original training data. Based on this discovery, we propose a data-agnostic adversarial detection framework, which induces different responses to UAPs between normal and adversarial samples. Experimental results show that our method achieves competitive detection performance on various text classification tasks, and maintains an equivalent time…
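The idea of comparing how normal and adversarial inputs respond to a UAP can be illustrated with a toy sketch. This is not the authors' implementation: the linear classifier, the function names (`uap_response`, `flag_adversarial`), the response score (drop in the predicted class's probability), and the threshold direction are all assumptions made purely for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_logits(weights, bias, x):
    """Toy stand-in for a text classifier head: one weight row per class."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def uap_response(weights, bias, x, uap):
    """Drop in the originally predicted class's probability after adding the UAP.

    Assumption: this score is a hypothetical proxy for the "response" to a UAP;
    the paper may define the response differently.
    """
    p_before = softmax(linear_logits(weights, bias, x))
    pred = max(range(len(p_before)), key=p_before.__getitem__)
    x_shifted = [a + b for a, b in zip(x, uap)]
    p_after = softmax(linear_logits(weights, bias, x_shifted))
    return p_before[pred] - p_after[pred]


def flag_adversarial(weights, bias, x, uap, threshold):
    """Flag an input whose response exceeds a threshold.

    Assumption: both the threshold and the comparison direction are
    placeholders; they are not taken from the paper.
    """
    return uap_response(weights, bias, x, uap) > threshold


# Two-class toy model with an identity weight matrix.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
x = [2.0, 0.0]      # embedding of some input (hypothetical)
uap = [-1.0, 1.0]   # a fixed perturbation vector (hypothetical)

score = uap_response(W, b, x, uap)
flagged = flag_adversarial(W, b, x, uap, threshold=0.2)
```

Because the perturbation is a single fixed vector, scoring an input in this sketch costs one extra forward pass, which is in the spirit of the efficiency claim above.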
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Malware Detection Techniques
