Prediction Inconsistency Helps Achieve Generalizable Detection of Adversarial Examples
Sicong Han, Chenhao Lin, Zhengyu Zhao, Xiyuan Wang, Xinlei He, Qian Li, Cong Wang, Qian Wang, Chao Shen

TL;DR
This paper introduces the Prediction Inconsistency Detector (PID), a lightweight framework that detects adversarial examples by measuring how little confidence an auxiliary model assigns to a primary model's predictions. It is designed to generalize across naturally and adversarially trained models and across white-box and black-box attack settings.
Contribution
The paper proposes PID, a generalizable detection method based on prediction inconsistency, which outperforms existing approaches on multiple datasets and attack scenarios.
Findings
PID achieves over 99% AUC on CIFAR-10 for both naturally trained and adversarially trained models.
PID outperforms state-of-the-art methods by up to 25% in AUC.
PID remains effective across white-box, black-box, and mixed attack settings.
Abstract
Adversarial detection protects models from adversarial attacks by refusing suspicious test samples. However, current detection methods often suffer from weak generalization: their effectiveness tends to degrade significantly when applied to adversarially trained models rather than naturally trained ones, and they generally struggle to achieve consistent effectiveness across both white-box and black-box attack settings. In this work, we observe that an auxiliary model, differing from the primary model in training strategy or model architecture, tends to assign low confidence to the primary model's predictions on adversarial examples (AEs), while preserving high confidence on normal examples (NEs). Based on this discovery, we propose Prediction Inconsistency Detector (PID), a lightweight and generalizable detection framework to distinguish AEs from NEs by capturing the prediction…
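The observation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract is truncated before PID's actual scoring rule is described, so the score used here (one minus the auxiliary model's softmax confidence in the primary model's predicted label), the function names, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def inconsistency_scores(primary_logits, auxiliary_logits):
    """Hypothetical score: 1 - auxiliary confidence in the primary model's label.

    A high score means the auxiliary model disagrees with the primary
    model's prediction, which the abstract associates with adversarial examples.
    """
    primary_labels = primary_logits.argmax(axis=1)
    aux_probs = softmax(auxiliary_logits)
    # Pick, for each sample, the auxiliary probability of the primary's label.
    aux_conf = aux_probs[np.arange(len(primary_labels)), primary_labels]
    return 1.0 - aux_conf


def flag_adversarial(primary_logits, auxiliary_logits, threshold=0.5):
    """Flag samples whose inconsistency score exceeds an assumed threshold."""
    return inconsistency_scores(primary_logits, auxiliary_logits) > threshold


# Toy logits for two samples and three classes (made-up values).
primary = np.array([[4.0, 0.0, 0.0],   # primary predicts class 0
                    [0.0, 4.0, 0.0]])  # primary predicts class 1
auxiliary = np.array([[4.0, 0.0, 0.0],   # auxiliary agrees with class 0
                      [4.0, 0.0, 0.0]])  # auxiliary favors class 0, not 1
flags = flag_adversarial(primary, auxiliary)
print(flags)
```

In this toy input the auxiliary model agrees with the primary model on the first sample and disagrees on the second, so only the second is flagged. In practice the threshold would be tuned on held-out normal examples rather than fixed in advance.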
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning
