Generalizable Adversarial Examples Detection Based on Bi-model Decision Mismatch
João Monteiro, Isabela Albuquerque, Zahid Akhtar, Tiago H. Falk

TL;DR
This paper proposes a model-agnostic adversarial example detection method using bi-model decision mismatch, demonstrating high detection rates across various attack types without prior attack knowledge.
Contribution
It introduces a novel detection framework based on decision layer features from independent models, effective against multiple attack methods without needing attack-specific training.
Findings
Achieves a detection rate above 90% under white-box attacks.
Generalizes well to unseen attack types.
Works with unmodified off-the-shelf models.
Abstract
Modern applications of artificial neural networks have yielded remarkable performance gains in a wide range of tasks. However, recent studies have discovered that such a modelling strategy is vulnerable to adversarial examples, i.e., examples with subtle perturbations, often too small to be perceptible to humans, that can nonetheless easily fool neural networks. Defense techniques against adversarial examples have been proposed, but ensuring robust performance against varying or novel types of attacks remains an open problem. In this work, we focus on the detection setting, in which attackers become identifiable while the models themselves remain vulnerable. In particular, we employ the decision layer of independently trained models as features for posterior detection. The proposed framework does not require any prior knowledge of adversarial example generation techniques, and can be directly employed along…
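The abstract describes using the decision layers of two independently trained models as features for detection. As a hedged illustration only, the sketch below assumes NumPy, builds that concatenated decision-layer feature vector, and swaps in a deliberately simple detector: a threshold on the total variation distance between the two models' class distributions, calibrated on clean inputs alone. The helper names (`decision_features`, `mismatch_score`, `fit_threshold`, `flag_adversarial`), the 0.95 quantile, and the synthetic logits are all hypothetical; this is not the authors' detector.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over class logits, shifted by the row max for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def decision_features(logits_a, logits_b):
    """Hypothetical helper: concatenate both models' decision-layer outputs per input."""
    return np.concatenate([softmax(logits_a), softmax(logits_b)], axis=1)

def mismatch_score(features, num_classes):
    """Total variation distance between the two models' class distributions.
    0 means identical decisions; values near 1 mean the models strongly disagree."""
    pa, pb = features[:, :num_classes], features[:, num_classes:]
    return 0.5 * np.abs(pa - pb).sum(axis=1)

def fit_threshold(clean_scores, quantile=0.95):
    """Calibrate on clean data only, so no adversarial examples are required.
    The 0.95 quantile is an assumed target of roughly 5% false positives."""
    return float(np.quantile(clean_scores, quantile))

def flag_adversarial(scores, threshold):
    """Flag inputs whose bi-model mismatch exceeds the calibrated threshold."""
    return scores > threshold

# Synthetic demo (hypothetical data, not results from the paper).
rng = np.random.default_rng(0)
n, k = 200, 10
labels = rng.integers(0, k, size=n)

def peaked_logits(targets, margin=5.0):
    """Random logits with a clear peak at each row's target class."""
    logits = rng.normal(0.0, 1.0, size=(len(targets), k))
    logits[np.arange(len(targets)), targets] += margin
    return logits

# Clean inputs: both models concentrate their mass on the same class.
clean = decision_features(peaked_logits(labels), peaked_logits(labels))
# Assumed attack that fools model A only: A now predicts a different class than B.
shifted = (labels + rng.integers(1, k, size=n)) % k
adv = decision_features(peaked_logits(shifted), peaked_logits(labels))

tau = fit_threshold(mismatch_score(clean, k))
clean_flagged = flag_adversarial(mismatch_score(clean, k), tau).mean()
adv_flagged = flag_adversarial(mismatch_score(adv, k), tau).mean()
print(f"threshold={tau:.3f} clean flagged={clean_flagged:.2f} "
      f"adversarial flagged={adv_flagged:.2f}")
```

Calibrating only on clean data mirrors the framework's claim of needing no prior knowledge of the attack. A learned classifier over the concatenated features could replace the threshold rule, but note that a purely linear model on those raw features cannot express "the two argmaxes differ" for every class pair, which is why this sketch reduces them to a scalar mismatch score first.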
