TL;DR
This paper introduces BAARD, a framework inspired by the Applicability Domain concept from cheminformatics. It detects adversarial examples by checking whether an input is coherent with the training data, and it applies across different models and attack types.
Contribution
Presents what the authors describe as the first application of Applicability Domain concepts to adversarial example detection: a model-agnostic, three-stage framework for detecting diverse attacks.
Findings
Effectively detects various adversarial attacks including white-box scenarios.
Works across different classification models without attack-specific tuning.
Improves robustness by verifying input coherence with training data.
Abstract
Adversarial defenses protect machine learning models from adversarial attacks, but are often tailored to one type of model or attack. The lack of information on unknown potential attacks makes detecting adversarial examples challenging. Additionally, attackers do not need to follow the rules made by the defender. To address this problem, we take inspiration from the concept of Applicability Domain in cheminformatics. Cheminformatics models struggle to make accurate predictions because only a limited number of compounds are known and available for training. Applicability Domain defines a domain based on the known compounds and rejects any unknown compound that falls outside the domain. Similarly, adversarial examples start as harmless inputs, but can be manipulated to evade reliable classification by moving outside the domain of the classifier. We are the first to identify the similarity…
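The idea of defining a domain from known data and rejecting anything outside it can be illustrated with a small sketch. This is not BAARD's actual procedure, which the abstract does not spell out. It is a minimal per-class bounding-box check, assuming numeric feature vectors; the names `fit_class_boxes`, `in_domain`, and `margin` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Box = Tuple[List[float], List[float]]  # (per-feature minimum, per-feature maximum)


def fit_class_boxes(samples: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Box]:
    """Record the per-feature min and max of the training samples of each class."""
    boxes: Dict[int, Box] = {}
    for x, y in zip(samples, labels):
        if y not in boxes:
            # First sample of this class: the box collapses to a single point.
            boxes[y] = (list(x), list(x))
        else:
            lo, hi = boxes[y]
            for i, v in enumerate(x):
                lo[i] = min(lo[i], v)
                hi[i] = max(hi[i], v)
    return boxes


def in_domain(x: Vector, predicted_label: int, boxes: Dict[int, Box],
              margin: float = 0.0) -> bool:
    """Return True if x lies inside the box of the class the model predicted.

    An input that falls outside the box is treated as out of domain and
    would be rejected rather than classified (illustrative assumption).
    """
    if predicted_label not in boxes:
        return False
    lo, hi = boxes[predicted_label]
    return all(l - margin <= v <= h + margin for v, l, h in zip(x, lo, hi))


# Toy training set with two features and two classes.
train_x = [(0.0, 0.1), (0.2, 0.3), (0.9, 0.8), (1.0, 1.0)]
train_y = [0, 0, 1, 1]
boxes = fit_class_boxes(train_x, train_y)

accepted = in_domain((0.1, 0.2), 0, boxes)  # inside the class-0 box
rejected = in_domain((0.1, 0.9), 0, boxes)  # second feature exceeds the class-0 range
```

A bounding box is the simplest possible domain definition. It only shows the accept-or-reject structure: the check depends on the training data and the predicted label, not on the attack that produced the input, which is why such a test can be model-agnostic.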
