BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability

Xinglong Chang; Katharina Dost; Kaiqi Zhao; Ambra Demontis; Fabio Roli; Gill Dobbie; Jörg Wicker

arXiv:2105.00495 · cs.LG · September 15, 2023

TL;DR

This paper introduces BAARD, a framework inspired by the Applicability Domain concept from cheminformatics. It detects adversarial examples by checking whether an input is coherent with the training data, and it applies across a range of models and attack types.

Contribution

Proposes the first application of Applicability Domain concepts to adversarial example detection: a model-agnostic, three-stage framework (applicability, reliability, decidability) for detecting diverse attacks.

Findings

01. Effectively detects various adversarial attacks including white-box scenarios.

02. Works across different classification models without attack-specific tuning.

03. Improves robustness by verifying input coherence with training data.

Abstract

Adversarial defenses protect machine learning models from adversarial attacks, but are often tailored to one type of model or attack. The lack of information on unknown potential attacks makes detecting adversarial examples challenging. Additionally, attackers do not need to follow the rules made by the defender. To address this problem, we take inspiration from the concept of Applicability Domain in cheminformatics. Cheminformatics models struggle to make accurate predictions because only a limited number of compounds are known and available for training. Applicability Domain defines a domain based on the known compounds and rejects any unknown compound that falls outside the domain. Similarly, adversarial examples start as harmless inputs, but can be manipulated to evade reliable classification by moving outside the domain of the classifier. We are the first to identify the similarity…
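The Applicability Domain idea described in the abstract (define a domain from the known training data, reject anything outside it) can be illustrated with a very small sketch. This is not BAARD's implementation and does not reproduce its three stages; it assumes the simplest possible domain, a per-feature bounding box over the training set. The class name `BoundingBoxDomain` and its methods are hypothetical and chosen only for this illustration.

```python
import numpy as np


class BoundingBoxDomain:
    """Hypothetical applicability check: a per-feature bounding box.

    Illustration only. This is an assumed, simplified notion of
    "domain", not the method from the BAARD paper.
    """

    def fit(self, X_train):
        # Record the smallest and largest value seen for each feature.
        X_train = np.asarray(X_train, dtype=float)
        self.lower_ = X_train.min(axis=0)
        self.upper_ = X_train.max(axis=0)
        return self

    def is_inside(self, X):
        # An input is inside the domain only if every feature lies
        # within the range observed during training.
        X = np.atleast_2d(np.asarray(X, dtype=float))
        return np.all((X >= self.lower_) & (X <= self.upper_), axis=1)


# Toy training data with two features.
X_train = np.array([[0.0, 0.0], [1.0, 2.0], [0.5, 1.0]])
domain = BoundingBoxDomain().fit(X_train)

# The first query lies within the training ranges; the second exceeds
# the maximum of the first feature and would be rejected.
inside = domain.is_inside(np.array([[0.2, 1.5], [1.4, 0.5]]))
```

A bounding box is a deliberately crude choice: it flags only inputs that leave the observed feature ranges, so a perturbed input that stays within those ranges would pass this check.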

Code & Models

Repositories

changx03/baard
PyTorch · Official
