Domain Knowledge Alleviates Adversarial Attacks in Multi-Label Classifiers
Stefano Melacci, Gabriele Ciravegna, Angelo Sotgiu, Ambra Demontis, Battista Biggio, Marco Gori, Fabio Roli

TL;DR
This paper demonstrates that incorporating domain knowledge as logical constraints into multi-label classifiers can improve detection of adversarial examples, especially when attackers are unaware of these constraints.
Contribution
It introduces a novel framework that integrates domain knowledge into semi-supervised learning to enhance adversarial robustness in multi-label classification.
Findings
Domain knowledge constraints help detect adversarial examples effectively.
The method does not require knowledge of attack strategies during training.
Constraints are especially effective when not known to attackers.
Abstract
Adversarial attacks on machine learning-based classifiers, along with defense mechanisms, have been widely studied in the context of single-label classification problems. In this paper, we shift the attention to multi-label classification, where the availability of domain knowledge on the relationships among the considered classes may offer a natural way to spot incoherent predictions, i.e., predictions associated with adversarial examples lying outside of the training data distribution. We explore this intuition in a framework in which first-order logic knowledge is converted into constraints and injected into a semi-supervised learning problem. Within this setting, the constrained classifier learns to fulfill the domain knowledge over the marginal distribution, and can naturally reject samples with incoherent predictions. Even though our method does not exploit any knowledge of attacks…
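Illustrative sketch
The rejection idea described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes a hypothetical three-class problem, two hypothetical rules ("cat implies animal", "dog implies animal"), a product-style relaxation of logical implication, and an arbitrary rejection threshold of 0.25; none of these specifics come from the paper summary above.

```python
import numpy as np

# Hypothetical label set and rules, chosen only for illustration.
CLASSES = ["animal", "cat", "dog"]
RULES = [(1, 0), (2, 0)]  # (premise, conclusion): cat => animal, dog => animal


def implication_violation(p_premise, p_conclusion):
    """Soft violation of `premise => conclusion`.

    Assumed relaxation: the rule is violated to the degree that the premise
    is predicted true while the conclusion is predicted false.
    """
    return p_premise * (1.0 - p_conclusion)


def constraint_loss(probs):
    """Mean rule violation per sample; probs has shape (n_samples, n_classes)."""
    probs = np.atleast_2d(probs)
    violations = [
        implication_violation(probs[:, premise], probs[:, conclusion])
        for premise, conclusion in RULES
    ]
    return np.mean(violations, axis=0)


def reject(probs, threshold=0.25):
    """Flag samples whose predictions are incoherent with the rules.

    The threshold is an arbitrary assumption; in practice it would have to
    be tuned, for example on held-out clean data.
    """
    return constraint_loss(probs) > threshold


# A coherent prediction (cat and animal both active) and an incoherent one
# (cat active, animal inactive), as an adversarial input might produce.
coherent = np.array([0.95, 0.90, 0.05])
incoherent = np.array([0.05, 0.90, 0.05])

print(reject(coherent), reject(incoherent))
```

In this sketch the same violation score could also serve as a penalty on unlabeled data during training, which is one plausible reading of how constraints enter a semi-supervised objective; the attack itself never has to be known for the check to apply.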
