A Unified Framework for Adversarial Attack and Defense in Constrained Feature Space
Thibault Simonetto, Salijona Dyrmishi, Salah Ghamizi, Maxime Cordy, Yves Le Traon

TL;DR
This paper introduces a unified framework for generating feasible adversarial examples under domain constraints, applicable across multiple fields, and proposes a novel defense method that enhances model robustness.
Contribution
The paper presents a versatile framework for constrained adversarial attacks and introduces a new defense strategy using engineered non-convex constraints.
Findings
Achieves a success rate of up to 100% in generating feasible adversarial examples across four domains, where state-of-the-art attacks fail to generate a single feasible example.
The proposed defense matches the effectiveness of adversarial retraining.
Framework provides new baselines and datasets for future research.
Abstract
The generation of feasible adversarial examples is necessary for properly assessing models that work in constrained feature space. However, it remains a challenging task to enforce constraints into attacks that were designed for computer vision. We propose a unified framework to generate feasible adversarial examples that satisfy given domain constraints. Our framework can handle both linear and non-linear constraints. We instantiate our framework into two algorithms: a gradient-based attack that introduces constraints in the loss function to maximize, and a multi-objective search algorithm that aims for misclassification, perturbation minimization, and constraint satisfaction. We show that our approach is effective in four different domains, with a success rate of up to 100%, where state-of-the-art attacks fail to generate a single feasible example. In addition to adversarial…
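The two instantiations described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature layout, both constraints, and the function names (constraint_violations, penalized_objective, search_objectives) are hypothetical, chosen only to show how a linear and a non-linear constraint might enter a loss to maximize or a set of search objectives.

```python
import numpy as np

def constraint_violations(x):
    """Return one non-negative violation per constraint (0 means satisfied).

    Hypothetical feature layout (an assumption for illustration):
    x = [total, part_a, part_b, ratio].
    """
    total, part_a, part_b, ratio = x
    linear = abs(total - (part_a + part_b))   # linear: total == part_a + part_b
    nonlinear = abs(ratio - part_a / total)   # non-linear: ratio == part_a / total
    return np.array([linear, nonlinear])

def penalized_objective(x, model_loss, weight=1.0):
    """Gradient-attack view: the model loss to maximize, minus a constraint penalty."""
    return model_loss(x) - weight * constraint_violations(x).sum()

def search_objectives(x, x_original, prob_true_class):
    """Multi-objective view: three values, each to be minimized by the search."""
    return (
        prob_true_class(x),                # misclassification
        np.linalg.norm(x - x_original),    # perturbation size
        constraint_violations(x).sum(),    # constraint satisfaction
    )

# A feasible original example and an infeasible perturbed candidate.
x_clean = np.array([10.0, 4.0, 6.0, 0.4])
x_adv = np.array([10.0, 4.0, 7.0, 0.5])
```

In this sketch a candidate that breaks the constraints lowers the penalized objective, so a gradient-based attack would be steered back toward feasible inputs, while the search variant keeps the three goals as separate objectives. The penalty weight and the choice of norm are assumptions, not values taken from the paper.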
Taxonomy
Topics: Adversarial Robustness in Machine Learning
