Reverse Engineering $\ell_p$ attacks: A block-sparse optimization approach with recovery guarantees
Darshan Thaker, Paris Giampouras, René Vidal

TL;DR
This paper introduces a block-sparse optimization framework with recovery guarantees to reverse engineer and classify adversarial attacks on neural networks, identifying attack types and recovering clean signals.
Contribution
It formulates attack reverse engineering as a block-sparse recovery problem with geometric conditions for successful decomposition and classification.
Findings
Effective attack type classification demonstrated on digit and face datasets.
Theoretical geometric conditions ensure accurate decomposition of attacked signals.
Proposed method outperforms existing approaches in attack identification and signal recovery.
Abstract
Deep neural network-based classifiers have been shown to be vulnerable to imperceptible perturbations to their input, such as $\ell_p$-bounded norm adversarial attacks. This has motivated the development of many defense methods, which are then broken by new attacks, and so on. This paper focuses on a different but related problem of reverse engineering adversarial attacks. Specifically, given an attacked signal, we study conditions under which one can determine the type of attack ($\ell_1$, $\ell_2$, or $\ell_\infty$) and recover the clean signal. We pose this problem as a block-sparse recovery problem, where both the signal and the attack are assumed to lie in a union of subspaces that includes one subspace per class and one subspace per attack type. We derive geometric conditions on the subspaces under which any attacked signal can be decomposed as the sum of a clean signal plus an…
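To make the block-sparse formulation concrete, here is a minimal sketch on synthetic data. It is not the authors' implementation: the random dictionaries, the block sizes, the regularization weight `lam`, and the solver (plain proximal gradient on a group-lasso objective) are all assumptions chosen for illustration. Each block of the dictionary `D` stands for one subspace, with two hypothetical signal classes followed by two hypothetical attack types. The predicted class and attack type are the blocks with the largest recovered coefficient norm.

```python
import numpy as np

def block_soft_threshold(c, blocks, tau):
    """Shrink each coefficient block toward zero by tau in Euclidean norm."""
    out = np.zeros_like(c)
    for idx in blocks:
        norm = np.linalg.norm(c[idx])
        if norm > tau:
            out[idx] = (1.0 - tau / norm) * c[idx]
    return out

def block_sparse_decompose(x, D, blocks, lam=0.02, n_iter=500):
    """Proximal gradient for min_c 0.5*||x - D c||^2 + lam * sum_b ||c[b]||_2."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ c - x)
        c = block_soft_threshold(c - step * grad, blocks, step * lam)
    return c

# Synthetic setup (assumption): 4 random 3-dimensional subspaces in R^100.
rng = np.random.default_rng(0)
n, k, n_blocks = 100, 3, 4
D = rng.standard_normal((n, n_blocks * k))
D /= np.linalg.norm(D, axis=0)              # unit-norm columns
blocks = [np.arange(i * k, (i + 1) * k) for i in range(n_blocks)]
signal_idx = np.concatenate(blocks[:2])     # blocks 0-1: signal classes
attack_idx = np.concatenate(blocks[2:])     # blocks 2-3: attack types

# Attacked signal = clean signal from class 1 + perturbation from attack type 0.
c_true = np.zeros(n_blocks * k)
c_true[blocks[1]] = [1.0, -0.8, 0.6]
c_true[blocks[2]] = [0.3, 0.2, -0.25]
x_clean = D[:, signal_idx] @ c_true[signal_idx]
x_attacked = D @ c_true

c_hat = block_sparse_decompose(x_attacked, D, blocks)
block_norms = [np.linalg.norm(c_hat[b]) for b in blocks]
pred_class = int(np.argmax(block_norms[:2]))
pred_attack = int(np.argmax(block_norms[2:]))
x_clean_hat = D[:, signal_idx] @ c_hat[signal_idx]
rel_err = np.linalg.norm(x_clean_hat - x_clean) / np.linalg.norm(x_clean)
```

In this toy setting the subspaces are random and nearly orthogonal, so recovery is easy. The geometric conditions mentioned in the abstract concern when such a decomposition succeeds for less favorable subspace arrangements.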
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Integrated Circuits and Semiconductor Failure Analysis · Physical Unclonable Functions (PUFs) and Hardware Security
