Towards Interpretable Adversarial Examples via Sparse Adversarial Attack
Fudong Lin, Jiadong Lou, Hao Wang, Brian Jalaian, and Xu Yuan

TL;DR
This paper introduces a novel, efficient sparse adversarial attack that produces interpretable perturbations, improves transferability and attack strength, and helps explain why DNNs are vulnerable.
Contribution
A new parameterization technique for l0 optimization and a loss function to generate sparse, transferable, and strong adversarial examples with theoretical guarantees.
Findings
Outperforms state-of-the-art sparse attacks in efficiency and effectiveness.
Produces sparser adversarial examples whose perturbations are interpretable.
Validates the approach through extensive experiments and theoretical analysis.
Abstract
Sparse attacks optimize the magnitude of adversarial perturbations that fool deep neural networks (DNNs) while perturbing only a few pixels (i.e., under the l0 constraint), which makes them suitable for interpreting the vulnerability of DNNs. However, existing solutions fail to yield interpretable adversarial examples because their perturbations are not sparse enough. Worse still, they often suffer from heavy computational overhead, poor transferability, and weak attack strength. In this paper, we aim to develop a sparse attack for understanding the vulnerability of CNNs by minimizing the magnitude of initial perturbations under the l0 constraint, overcoming these drawbacks while achieving a fast, transferable, and strong attack against DNNs. In particular, a novel and theoretically sound parameterization technique is introduced to approximate the NP-hard l0 optimization problem, making directly optimizing…
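To make the l0 constraint concrete, the sketch below counts how many pixels a perturbation touches and applies the standard hard projection onto the set of perturbations with at most k perturbed pixels (keep the k pixels with the largest magnitude, zero the rest). This is a generic illustration of the constraint that sparse attacks work under, not the paper's parameterization technique, whose details are not given here. The function names `l0_pixels` and `project_l0` and the representation of an image perturbation as a flat list of per-pixel channel values are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence

Pixel = List[float]  # one perturbation value per colour channel (hypothetical layout)


def l0_pixels(delta: Sequence[Pixel], tol: float = 0.0) -> int:
    """Count pixels whose perturbation exceeds tol in any channel (the pixel-wise l0 'norm')."""
    return sum(1 for px in delta if any(abs(c) > tol for c in px))


def project_l0(delta: Sequence[Pixel], k: int) -> List[Pixel]:
    """Euclidean projection onto {delta : at most k perturbed pixels}.

    Keeps the k pixels with the largest L2 magnitude across channels and
    zeroes all others. Generic hard projection, NOT the paper's method.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    magnitudes = [math.sqrt(sum(c * c for c in px)) for px in delta]
    # Indices of the k largest-magnitude pixels; the stable sort breaks ties by index.
    keep = set(sorted(range(len(delta)), key=lambda i: -magnitudes[i])[:k])
    return [list(px) if i in keep else [0.0] * len(px) for i, px in enumerate(delta)]


# Usage: a 4-pixel RGB perturbation that touches 3 pixels, projected down to 2.
delta = [[0.0, 0.0, 0.0], [0.3, -0.1, 0.0], [0.0, 0.0, 0.05], [-0.5, 0.2, 0.4]]
sparse = project_l0(delta, 2)
print(l0_pixels(delta), l0_pixels(sparse))  # 3 2
```

Because the hard projection is non-differentiable and the underlying l0 problem is NP-hard, methods such as the one described here replace it with a smoother formulation that can be optimized directly.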
