SAIF: Sparse Adversarial and Imperceptible Attack Framework

Tooba Imtiaz; Morgan Kohler; Jared Miller; Zifeng Wang; Masih Eskandar; Mario Sznaier; Octavia Camps; Jennifer Dy

arXiv:2212.07495 · cs.CV · September 16, 2025

Open Access · 3 Reviews

TL;DR

SAIF introduces a novel sparse and imperceptible adversarial attack method that effectively reveals neural network vulnerabilities by optimizing low-magnitude, sparse perturbations using the Frank-Wolfe algorithm, outperforming existing methods.

Contribution

The paper proposes SAIF, a new attack framework that generates highly imperceptible, sparse adversarial examples using a novel optimization approach, revealing classifier vulnerabilities more effectively.

Findings

1. SAIF produces highly imperceptible adversarial examples.
2. SAIF outperforms state-of-the-art sparse attack methods on ImageNet.
3. The method demonstrates effective vulnerability analysis of neural classifiers.

Abstract

Adversarial attacks hamper the decision-making ability of neural networks by perturbing the input signal. Adding a small, calculated distortion to an image, for instance, can deceive a well-trained image classification network. In this work, we propose a novel attack technique called Sparse Adversarial and Interpretable Attack Framework (SAIF). Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a small number of pixels and leverage these sparse attacks to reveal the vulnerability of classifiers. We use the Frank-Wolfe (conditional gradient) algorithm to simultaneously optimize the attack perturbations for bounded magnitude and sparsity with $O(1/\sqrt{T})$ convergence. Empirical results show that SAIF computes highly imperceptible and interpretable adversarial examples, and outperforms state-of-the-art sparse attack methods on the ImageNet…
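The joint magnitude/sparsity optimization described in the abstract can be illustrated with a small sketch. Everything here is an assumption for illustration, not the authors' Algorithm 1: the perturbation is factored as δ = s ⊙ p (a dense magnitude p with ‖p‖∞ ≤ ε and a mask s, following the reviewers' description of a dense perturbation plus a sparse mask), the mask is relaxed to {s ∈ [0,1]^n : Σ s ≤ k}, both blocks take a Frank-Wolfe step from the same iterate with the classic step size η_t = 2/(t+2), and a toy linear score stands in for a network, so gradients are written by hand instead of coming from autodiff. The helper names `lmo_linf`, `lmo_mask`, and `fw_sparse_attack` are hypothetical, and clipping x + δ to the valid pixel range is omitted.

```python
# Hypothetical sketch of simultaneous Frank-Wolfe updates for a sparse, bounded
# perturbation delta = s * p; not the SAIF reference implementation.

def lmo_linf(grad, eps):
    """Linear minimization oracle for the l_inf ball {v : |v_i| <= eps}.

    argmin_v <grad, v> sets every coordinate to -eps * sign(grad_i).
    """
    return [-eps if g > 0 else (eps if g < 0 else 0.0) for g in grad]


def lmo_mask(grad, k):
    """Linear minimization oracle for the relaxed mask set {z in [0,1]^n : sum(z) <= k}.

    The minimizer is a 0/1 vertex: z_i = 1 on at most k coordinates with the
    most negative gradient, 0 elsewhere.
    """
    z = [0.0] * len(grad)
    for i in sorted(range(len(grad)), key=lambda j: grad[j])[:k]:
        if grad[i] < 0:
            z[i] = 1.0
    return z


def fw_sparse_attack(grad_fn, n, eps, k, steps=20):
    """Jointly update magnitude p and mask s so that the loss at delta = s * p decreases.

    grad_fn(delta) must return d(loss)/d(delta).
    """
    s = [k / n] * n                        # assumed init: uniform feasible mask
    p = lmo_linf(grad_fn([0.0] * n), eps)  # assumed init: l_inf vertex at delta = 0
    for t in range(steps):
        delta = [si * pi for si, pi in zip(s, p)]
        g = grad_fn(delta)
        grad_p = [gi * si for gi, si in zip(g, s)]  # chain rule through delta = s * p
        grad_s = [gi * pi for gi, pi in zip(g, p)]
        v, z = lmo_linf(grad_p, eps), lmo_mask(grad_s, k)  # both from the same iterate
        eta = 2.0 / (t + 2)                                # assumed classic FW step size
        p = [pi + eta * (vi - pi) for pi, vi in zip(p, v)]  # convex combination keeps p feasible
        s = [si + eta * (zi - si) for si, zi in zip(s, z)]  # convex combination keeps s feasible
    return s, p


# Toy linear "classifier": a positive score means the clean label is kept.
w = [0.5, -2.0, 0.1, 1.5, -0.3]
b = -0.8
x = [0.4, 0.1, 0.9, 0.6, 0.5]


def score(delta):
    return sum(wi * (xi + di) for wi, xi, di in zip(w, x, delta)) + b


s, p = fw_sparse_attack(lambda delta: list(w), n=len(x), eps=0.1, k=2)
delta = [si * pi for si, pi in zip(s, p)]
print(score([0.0] * len(x)), score(delta))
```

On this linear toy the mask settles on the two pixels with the largest |w|, each receives a full ±ε step against the sign of its weight, and the score drops below zero with only two pixels changed. A real classifier is nonconvex, so the two blocks would interact over many more iterations.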

Peer Reviews

Decision: ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 5: marginally below the acceptance threshold · Confidence 4

Strengths

* The use of the FW algorithm for crafting stealthy adversarial examples is novel and intuitive.
* Within the proposed threat model, the evaluation results are strong.

Weaknesses

* I did not find a practical motivation behind the proposed threat model. Sparse and low-magnitude attacks can indeed be hard to detect. However, to create the SAIF attacks, the attacker needs unrealistic powers: (i) it has white-box access to the model parameters, (ii) it knows the exact inputs processed by the model in advance, and (iii) it can also make the target model perceive the examples perfectly. The SAIF algorithm is also slow, which makes real-time attack generation di

Reviewer 02 · Rating 3: reject, not good enough · Confidence 5

Strengths

- The proposed algorithm to simultaneously optimize the dense perturbation and sparse mask seems overall novel.
- In the experiments, SAIF is shown to outperform existing attacks in both the targeted and untargeted settings.

Weaknesses

- It is not clear how the $\ell_0$-norm constraint is enforced: in Line 7 of Alg. 1, the sparsity of $s_t$ seems to decrease compared to $s_{t-1}$ (unless $\eta_t=1$ or $s_{t-1}$ and $z_t$ have their non-zero components at the same positions). Moreover, the presentation in general could be clearer, e.g., some notation is used before being introduced.
- The case with $\epsilon=255$ is equivalent to a standard $\ell_0$-attack: then, I think a comparison to existing attacks (e.g. [A, B, C]) in this
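Reviewer 02's first point can be checked on a toy case. The sketch below assumes that Line 7 of Alg. 1 is the usual Frank-Wolfe convex combination $s_t = s_{t-1} + \eta_t (z_t - s_{t-1})$ (the line itself is not reproduced on this page); the function names and mask values are hypothetical. Mixing two $k$-sparse vectors with different supports gives a vector supported on the union of both supports, so $\|s_t\|_0$ can exceed $k$ except in the two cases the reviewer lists.

```python
# Hypothetical illustration: a convex-combination step can grow the support of the mask.

def fw_mask_update(s_prev, z, eta):
    """Assumed form of the mask step: s_t = s_{t-1} + eta * (z_t - s_{t-1})."""
    return [si + eta * (zi - si) for si, zi in zip(s_prev, z)]


def l0(v):
    """Number of non-zero entries of v."""
    return sum(1 for vi in v if vi != 0)


k = 2
s_prev = [1.0, 1.0, 0.0, 0.0, 0.0]  # k-sparse previous iterate
z_new = [0.0, 1.0, 1.0, 0.0, 0.0]   # k-sparse oracle vertex with a different support

print(l0(fw_mask_update(s_prev, z_new, 0.5)))   # 3 > k: support is the union of both
print(l0(fw_mask_update(s_prev, z_new, 1.0)))   # 2: eta = 1 replaces s_prev by z_new
print(l0(fw_mask_update(s_prev, s_prev, 0.5)))  # 2: same support, sparsity preserved
```

In general $\|s_t\|_0 \le \|s_{t-1}\|_0 + \|z_t\|_0$, which is why an explicit $\ell_0$ guarantee needs an argument beyond the convex-combination step itself.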

Reviewer 03 · Rating 6: marginally above the acceptance threshold · Confidence 3

Strengths

The authors use an elegant Frank-Wolfe algorithm for the simultaneous optimization of adversarial perturbations. An $\ell_1$ convex surrogate is used for sparsity. SAIF demonstrates a sound technical approach, achieving a rapid convergence rate and ensuring the generation of adversarial examples with bounded magnitude and sparsity. The SAIF method outperforms state-of-the-art sparse attack methods on major datasets like ImageNet and CIFAR-10, especially when fewer pixels are allowed to be pertur
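Reviewer 03 mentions an $\ell_1$ convex surrogate for sparsity. Assuming the mask entries are restricted to $[0,1]$ (an assumption here, not stated on this page), the standard fact behind such a surrogate is that the $\ell_1$ ball intersected with the unit box is exactly the convex hull of binary $k$-sparse masks, and that a Frank-Wolfe linear minimization oracle over it returns a binary mask. The symbols $\mathcal{S}_k$, $g$ (the gradient with respect to the mask), and $z$ are introduced here for illustration.

```latex
\mathcal{S}_k = \{\, s \in \{0,1\}^n : \|s\|_0 \le k \,\},
\qquad
\operatorname{conv}(\mathcal{S}_k) = \{\, s \in [0,1]^n : \|s\|_1 \le k \,\}

z \in \arg\min_{s \in \operatorname{conv}(\mathcal{S}_k)} \langle g, s \rangle,
\qquad
z_i =
\begin{cases}
1, & i \text{ among the } k \text{ smallest entries of } g \text{ and } g_i < 0,\\
0, & \text{otherwise.}
\end{cases}
```

The equality holds for integer $k$ because the box constraints plus a single sum constraint define an integral polytope whose vertices are 0/1 vectors with at most $k$ ones; each oracle output $z$ is therefore a binary mask with at most $k$ non-zeros.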

Weaknesses

* The $O(1/\sqrt{T})$ convergence analysis is very interesting, but I was wondering how this number reflects the real running time in practice. For example, what are the convergence rates for existing methods in Table 4?
* The models tested are undefended. The authors may need to test some robust models. Even the 'regular' robust models adversarially trained with small $\epsilon$ values and without sparsity constraints would be helpful. Another way is to train the model with SAIF on the fly, to see
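Reviewer 03's question turns on what an $O(1/\sqrt{T})$ rate measures. For a smooth but nonconvex objective $f$ over a compact convex set $\mathcal{C}$, standard nonconvex Frank-Wolfe analyses bound the Frank-Wolfe gap $g_t$; the constant $C$ below depends on the smoothness of $f$, the diameter of $\mathcal{C}$, and the initial suboptimality, with suitably chosen step sizes. This is a general statement about Frank-Wolfe, not the paper's theorem.

```latex
g_t = \max_{v \in \mathcal{C}} \langle -\nabla f(x_t),\, v - x_t \rangle \;\ge\; 0,
\qquad
\min_{0 \le t \le T} g_t \;\le\; \frac{C}{\sqrt{T+1}}
```

Since $g_t = 0$ exactly at first-order stationary points, the rate counts iterations to approximate stationarity rather than wall-clock time. Assuming both blocks are updated from a single gradient of the loss with respect to the perturbation, each iteration costs one forward and backward pass through the classifier plus two cheap oracle calls, so runtime scales roughly as $T$ times that per-iteration cost.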

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI