Assessing Robustness via Score-Based Adversarial Image Generation

Marcel Kollovieh; Lukas Gosch; Marten Lienen; Yan Scholten; Leo Schwinn; Stephan Günnemann

arXiv:2310.04285 · cs.CV · March 5, 2025


Open Access 3 Reviews

TL;DR

This paper introduces ScoreAG, a novel framework that uses score-based generative models to create unrestricted, semantics-preserving adversarial examples, extending robustness evaluation beyond traditional $\ell_p$-norm constraints.
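Under an $\ell_p$ threat model, an example counts as a valid attack only if its $\ell_p$ distance to the original input stays within a budget $\epsilon$. The sketch below is illustrative and not taken from the paper. The helper names `lp_distance` and `in_threat_model`, the 4-pixel "image", and the 8/255 budget (a value commonly used for CIFAR-10) are assumptions. It shows a uniform brightness shift that leaves the content unchanged yet falls outside the $\ell_\infty$ ball, which is the kind of semantics-preserving change such threat models miss.

```python
import math


def lp_distance(x, x_adv, p):
    """Return the l_p distance between two flattened images (lists of floats)."""
    diffs = [abs(a - b) for a, b in zip(x, x_adv)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def in_threat_model(x, x_adv, eps, p=math.inf):
    """True if x_adv lies inside the l_p ball of radius eps around x."""
    return lp_distance(x, x_adv, p) <= eps


# Hypothetical 4-pixel image with intensities in [0, 1].
x = [0.2, 0.5, 0.7, 0.4]
# Brighten every pixel by 0.1: the content is unchanged to a human observer.
x_shifted = [v + 0.1 for v in x]

print(lp_distance(x, x_shifted, math.inf), lp_distance(x, x_shifted, 2))
print(in_threat_model(x, x_shifted, eps=8 / 255))
```

Because the shift exceeds the $\ell_\infty$ budget, an $\ell_p$-bounded evaluation would never consider this example, even though its semantics match the original.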

Contribution

ScoreAG leverages score-based generative models for unrestricted adversarial example generation, improving robustness assessments and surpassing existing attack and defense methods.

Findings

1. ScoreAG outperforms many state-of-the-art attacks and defenses.
2. It generates realistic, semantics-preserving adversarial examples.
3. Purification with ScoreAG enhances classifier robustness.

Abstract

Most adversarial attacks and defenses focus on perturbations within small $\ell_p$-norm constraints. However, $\ell_p$ threat models cannot capture all relevant semantics-preserving perturbations, and hence, the scope of robustness evaluations is limited. In this work, we introduce Score-Based Adversarial Generation (ScoreAG), a novel framework that leverages the advancements in score-based generative models to generate unrestricted adversarial examples that overcome the limitations of $\ell_p$-norm constraints. Unlike traditional methods, ScoreAG maintains the core semantics of images while generating adversarial examples, either by transforming existing images or synthesizing new ones entirely from scratch. We further exploit the generative capability of ScoreAG to purify images, empirically enhancing the robustness of classifiers. Our extensive empirical evaluation demonstrates that…
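The abstract does not spell out how ScoreAG steers generation. One standard way to bias a score-based model toward a label is classifier guidance, which adds a scaled classifier gradient $\nabla_x \log p(y \mid x)$ to the data score $\nabla_x \log p(x)$. The sketch below is a 1-D caricature of that general idea under stated assumptions: a Gaussian with a known score stands in for the learned true-class model, a hypothetical logistic classifier (weight `W`) is the attacked model, and plain Langevin dynamics replaces the paper's sampler. It is not ScoreAG's actual guidance rule.

```python
import math
import random

# Hypothetical toy setup (not from the paper):
# - the true class (label 0) has density N(MU, SIGMA^2) with a known score;
# - the attacked classifier is logistic, p(y=1 | x) = sigmoid(W * x).
MU, SIGMA, W = -1.0, 0.5, 4.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def data_score(x):
    """Score of the true-class density: d/dx log N(x; MU, SIGMA^2)."""
    return -(x - MU) / SIGMA ** 2


def classifier_grad(x, target):
    """Gradient of log p(target | x) for the logistic classifier."""
    p1 = sigmoid(W * x)
    return W * (1.0 - p1) if target == 1 else -W * p1


def guided_langevin(n, steps, step_size, scale, target, seed=0):
    """Sample approximately from p(x) * p(target | x)^scale via Langevin dynamics."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        for _ in range(steps):
            score = data_score(x) + scale * classifier_grad(x, target)
            x += 0.5 * step_size * score + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def fooled_rate(xs):
    """Fraction of true-class samples that the classifier labels as class 1."""
    return sum(sigmoid(W * x) > 0.5 for x in xs) / len(xs)


plain = guided_langevin(200, 500, 0.01, scale=0.0, target=1)
adv = guided_langevin(200, 500, 0.01, scale=4.0, target=1)
print(f"unguided: mean={sum(plain) / len(plain):.2f}, fooled={fooled_rate(plain):.2f}")
print(f"guided:   mean={sum(adv) / len(adv):.2f}, fooled={fooled_rate(adv):.2f}")
```

The guided chains should shift toward and across the classifier's boundary at $x = 0$, so a larger fraction is labeled 1, while the true-class score term keeps them from drifting arbitrarily far from the class density. In this toy, raising `scale` trades closeness to the true-class distribution for attack success.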

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

1. The paper is well-organized and easy to follow.
2. Conducting adversarial attacks using semantics-bounded examples offers an interesting viewpoint on robustness evaluation.
3. As demonstrated in the experimental section, the proposed framework attains promising performance across three tasks.

Weaknesses

1. Some formulas in this paper were derived incorrectly, such as Equation 7 on page 4. The authors should check the methodology section to ensure correctness.
2. The authors compare the attack effectiveness of GAT with other benchmark methods in Section 4.1. Given that generation efficiency is also an essential factor in evaluating the effectiveness of adversarial attacks, it is recommended that the authors add a discussion on the efficiency of adversarial sample generation to this part of the…

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

The main contribution of this paper is the introduction of Score-Based Adversarial Generation (ScoreAG) and the following three uses of it: synthesis (Generative Adversarial Synthesis, GAS), transformation (Generative Adversarial Transformation, GAT), and purification (Generative Adversarial Purification, GAP). This paper is well-structured and the method is described clearly.
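The purification use (GAP) is only named on this page. A common recipe for generative purification, used for example by DiffPure, is to add noise to a possibly adversarial input and then denoise it with the score of the clean data, pulling it back toward the data manifold before classification. The sketch below illustrates that general recipe on a hypothetical 1-D toy. The two-Gaussian data, the non-robust classifier with its boundary at $x = -1$, the Langevin denoiser, and all hyperparameters are assumptions, not the paper's GAP procedure.

```python
import math
import random

# Hypothetical 1-D toy (not from the paper): clean data is an equal mixture of
# N(-2, 0.3^2) (class 0) and N(+2, 0.3^2) (class 1). The attacked classifier is
# non-robust: its decision boundary sits at x = -1 instead of x = 0.
MEANS, SIGMA, BOUNDARY = (-2.0, 2.0), 0.3, -1.0


def classify(x):
    """Non-robust classifier: label 1 for anything right of BOUNDARY."""
    return int(x > BOUNDARY)


def mixture_score(x):
    """d/dx log p(x) for the equal-weight Gaussian mixture."""
    logs = [-(x - m) ** 2 / (2 * SIGMA ** 2) for m in MEANS]
    top = max(logs)  # subtract the max for numerical stability
    weights = [math.exp(lg - top) for lg in logs]
    total = sum(weights)
    # Responsibility-weighted sum of each component's score.
    return sum(w / total * (m - x) / SIGMA ** 2 for w, m in zip(weights, MEANS))


def purify(x_adv, noise_std=0.3, steps=200, step_size=0.005, seed=0):
    """Noise-then-denoise: perturb with Gaussian noise, then run Langevin
    dynamics on the clean-data score to pull the point back on-manifold."""
    rng = random.Random(seed)
    x = x_adv + noise_std * rng.gauss(0.0, 1.0)
    for _ in range(steps):
        x += 0.5 * step_size * mixture_score(x) + math.sqrt(step_size) * rng.gauss(0.0, 1.0)
    return x


x_adv = -0.9  # a class-0 point nudged just across the non-robust boundary
print("before:", classify(x_adv))
purified = [purify(x_adv, seed=s) for s in range(20)]
print("after: ", [classify(x) for x in purified])
```

Here the input at $x = -0.9$ is misclassified as class 1; after purification it should, in most runs, land near the class-0 mode at $-2$ and be classified correctly. The noise level is the key design choice: too little leaves the perturbation intact, too much can push inputs into the wrong mode.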

Weaknesses

This work is subject to several weaknesses that need to be addressed.
1. Although GAS aims to generate adversarial examples from scratch that are misclassified by the classifier while preserving the semantics of the true class, it is disappointing to see that even humans fail to classify the adversarial examples. As shown in Table 4, human accuracy on the adversarial synthetic images is only 70%; these results do not support the argument that it preserves the semantics of a certain cla…

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 5

Strengths

The paper is well-written and the proposed method is novel. Generating semantics-preserving adversarial examples beyond standard threat models is an important research direction, and the use of score-based diffusion models is a promising approach. The experiments are extensive, comparing ScoreAG to several state-of-the-art attacks and defenses across multiple datasets. The results demonstrate ScoreAG's effectiveness in crafting unrestricted adversarial examples.

Weaknesses

- The notion of "unrestricted" adversarial examples needs more discussion. While ScoreAG does not use an explicit $\ell_p$ norm, the samples are still constrained to the manifold learned by the generative model. Analyzing the diversity/range of examples is important.
- More analysis is needed on why ScoreAG outperforms the other diffusion-based attack, DiffAttack. The reasons are not fully clear.
- The lack of certified or provable robustness guarantees for the purified models is a limitation. Evalua…


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
