Fast Adversarial Training against Sparse Attacks Requires Loss Smoothing

Xuyang Zhong; Yixiao Huang; Chen Liu

arXiv:2502.21041 · cs.LG · November 3, 2025


Open Access · 3 Reviews

TL;DR

This paper investigates the difficulties of fast adversarial training against sparse $l_0$ attacks, identifies loss landscape cragginess as a cause of overfitting, and proposes a smoothing method to improve robustness and performance.

Contribution

It introduces a loss smoothing technique with soft labels and a trade-off loss to mitigate catastrophic overfitting in $l_0$ adversarial training.
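To make the two ingredients concrete, here is a minimal sketch of a trade-off loss combined with soft labels. It is an illustration under stated assumptions, not the authors' implementation: the soft label is plain uniform label smoothing (a hypothetical stand-in, since the summary does not say how the soft labels are built), the trade-off term is the TRADES-style KL divergence between clean and adversarial predictions, and the names `soft_label`, `smoothed_tradeoff_loss`, `beta` and `smoothing` are chosen here for illustration with arbitrary default values.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label(true_class, num_classes, smoothing=0.1):
    # ASSUMPTION: uniform label smoothing as a stand-in for the paper's soft labels.
    off = smoothing / num_classes
    return [off + (1.0 - smoothing) * (i == true_class) for i in range(num_classes)]

def cross_entropy(target, probs):
    # Cross-entropy between a (soft) target distribution and predicted probabilities.
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions with q > 0 wherever p > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def smoothed_tradeoff_loss(clean_logits, adv_logits, true_class, beta=6.0, smoothing=0.1):
    # Clean cross-entropy against a soft label, plus beta times the divergence
    # between the clean and adversarial predictions (TRADES-style trade-off).
    p_clean = softmax(clean_logits)
    p_adv = softmax(adv_logits)
    target = soft_label(true_class, len(clean_logits), smoothing)
    return cross_entropy(target, p_clean) + beta * kl_divergence(p_clean, p_adv)
```

When the adversarial logits equal the clean logits the KL term vanishes and only the soft-label cross-entropy remains; the further the adversarial prediction drifts from the clean one, the larger the penalty, weighted by `beta`.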

Findings

01. The loss landscape in $l_0$ adversarial training is more craggy than in other norms.

02. Loss landscape cragginess contributes to catastrophic overfitting.

03. The proposed Fast-LS-$l_0$ method achieves state-of-the-art robustness against sparse attacks.

Abstract

This paper studies fast adversarial training against sparse adversarial perturbations bounded by $l_{0}$ norm. We demonstrate the challenges of employing $1$-step attacks on $l_{0}$ bounded perturbations for fast adversarial training, including degraded performance and the occurrence of catastrophic overfitting (CO). We highlight that CO in $l_{0}$ adversarial training is caused by sub-optimal perturbation locations of $1$-step attack. Theoretical and empirical analyses reveal that the loss landscape of $l_{0}$ adversarial training is more craggy compared to its $l_{\infty}$, $l_{2}$ and $l_{1}$ counterparts. Moreover, we corroborate that the craggy loss landscape can aggravate CO. To address these issues, we propose Fast-LS-$l_{0}$ that incorporates soft labels and the trade-off loss function to smooth the adversarial loss landscape. Extensive experiments demonstrate our method can overcome the…
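The role of perturbation locations in a $1$-step sparse attack can be illustrated with a toy example. The sketch below is a naive heuristic written for this summary, not the attack used in the paper: it assumes inputs flattened to a list of values in $[0, 1]$, picks the $k$ coordinates with the largest gradient magnitude from a single gradient evaluation, and pushes each of them to the bound indicated by the gradient sign. The function names and the budget `k` are hypothetical.

```python
def one_step_sparse_perturbation(x, grad, k):
    """Perturb at most k coordinates of x (values in [0, 1]) using one gradient."""
    # Location choice: rank coordinates by gradient magnitude and keep the top k.
    # This is a discrete decision made from a single gradient evaluation.
    order = sorted(range(len(x)), key=lambda i: abs(grad[i]), reverse=True)
    delta = [0.0] * len(x)
    for i in order[:k]:
        # Magnitude choice: move the selected coordinate to the bound that
        # increases the loss according to the gradient sign.
        if grad[i] > 0:
            delta[i] = 1.0 - x[i]
        elif grad[i] < 0:
            delta[i] = -x[i]
    return delta

def l0_norm(delta):
    # Number of non-zero entries, i.e. the sparsity of the perturbation.
    return sum(1 for d in delta if d != 0.0)
```

For example, with `x = [0.2, 0.5, 0.9, 0.1]`, `grad = [0.1, -3.0, 2.0, 0.0]` and `k = 2`, only the second and third coordinates are changed. Because the selected set is fixed after one step, a poor ranking cannot be corrected later, which is one way to read the abstract's point about sub-optimal perturbation locations.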

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. Research on FAT under $L_0$ norm constraints is relatively limited, and the authors address this gap by proposing an innovative method to mitigate performance degradation and catastrophic overfitting in FAT.
2. The authors provide detailed theoretical insights into the connection between the non-smooth adversarial loss landscape and catastrophic overfitting, giving clarity to the existing issues from the theoretical perspective.
3. The paper is well-organized and clearly explained, and exte

Weaknesses

1. The novelty is somewhat limited, as many techniques, including soft labels and TRADES loss, are widely used in adversarial defense. The overall method may appear as a straightforward integration of existing techniques.
2. The application scenario is not clearly defined, particularly why robustness under the $L_0$ norm constraint is essential. Can this adversarial training effectively enhance defense against one-pixel attacks?
3. The application scenario is unclear, and it is unclear why the m

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The presentation is easy to follow.
2. The proposed method is well-motivated.
3. The experimental results verify the effectiveness of the proposed method in mitigating catastrophic overfitting in $\ell_0$ fast AT.

Weaknesses

1. The studied problem, catastrophic overfitting in $\ell_0$ fast AT, is quite narrow. It considers a very specific case, $\ell_0$, of the $\ell_p$ adversarial setting. The impact of this work's conclusions on the broader field of adversarial machine learning is therefore limited.
2. The analytical framework and the findings are similar to existing work on analyzing overfitting in AT. Some of that work is already cited here, while other work is missing. For example, [1] also attributes ov

Reviewer 03 · Rating 6 · Confidence 4

Strengths

This work is the first to investigate fast adversarial training in the context of $L_0$ bounded perturbations. Through some interesting ablation studies, the authors successfully demonstrate that the CO issue in the $L_0$ norm is caused by sub-optimal perturbation locations rather than sub-optimal perturbation magnitudes. The study conducts extensive experiments, including on ImageNet and Transformer-based architectures.

Weaknesses

Theoretical analysis indicates that a large $\left\|\boldsymbol{\delta}_1-\boldsymbol{\delta}_2\right\|$ can intensify the gradient discontinuity, and that the $L_0$ norm has the largest upper bound. However, directly comparing these upper bounds may not be fair, because the $L_0$ norm naturally allows greater freedom in the magnitude of changes. Could the authors provide some empirical results of $\left\|\boldsymbol{\delta}_1-\boldsymbol{\delta}_2\right\|$ among different norms to support


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis