Fast Adversarial Training against Sparse Attacks Requires Loss Smoothing
Xuyang Zhong, Yixiao Huang, Chen Liu

TL;DR
This paper investigates the difficulties of fast adversarial training against sparse $l_0$ attacks, identifies loss landscape cragginess as a cause of overfitting, and proposes a smoothing method to improve robustness and performance.
Contribution
It introduces a loss smoothing technique with soft labels and a trade-off loss to mitigate catastrophic overfitting in $l_0$ adversarial training.
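The two ingredients named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes uniform label smoothing as a stand-in for the paper's soft labels, and a TRADES-style objective (cross-entropy on the clean input plus a weighted KL term between clean and adversarial predictions) as the trade-off loss. The function names and the `alpha` and `beta` defaults are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_labels(true_class, num_classes, alpha=0.1):
    """Soft target: mass (1 - alpha) on the true class, alpha spread uniformly.

    Assumption: uniform smoothing is only a stand-in for the paper's soft labels.
    """
    base = alpha / num_classes
    return [base + (1.0 - alpha if c == true_class else 0.0)
            for c in range(num_classes)]


def cross_entropy(target_probs, logits):
    """Cross-entropy between a (possibly soft) target and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs) if t > 0)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def tradeoff_loss(clean_logits, adv_logits, soft_target, beta=6.0):
    """TRADES-style loss: clean cross-entropy plus beta * KL(clean || adversarial)."""
    natural = cross_entropy(soft_target, clean_logits)
    robust = kl_divergence(softmax(clean_logits), softmax(adv_logits))
    return natural + beta * robust
```

When the adversarial logits equal the clean logits, the KL term vanishes and the loss reduces to the soft-label cross-entropy; any disagreement between the two predictions is penalized in proportion to `beta`.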
Findings
The loss landscape in $l_0$ adversarial training is more craggy than in other norms.
Loss landscape cragginess contributes to catastrophic overfitting.
The proposed Fast-LS-$l_0$ method achieves state-of-the-art robustness against sparse attacks.
Abstract
This paper studies fast adversarial training against sparse adversarial perturbations bounded by the $l_0$ norm. We demonstrate the challenges of employing 1-step attacks on $l_0$ bounded perturbations for fast adversarial training, including degraded performance and the occurrence of catastrophic overfitting (CO). We highlight that CO in $l_0$ adversarial training is caused by sub-optimal perturbation locations of the 1-step attack. Theoretical and empirical analyses reveal that the loss landscape of $l_0$ adversarial training is more craggy compared to its $l_\infty$, $l_2$ and $l_1$ counterparts. Moreover, we corroborate that the craggy loss landscape can aggravate CO. To address these issues, we propose Fast-LS-$l_0$ that incorporates soft labels and the trade-off loss function to smooth the adversarial loss landscape. Extensive experiments demonstrate our method can overcome the…
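To make the distinction between perturbation locations and perturbation magnitudes concrete, the following is a minimal sketch of a single-step sparse perturbation. It is an illustrative greedy first-order heuristic, not the attack used in the paper: it assumes pixel values in $[0, 1]$, a precomputed loss gradient with respect to the input, and a budget of `k` modified entries. The function name `one_step_l0_perturbation` is hypothetical.

```python
def one_step_l0_perturbation(x, grad, k):
    """Build a perturbation with at most k non-zero entries in one step.

    x    : flat list of pixel values in [0, 1]
    grad : loss gradient w.r.t. x, same length (assumed to be given)
    k    : sparsity budget, i.e. the l0 bound
    """
    # Magnitude: push each pixel to the bound that the gradient sign favours.
    candidates = [(1.0 - xi) if gi > 0 else -xi for xi, gi in zip(x, grad)]
    # First-order estimate of the loss increase from changing each pixel.
    gains = [gi * ci for gi, ci in zip(grad, candidates)]
    # Location: keep only the k pixels with the largest estimated gain.
    top = sorted(range(len(x)), key=lambda i: gains[i], reverse=True)[:k]
    delta = [0.0] * len(x)
    for i in top:
        delta[i] = candidates[i]
    return delta
```

In this sketch the magnitudes are fixed by the pixel bounds, so the only real decision is which `k` locations to perturb. A single gradient evaluation can rank those locations poorly, which is one way to picture the sub-optimal perturbation locations described in the abstract.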
Peer Reviews
Decision: Submitted to ICLR 2025
1. Research on FAT under $L_0$ norm constraints is relatively limited, and the authors address this gap by proposing an innovative method to mitigate performance degradation and catastrophic overfitting in FAT.
2. The authors provide detailed theoretical insights into the connection between the non-smooth adversarial loss landscape and catastrophic overfitting, clarifying the existing issues from a theoretical perspective.
3. The paper is well-organized and clearly explained, and exte
1. The novelty is somewhat limited, as many techniques, including soft labels and TRADES loss, are widely used in adversarial defense. The overall method may appear as a straightforward integration of existing techniques.
2. The application scenario is not clearly defined, particularly why robustness under the $L_0$ norm constraint is essential. Can this adversarial training effectively enhance defense against one-pixel attacks?
3. The application scenario is unclear, and it is unclear why the m
1. The presentation is easy to follow.
2. The proposed method is well-motivated.
3. The experimental results verify the effectiveness of the proposed method in mitigating catastrophic overfitting in $\ell_0$ fast AT.
1. The studied problem, catastrophic overfitting in $\ell_0$ fast AT, is quite narrow. It considers a very specific case, $\ell_0$, of the $\ell_p$ adversarial setting. The impact of this work's conclusions on the broader field of adversarial machine learning is therefore limited.
2. The analytical framework and the findings are similar to existing work analyzing overfitting in AT. Some of these works are already cited, while others are missing. For example, [1] also attributes ov
This work is the first to investigate fast adversarial training in the context of $L_0$ bounded perturbations. Through ablation studies, the authors demonstrate that the CO issue under the $L_0$ norm is caused by sub-optimal perturbation locations rather than sub-optimal perturbation magnitudes. The study conducts extensive experiments, including on ImageNet and Transformer-based architectures.
Theoretical analysis indicates that a large $\left\|\boldsymbol{\delta}_1-\boldsymbol{\delta}_2\right\|$ can intensify the gradient discontinuity, and that the $L_0$ norm has the largest upper bound. However, directly comparing these upper bounds may not be fair, because the $L_0$ norm naturally allows more freedom in the magnitude of changes. Could the authors provide some empirical results of $\left\|\boldsymbol{\delta}_1-\boldsymbol{\delta}_2\right\|$ among different norms to support
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis
