TL;DR
This paper provides a theoretical analysis of floating-point errors in gradient computations for adversarial attacks with CE loss, revealing error patterns and proposing a new loss function to improve attack accuracy.
Contribution
It offers the first comprehensive theoretical study of numerical errors in gradient-based adversarial attacks and introduces T-MIFPE, a loss function that minimizes floating-point errors.
Findings
T-MIFPE outperforms existing loss functions in attack success rates.
Floating-point errors significantly affect gradient accuracy in adversarial attacks.
Theoretical insights reveal key error patterns and contributors like underflow and rounding.
Abstract
Gradient-based adversarial attacks using the Cross-Entropy (CE) loss often suffer from overestimation due to relative errors in gradient computation induced by floating-point arithmetic. This paper provides a rigorous theoretical analysis of these errors, conducting the first comprehensive study of floating-point computation errors in gradient-based attacks across four distinct scenarios: (i) unsuccessful untargeted attacks, (ii) successful untargeted attacks, (iii) unsuccessful targeted attacks, and (iv) successful targeted attacks. We establish theoretical foundations characterizing the behavior of relative numerical errors under different attack conditions, revealing previously unknown patterns in gradient computation instability, and identify floating-point underflow and rounding as key contributors. Building on this insight, we propose the Theoretical MIFPE (T-MIFPE) loss function,…
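The underflow and rounding effects named in the abstract can be reproduced in a few lines. The sketch below is only an illustration, not the paper's method: it assumes a three-class logit vector with a large margin, uses the standard identity that the CE gradient with respect to the logits is softmax(z) minus the one-hot label, and rescales the logits by a hypothetical factor of 0.05 chosen by hand. It does not implement the T-MIFPE formula, which this summary does not state.

```python
import numpy as np

def ce_grad_wrt_logits(logits, label, dtype=np.float32):
    """Gradient of CE loss w.r.t. logits: softmax(logits) - one_hot(label)."""
    z = np.asarray(logits, dtype=dtype)
    shifted = z - z.max()          # standard max-shift for a stable softmax
    e = np.exp(shifted)            # small terms may underflow to exactly 0
    p = e / e.sum()
    grad = p.copy()
    grad[label] -= dtype(1.0)
    return grad

# Assumed example: the model is very confident in class 0 (margin of 200).
logits = [200.0, 0.0, -5.0]
label = 0

g32 = ce_grad_wrt_logits(logits, label, np.float32)
g64 = ce_grad_wrt_logits(logits, label, np.float64)

# Hypothetical rescaling (hand-picked, not the paper's T): shrink the margin to 10.
scaled = [v * 0.05 for v in logits]
g32_scaled = ce_grad_wrt_logits(scaled, label, np.float32)

print("float32 gradient:        ", g32)
print("float64 gradient:        ", g64)
print("float32, rescaled logits:", g32_scaled)
```

In float32 the terms exp(-200) and exp(-205) underflow to zero, so the whole gradient vanishes and a gradient-based attack receives no signal. In float64 the off-label entries survive, but the label entry still rounds to zero because 1 plus a tiny number rounds back to 1. After rescaling, every entry is nonzero in float32.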
Peer Reviews
Decision·Submitted to ICLR 2026
1. Provides a theoretical treatment of floating-point–induced gradient errors across multiple attack scenarios.
2. Experiments across multiple datasets and models consistently support theoretical findings.
3. Offers a generalizable framework for analyzing numerical stability in adversarial attacks.

1. Experimental improvements are minimal (mostly ~0.1%), which may limit practical impact despite theoretical justification.
2. Some theoretical assumptions (e.g., independence between gradient terms and scaling factor) are not thoroughly discussed.
3. The analysis is restricted to CE-based attacks; more complex losses or adaptive attacks remain unexplored.
4. The paper is difficult to follow.
- The paper is well-written and easy to follow
- The addressed topic is relevant, as reliably estimating robustness against adversarial examples is still an open problem
- The experimental findings confirm the theoretical basis, and the proposed strategy in some settings even improves AutoAttack (which is considered a state-of-the-art method)

- The paper contribution, although theoretically sound and empirically proven, is mainly limited to estimating an optimal value for an already existing method. The main issue (numerical underflow in CE loss) and the solution (MIFPE) were presented in that previous work, where additionally their authors already tried to provide a basic theoretical justification and an empirical estimate of T (which aligns with the findings provided in this paper).
- The absolute runtime overhead of the additional
- The motivation is clear, and the paper provides the first theoretical analysis of the floating-point issue and of why scaling the logits by a factor can improve the estimation.
- The analysis covers four typical attack settings (targeted/untargeted and successful/unsuccessful).
- Experiments further validate the correctness of the theory.

- Some details are missing.
- The notations and equations need more explanation for clarity and readability.
