Incorruptible Neural Networks: Training Models that can Generalize to Large Internal Perturbations
Philip Jacobson, Ben Feinberg, Suhas Kumar, Sapan Agarwal, T. Patrick Xiao, Christopher Bennett

TL;DR
This paper investigates how to train neural networks that maintain performance despite large internal weight perturbations, using sharpness-aware minimization (SAM) and random-weight perturbation (RWP), and introduces techniques that improve robustness and optimization under such conditions.
Contribution
The paper presents a theoretical and empirical analysis of RWP and SAM for noise-robust training, and proposes dynamic perturbation adjustment to enhance robustness and optimization.
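The two training rules compared throughout differ in where the gradient is evaluated. The following is a rough illustration, not the authors' implementation: a minimal NumPy sketch on a hypothetical linear-regression problem, assuming the basic form of each method. RWP takes the gradient at a randomly perturbed copy of the weights, while SAM takes it at the first-order worst-case point within radius `rho`. The function names, toy data, and hyperparameters (`lr`, `sigma`, `rho`) are illustrative assumptions.

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Half mean-squared error of the linear model X @ w and its gradient w.r.t. w."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)

def rwp_step(w, X, y, lr, sigma, rng):
    """RWP-style step (assumed basic form): descend along the gradient taken
    at w + eps, where eps ~ N(0, sigma^2 I) is a fresh random perturbation."""
    eps = rng.normal(0.0, sigma, size=w.shape)
    _, g = loss_and_grad(w + eps, X, y)
    return w - lr * g

def sam_step(w, X, y, lr, rho):
    """SAM-style step: move to w + rho * g / ||g|| (first-order worst case
    within radius rho), then descend along the gradient taken there."""
    _, g = loss_and_grad(w, X, y)
    e = rho * g / (np.linalg.norm(g) + 1e-12)  # small constant avoids division by zero
    _, g_adv = loss_and_grad(w + e, X, y)
    return w - lr * g_adv

# Hypothetical toy problem: noise-free linear regression with 5 weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true

w_rwp = np.zeros(5)
w_sam = np.zeros(5)
for _ in range(500):
    w_rwp = rwp_step(w_rwp, X, y, lr=0.1, sigma=0.05, rng=rng)
    w_sam = sam_step(w_sam, X, y, lr=0.1, rho=0.05)

print("RWP loss:", loss_and_grad(w_rwp, X, y)[0])
print("SAM loss:", loss_and_grad(w_sam, X, y)[0])
```

This sketch draws a single noise sample per RWP step for brevity; some RWP variants average several samples or mix the perturbed gradient with the clean one.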
Findings
Over-regularized RWP is optimal for noise-robust generalization.
SAM improves performance for small noise but struggles with large noise due to vanishing gradients.
Dynamic perturbation adjustment enhances optimization for perturbed objectives.
Abstract
Flat regions of the neural network loss landscape have long been hypothesized to correlate with better generalization properties. A closely related but distinct problem is training models that are robust to internal perturbations to their weights, which may be an important need for future low-power hardware platforms. In this paper, we explore the usage of two methods, sharpness-aware minimization (SAM) and random-weight perturbation (RWP), to find minima robust to a variety of random corruptions to weights. We consider the problem from two angles: generalization (how do we reduce the noise-robust generalization gap) and optimization (how do we maximize performance from optimizers when subject to strong perturbations). First, we establish, both theoretically and empirically, that an over-regularized RWP training objective is optimal for noise-robust generalization. For small-magnitude…
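One way to make "minima robust to random corruptions to weights" concrete is to measure the expected loss under weight noise. The following minimal sketch assumes zero-mean isotropic Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$ and a hypothetical linear-regression model with half-MSE loss. For such a quadratic loss the expectation is exact: $\mathbb{E}_{\epsilon}[L(w+\epsilon)] = L(w) + \tfrac{\sigma^2}{2}\operatorname{tr}(H)$ with $H = X^\top X / n$. The trace of the Hessian, a flatness measure, therefore sets the noise penalty directly. The Monte Carlo estimator below should agree with that closed form; all names and data are illustrative.

```python
import numpy as np

def mse(w, X, y):
    """Half mean-squared error of the linear model X @ w."""
    return 0.5 * np.mean((X @ w - y) ** 2)

def noisy_loss(w, X, y, sigma, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[L(w + eps)] with eps ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    samples = [mse(w + rng.normal(0.0, sigma, size=w.shape), X, y)
               for _ in range(n_samples)]
    return float(np.mean(samples))

def noisy_loss_exact(w, X, y, sigma):
    """Closed form for this quadratic loss: L(w) + (sigma^2 / 2) * tr(H), H = X^T X / n.
    The cross term between the residual and eps vanishes because eps has zero mean."""
    H = X.T @ X / len(y)
    return mse(w, X, y) + 0.5 * sigma ** 2 * np.trace(H)

# Hypothetical data: evaluate at the exact minimizer, where L(w) = 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w = np.arange(1.0, 6.0)
y = X @ w
print(noisy_loss(w, X, y, sigma=0.1), noisy_loss_exact(w, X, y, sigma=0.1))
```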
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
**S1.** Investigating the impact of weight perturbations at test time is of significant interest, as it represents a practical challenge in real-world systems affected by hardware noise. **S2.** The authors conduct extensive experiments analyzing the loss landscape, sharpness, backbones, and the effects of varying $\sigma$ values.
**W1.** Since model parameters are often not externally accessible, I think the premise of manually perturbing weights has limited practical relevance. However, I agree with the argument in the introduction that hardware can inject noise into model weights. Therefore, investigating robustness against real hardware noise patterns is crucial. In this work, the perturbations are distributed as zero-mean isotropic Gaussian. How does this assumption align with real-world noise patterns? **W2.** As shown
The demonstrations in Figures 3 and 4 are clear and informative. They effectively illustrate how the loss, gradient norm, and sharpness evolve during training, revealing how SAM and RWP interact with the geometry of the loss surface. The experiments are well-controlled, with systematic variations in hyperparameters to isolate specific effects. These analyses lead to meaningful conclusions about the vanishing-gradient phenomenon in SAM and the differing robustness characteristics of SAM and RWP.
1. The motivation for this work is not clearly articulated. As stated on page 1, paragraph 2, the authors refer to analog in-memory computing (AIMC) and “hardware errors” as the practical motivation. However, the discussion remains vague. The paper briefly claims that prior works lack “a broad understanding of noise-robustness in neural networks” and “a connection of these efforts to existing flatness-finding approaches such as SAM and RWP,” but these statements are overly general and do not con
It presents in-depth comparisons between SAM and RWP, providing more extensive evaluations on top of the existing findings.
1. This work appears to be an empirical study in which no theoretical analyses or analytical discussions are provided, which limits its contribution to the fundamentals of this topic. 2. Since the paper is positioned from a largely empirical perspective, it would be more convincing to expand the evaluations, e.g., to Transformer-based architectures and to tasks/datasets beyond image processing. 3. The results in Sections 4.2 and 4.3 are fairly intuitive and not difficult to anticipate. Despite the observations pr
1. It investigates the model's robustness to weight perturbations, which corresponds to the flatness of the weight loss landscape during the test phase, distinguishing it from previous research on SAM and RWP. 2. The paper is clearly written, the figures and tables are easy to understand, and the overall flow of the text is good. 3. The experiments in this paper are comprehensive. The paper conducts a systematic evaluation of SAM and RWP under various noise settings, and its proposed dynamic
1. The motivation is ambiguous. This paper associates the model's robustness to weight perturbations with AIMC hardware errors, which is a relatively novel scenario. However, it is not clear that the methods and experiments in this paper can be reliably transferred to real AIMC hardware. 2. The practical value of this article is unclear. The paper does not clearly demonstrate the effectiveness of its proposed method in real-world hardware deployment scenarios, such as evaluation using a target
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
