Noise Stability of Transformer Models
Themistoklis Haris, Zihan Zhang, Yuichi Yoshida

TL;DR
This paper introduces noise stability as a new metric for assessing the simplicity and robustness of transformer models, addressing limitations of average sensitivity, and demonstrates its effectiveness in accelerating training and enhancing interpretability.
Contribution
It proposes noise stability as a comprehensive simplicity metric, develops a theoretical framework for it, and introduces a practical regularization method that improves training efficiency.
Findings
Regularizer accelerates training by ~75%.
Enhances model interpretability and robustness.
Provides theoretical analysis for attention and MLP layers.
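To make the idea of a noise-stability regularizer concrete, here is a minimal sketch of one possible form. It is an assumption for illustration, not the paper's actual regularizer, which this summary does not specify. The penalty is the mean squared gap between a model's output on a clean input and on a ρ-correlated noisy copy. For identically distributed inputs, shrinking this gap raises the correlation E[f(x)f(y)]. The names `stability_penalty`, `regularized_loss`, `rho`, and `lam` are hypothetical, and inputs are assumed to be roughly standardized real vectors such as embeddings.

```python
import math
import random

def stability_penalty(model, batch, rho, rng):
    """Mean squared output gap between clean inputs and rho-correlated noisy copies.

    Assumption: inputs are roughly standard Gaussian vectors, so
    y = rho * x + sqrt(1 - rho^2) * z (z fresh Gaussian noise) has the
    same marginal distribution as x.
    """
    s = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for x in batch:
        # Perturb every coordinate at once, not a single token.
        y = [rho * xi + s * rng.gauss(0.0, 1.0) for xi in x]
        fx, fy = model(x), model(y)
        total += sum((a - b) ** 2 for a, b in zip(fx, fy))
    return total / len(batch)

def regularized_loss(task_loss, penalty, lam=0.1):
    """Task loss plus a weighted stability penalty (lam is a hypothetical weight)."""
    return task_loss + lam * penalty
```

Written in an autodiff framework, the same expression would be differentiable in the model parameters, which is what would let it act as a training-time regularizer.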
Abstract
Understanding simplicity biases in deep learning offers a promising path toward developing reliable AI. A common metric for this, inspired by Boolean function analysis, is average sensitivity, which captures a model's robustness to single-token perturbations. We argue that average sensitivity has two key limitations: it lacks a natural generalization to real-valued domains and fails to explain the "junta-like" input dependence we empirically observe in modern LLMs. To address these limitations, we propose noise stability as a more comprehensive simplicity metric. Noise stability expresses a model's robustness to correlated noise applied to all input coordinates simultaneously. We provide a theoretical analysis of noise stability for single-layer attention and ReLU MLP layers and tackle the multi-layer propagation problem with a covariance interval propagation approach. Building on this…
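The notion of correlated noise on all coordinates can be illustrated with a small Monte Carlo sketch. It assumes the standard Gaussian definition Stab_ρ(f) = E[f(x) f(y)], where x is standard Gaussian and y = ρx + sqrt(1 − ρ²)z with independent Gaussian z. Whether the paper normalizes this quantity is not stated in this summary. As a sanity check, the estimate for a single ReLU unit is compared with the well-known degree-1 arc-cosine kernel formula, which is not necessarily the exact statement of the paper's ReLU theorem. All function names are hypothetical.

```python
import math
import random

def noise_stability_mc(f, dim, rho, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[f(x) f(y)] for rho-correlated Gaussian x, y."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Noise is applied to every coordinate simultaneously.
        y = [rho * xi + s * rng.gauss(0.0, 1.0) for xi in x]
        total += f(x) * f(y)
    return total / n_samples

def relu_stability_closed_form(rho):
    """E[relu(u) relu(v)] for rho-correlated standard Gaussians u, v
    (degree-1 arc-cosine kernel)."""
    return (math.sqrt(1.0 - rho * rho)
            + rho * (math.pi - math.acos(rho))) / (2.0 * math.pi)

def relu_unit(x):
    """A single ReLU unit reading the first coordinate."""
    return max(0.0, x[0])

if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        est = noise_stability_mc(relu_unit, dim=1, rho=rho)
        print(rho, round(est, 3), round(relu_stability_closed_form(rho), 3))
```

At ρ = 1 the closed form reduces to E[relu(u)²] = 1/2, and at ρ = 0 to (E[relu(u)])² = 1/(2π), which gives two quick checks on the formula.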
Peer Reviews
Decision: ICLR 2026 Poster
Stability analysis of attention with identity, low rank, and random regimes is new and insightful. The appearance of a permutation consistency factor $s(\rho)$ in the unstructured case is a neat and interpretable result. Theoretical results are derived from a solid OU semigroup foundation with explicit ReLU formulas and careful discussion of approximations. Interval propagation addresses the realistic case where exact recurrences are brittle. The regularizer is minimal and differentiable, with a
- Assumption gaps for deep Transformers. The multi layer recurrence assumes fixed distributions and idealized conditions. The authors later observe full dampening in practice when $\gamma < 1$, which highlights a gap between proxy analysis and actual behavior. More systematic experiments that probe when interval bounds are tight would strengthen claims.
- Limited task scope. The regularizer is only evaluated on synthetic algorithmic tasks. Evidence that it helps on language modeling or fine tun
**Substantive theoretical contributions.** The paper gives an exact, closed-form expression for the noise stability of a ReLU layer (Theorem 5.1 / Appendix E) and analyzes single-layer attention under different structural regimes (identity/low-rank and unstructured; Theorems 5.2 and 5.3). It also studies multi-layer propagation, showing weak dampening for deep ReLU MLPs and contrasting behavior for Transformers.
**Thorough and self-contained presentation.** The work includes background on Boole
**Limited comparison with prior works.** Section 3 argues that geometric influence (GI) is a more robust alternative to average sensitivity in continuous domains, but then pivots to advocating noise stability as the primary metric without a direct empirical or theoretical head-to-head comparison (e.g., GI vs. noise stability on the same tasks/models). Related work cites Li & Mossel (2025) for noise sensitivity propagation but does not clearly mention what is new beyond the inspiration or compare
1. The mathematical formulation of noise stability through the Ornstein–Uhlenbeck operator is well presented. The concept of analyzing robustness under global, correlated perturbations rather than local ones is intuitive and sound.
2. The introduction of a differentiable noise-stability regularizer is a concrete and practical contribution.
3. The choice of algorithmic tasks is appropriate for studying grokking phenomena. The experiments show correlations between noise stability, validation loss,
1. The paper motivates the use of noise stability over average sensitivity based on the observed “junta-like” dependence in Transformers, arguing that noise stability captures global correlated perturbations rather than single-coordinate ones. However, the connection between robustness to global perturbations and dependence on a few input variables is not clearly established. It remains unclear why capturing correlated noise should correspond to fewer influential tokens.
2. The authors claim tha
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
