How Learning Dynamics Drive Adversarially Robust Generalization?
Yuelin Xu, Xiao Zhang

TL;DR
This paper develops a PAC-Bayesian framework to analyze adversarial training dynamics, providing mechanistic insights into robust overfitting and the effects of weight perturbations on generalization.
Contribution
It introduces a novel dynamical systems perspective and theoretical bounds that connect learning rate, loss landscape geometry, and stochastic gradients to robust generalization.
Findings
Robust overfitting can be explained through the evolution of posterior mean and covariance.
Adversarial weight perturbation reduces the generalization gap by affecting loss curvature.
The framework offers a mechanistic understanding of adversarial training dynamics.
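As a hedged illustration of why the posterior mean and covariance matter: in a classical PAC-Bayes bound with a Gaussian prior and a Gaussian posterior, both quantities enter through the KL divergence term. The sketch below uses the standard McAllester-style bound (Maurer's form, for losses in [0, 1]), not the paper's time-resolved bound. The function names `gaussian_kl` and `mcallester_gap` are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def gaussian_kl(mu_q, cov_q, mu_p, cov_p):
    """KL(N(mu_q, cov_q) || N(mu_p, cov_p)) for full-rank covariances."""
    d = mu_q.shape[0]
    cov_p_inv = np.linalg.inv(cov_p)
    diff = mu_p - mu_q
    trace_term = np.trace(cov_p_inv @ cov_q)      # covariance mismatch
    maha_term = diff @ cov_p_inv @ diff           # distance between the means
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (trace_term + maha_term - d + logdet_p - logdet_q)

def mcallester_gap(kl, n, delta=0.05):
    """Complexity term of the classical bound (assumed here, not the paper's):
    sqrt((KL + ln(2*sqrt(n)/delta)) / (2n))."""
    return np.sqrt((kl + np.log(2.0 * np.sqrt(n) / delta)) / (2.0 * n))

# Hypothetical 2-D example: the posterior mean drifts away from the prior mean.
prior_mu, prior_cov = np.zeros(2), np.eye(2)
post_mu, post_cov = np.array([1.0, 0.0]), np.eye(2)
kl = gaussian_kl(post_mu, post_cov, prior_mu, prior_cov)
print(kl, mcallester_gap(kl, n=1000))
```

In this toy setting, a posterior mean that drifts further from the prior, or a covariance that departs from the prior covariance, increases the KL term and therefore loosens the bound.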
Abstract
Despite being widely adopted as a canonical framework for learning robust models, adversarial training suffers from robust overfitting. Existing empirical measures and theoretical explorations are insufficient to provide satisfying mechanistic insights into the phenomenon. By viewing adversarial training with momentum SGD as a discrete-time dynamical system, we introduce a PAC-Bayesian analytical framework that proves time-resolved robust generalization bounds. Specifically, our framework tracks the closed-form evolution of the posterior mean and covariance under both stationary and non-stationary transient regimes, revealing their connections to the learning rate, the geometry of the loss landscape, and mini-batch stochastic gradients. By empirically approximating the statistical quantities implied by our theory, we offer a unified, mechanistic explanation for robust overfitting. We…
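To make the "discrete-time dynamical system" view concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: a fixed quadratic loss with Hessian `H`, heavy-ball momentum, and additive Gaussian gradient noise with constant covariance `C`. Under these assumptions the joint state (parameter offset, velocity) follows a linear recursion, so its mean and covariance evolve in closed form. The function name `momentum_sgd_moments` and all parameter values are hypothetical.

```python
import numpy as np

def momentum_sgd_moments(H, C, theta0, eta=0.05, beta=0.9, steps=200):
    """Propagate the mean and covariance of heavy-ball SGD on a quadratic loss.

    Assumed update (theta is measured relative to the minimizer):
        v_{t+1}     = beta * v_t - eta * (H @ theta_t + eps_t),  eps_t ~ N(0, C)
        theta_{t+1} = theta_t + v_{t+1}
    Stacking z = [theta, v] gives z_{t+1} = A z_t + B eps_t.
    """
    d = H.shape[0]
    I = np.eye(d)
    A = np.block([[I - eta * H, beta * I],
                  [-eta * H,    beta * I]])
    B = np.vstack([-eta * I, -eta * I])
    Q = B @ C @ B.T                      # noise injected at every step

    mean = np.concatenate([theta0, np.zeros(d)])
    cov = np.zeros((2 * d, 2 * d))       # deterministic initialization
    for _ in range(steps):
        mean = A @ mean                  # mean recursion
        cov = A @ cov @ A.T + Q          # covariance (Lyapunov-type) recursion
    return mean[:d], cov[:d, :d]         # marginal moments of theta

# Hypothetical 2-D problem with one flat and one sharp direction.
H = np.diag([1.0, 4.0])
C = 0.1 * np.eye(2)
theta0 = np.array([1.0, -1.0])
mean, cov = momentum_sgd_moments(H, C, theta0, eta=0.05, beta=0.9, steps=2000)
print(mean, np.diag(cov))
```

In this toy model the mean contracts toward the minimizer while the covariance settles at a level set by the learning rate, the momentum coefficient, the curvature, and the gradient noise. These are the same ingredients the abstract names, though the paper's actual analysis covers adversarial training and non-stationary regimes that this sketch does not.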
Peer Reviews
Decision: Submitted to ICLR 2026
++ The paper is generally well-written and the overall framework is not difficult to follow.
++ Under the second-order approximation, it is novel to derive the closed-form posterior covariance under both constant learning rate regime and learning rate decay regime.
++ The analysis is relatively comprehensive: it considers many factors that may affect generalization error, including the momentum mechanism, the gradient noise, the Hessian structure and the learning rate. The theoretical analyses
1. The major concerns are restrictive assumptions:
* Assumption 3.3: I do not think the posterior distribution after adversarial training for general deep neural networks is a Gaussian distribution. Probably, the authors can assume that the posterior distribution is a mixture of several super Gaussian distributions, as the probabilistic density will generally concentrate during training, and different initialisations will lead to converged parameters near different local minima.
*
The paper is written extremely clearly, and the development is very logical. I genuinely enjoyed reading the paper. The results are nice and insightful. I am not close enough to the literature to evaluate how different they are from existing results, but they are interesting and the approach is well-motivated.
I see how the proposed Gaussian model is in fact less restrictive than models in previous work, but I am curious if the authors can comment on its limitations. I also have a few additional questions that are in the section below.
The paper approaches robust generalization from an appealing dynamic perspective—focusing on how the optimization trajectory and posterior evolution influence generalization—rather than adopting a static hypothesis-space view based on Rademacher complexity or other capacity measures. The connection between posterior evolution and curvature-based generalization offers an interesting conceptual direction, potentially bridging PAC-Bayesian analysis and training dynamics.
1. Unclear contribution and novelty. The paper does not clearly describe its theoretical novelty compared to prior PAC-Bayesian analyses of adversarial robustness (e.g., [Viallard et al., 2021](https://proceedings.neurips.cc/paper/2021/file/78e8dffe65a2898eef68a33b8db35b78-Paper.pdf); [Mustafa et al., 2019](https://ml.cs.rptu.de/publications/2023/computing_non_vacuous_pac_bayes.pdf); Xiao et al., 2023). It is unclear how the presented bounds improve on the previous results.
2. Concerns regarding
Taxonomy
Topics: Adversarial Robustness in Machine Learning
