Information-Theoretic Generalization Bounds for Stochastic Gradient Descent with Predictable Virtual Noise
Mohammad Partohaghighi

TL;DR
This paper develops information-theoretic generalization bounds for SGD in which the proof-only (virtual) noise covariance may adapt to the past optimization history, so the analysis can reflect the geometry of the actual trajectory rather than a covariance fixed in advance.
Contribution
It introduces predictable history-adaptive virtual perturbations, enabling bounds that incorporate pathwise information and adaptive geometries without altering SGD.
Findings
Bounds include adaptive covariance and output-sensitivity penalties.
The framework recovers fixed-noise bounds as special cases.
Adaptive virtual noise improves the analysis of geometry-aware SGD.
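To make the idea of a predictable, history-adaptive virtual perturbation concrete, here is a minimal sketch on a toy least-squares problem. It is an illustration under stated assumptions, not the paper's construction: the function name `sgd_with_virtual_noise`, the diagonal covariance rule `base_var * (1 + v)`, the running squared-gradient statistic `v`, and all hyperparameters are hypothetical. The two points it tries to show are that the covariance used at step t is computed from past gradients only, and that the real SGD iterate never sees the virtual noise.

```python
import numpy as np

def sgd_with_virtual_noise(X, y, steps=200, lr=0.05, batch=8,
                           base_var=1e-3, adaptive=True, seed=0):
    """Toy SGD with a proof-only perturbed copy of the iterate (hypothetical sketch)."""
    data_rng = np.random.default_rng(seed)        # drives the real SGD randomness
    noise_rng = np.random.default_rng(seed + 1)   # drives the virtual noise only
    n, d = X.shape
    w = np.zeros(d)       # real SGD iterate
    delta = np.zeros(d)   # accumulated virtual perturbation (analysis device)
    v = np.zeros(d)       # running average of squared PAST gradients
    for _ in range(steps):
        # Predictable: the variance is fixed before the current minibatch is drawn,
        # so it depends on the history but not on current or future randomness.
        var_t = base_var * (1.0 + v) if adaptive else np.full(d, base_var)
        idx = data_rng.choice(n, size=batch, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w = w - lr * grad                                   # real update ignores delta
        delta = delta + np.sqrt(var_t) * noise_rng.standard_normal(d)
        v = 0.9 * v + 0.1 * grad**2                         # history update, used next step
    return w, w + delta   # (real iterate, virtual perturbed iterate)

# Usage: same data, adaptive versus fixed virtual covariance.
rng = np.random.default_rng(1)
X = rng.standard_normal((64, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w_adaptive, w_tilde_adaptive = sgd_with_virtual_noise(X, y, adaptive=True)
w_fixed, w_tilde_fixed = sgd_with_virtual_noise(X, y, adaptive=False)
print(np.array_equal(w_adaptive, w_fixed))
```

Setting `adaptive=False` corresponds to the fixed-noise special case in this toy setup. The real iterates coincide in both modes because the perturbation only enters the virtual copy.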
Abstract
Information-theoretic generalization bounds analyze stochastic optimization by relating expected generalization error to the mutual information between learned parameters and training data. Virtual perturbation analyses of SGD add auxiliary Gaussian noise only in the proof, making mutual information tractable while leaving the actual SGD trajectory unchanged. Existing bounds, however, typically require perturbation covariances to be fixed independently of the optimization history, limiting their ability to represent geometries induced by moving gradient statistics, preconditioners, curvature proxies, and other pathwise information. We introduce predictable history-adaptive virtual perturbations, where the perturbation covariance at each iteration may depend on the past real SGD history but not on current or future randomness. This predictability enables a conditional Gaussian…
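For context on the opening sentence of the abstract, the classical mutual-information bound that this line of work builds on can be written as follows. This is the standard form for a loss that is $\sigma$-subgaussian under the data distribution for every fixed parameter value, with $S$ a sample of $n$ i.i.d. points and $W$ the learned parameters; it is given here as background and is not one of the paper's own bounds, which are not reproduced in this excerpt.

```latex
\left| \mathbb{E}\big[ L_{\mu}(W) - L_{S}(W) \big] \right|
\;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)}
```

Here $L_{\mu}$ denotes the population risk, $L_{S}$ the empirical risk on $S$, and $I(W;S)$ the mutual information between the output parameters and the training sample.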
