On-Average Stability of Multipass Preconditioned SGD and Effective Dimension
Simon Vary, Tyler Farghly, Ilja Kuzborskij, Patrick Rebeschini

TL;DR
This paper analyzes how preconditioning in multipass SGD affects generalization, revealing the importance of effective dimension and stability, and providing new bounds for optimization and generalization performance.
Contribution
We develop the first on-average stability analysis for multipass PSGD accounting for data reuse correlations, linking generalization to effective dimension.
Findings
Proper preconditioning improves effective dimension and generalization.
Improper preconditioning can lead to suboptimal statistical behavior.
Matching lower bounds confirm the tightness of our bounds.
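To make the object of study concrete, the following is a minimal sketch of multipass Preconditioned SGD (PSGD) on a toy least-squares problem. It is not the paper's setup: the squared loss, the fixed preconditioner `P`, the step size, the synthetic data, and the function name `multipass_psgd` are all assumptions chosen for illustration. The sketch shows only the update rule and the data reuse across passes, which is the source of the correlations the stability analysis has to handle.

```python
import numpy as np

def multipass_psgd(X, y, P, step_size=0.05, n_passes=5, seed=0):
    """Multipass preconditioned SGD on the squared loss 0.5 * (x @ w - y)**2.

    P is a fixed symmetric positive-definite preconditioner (an assumption of
    this sketch); P = identity recovers plain SGD.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_passes):
        # Each pass revisits every sample once, in a fresh random order,
        # so later iterates depend on samples already seen.
        for i in rng.permutation(n):
            grad = (X[i] @ w - y[i]) * X[i]   # per-sample gradient
            w = w - step_size * (P @ grad)    # preconditioned step
    return w

# Hypothetical data: anisotropic features, noiseless linear target.
rng = np.random.default_rng(1)
scales = np.array([1.0, 0.3, 0.1])
X = rng.normal(size=(200, 3)) * scales
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

# Plain SGD versus a preconditioner aligned with the empirical curvature.
H = X.T @ X / len(X)
w_sgd = multipass_psgd(X, y, np.eye(3))
w_psgd = multipass_psgd(X, y, np.linalg.inv(H))
```

Because the toy data are noiseless, this example only illustrates how a curvature-aligned preconditioner changes the iterates. It says nothing about the gradient-noise geometry or the generalization bounds the paper studies.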
Abstract
We study trade-offs between the population risk curvature, geometry of the noise, and preconditioning on the generalisation ability of multipass Preconditioned Stochastic Gradient Descent (PSGD). Many practical optimisation heuristics implicitly navigate this trade-off in different ways; for instance, some aim to whiten gradient noise, while others aim to align updates with expected loss curvature. When the geometry of the population risk curvature and the geometry of the gradient noise do not match, an aggressive choice that improves one aspect can amplify instability along the other, leading to suboptimal statistical behavior. In this paper, we employ on-average algorithmic stability to connect generalisation of PSGD to the effective dimension that depends on these sources of curvature. While existing techniques for on-average stability of SGD are limited to a single pass, as…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Risk and Portfolio Optimization
