Designing Preconditioners for SGD: Local Conditioning, Noise Floors, and Basin Stability
Mitchell Scott, Tianshi Xu, Ziyuan Tang, Alexandra Pichette-Emmons, Qiang Ye, Yousef Saad, and Yuanzhe Xi

TL;DR
This paper analyzes how preconditioning affects the convergence, noise floor, and basin stability of stochastic gradient descent, providing a framework for designing effective preconditioners in scientific machine learning.
Contribution
It introduces a theoretical framework linking preconditioner choice to convergence rate, noise floor, and basin stability in SGD, applicable to various preconditioners and validated experimentally.
Findings
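A well-chosen preconditioner improves local conditioning and can lower the stochastic noise floor.
Both the convergence rate and the noise floor depend on the effective condition number in the preconditioner-induced metric; the floor also scales with the preconditioned noise level.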
Experiments confirm the theoretical rate-floor behavior.
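To make "effective condition number" concrete, here is a minimal sketch, not the paper's code. It assumes a symmetric positive definite preconditioner M applied to a symmetric positive definite local Hessian H, and takes the effective condition number to be the condition number of M^{-1/2} H M^{-1/2}. The function name, the toy matrix, and the Jacobi (diagonal) choice of M are illustrative assumptions.

```python
import numpy as np

def effective_condition_number(H, M):
    """Condition number of M^{-1/2} H M^{-1/2} for SPD matrices H and M."""
    # With M = C C^T (Cholesky), C^{-1} H C^{-T} has the same eigenvalues
    # as M^{-1/2} H M^{-1/2}, so no matrix square root is needed.
    C = np.linalg.cholesky(M)
    A = np.linalg.solve(C, np.linalg.solve(C, H).T).T
    eigs = np.linalg.eigvalsh((A + A.T) / 2)  # symmetrize against round-off
    return eigs[-1] / eigs[0]

# Hypothetical badly scaled Hessian: a well-conditioned core B with
# very different scales applied to each coordinate.
B = np.array([[2.0, 0.5, 0.0],
              [0.5, 2.0, 0.5],
              [0.0, 0.5, 2.0]])
D = np.diag([1.0, 10.0, 100.0])
H = D @ B @ D

kappa_identity = effective_condition_number(H, np.eye(3))         # no preconditioning
kappa_jacobi = effective_condition_number(H, np.diag(np.diag(H)))  # diagonal preconditioner
print(f"kappa without preconditioning: {kappa_identity:.1f}")
print(f"kappa with Jacobi preconditioner: {kappa_jacobi:.3f}")
```

In this toy case the diagonal preconditioner removes the coordinate scaling entirely, so the effective condition number falls back to that of the well-conditioned core B.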
Abstract
Stochastic Gradient Descent (SGD) often slows in the late stage of training due to anisotropic curvature and gradient noise. We analyze preconditioned SGD in the geometry induced by a symmetric positive definite matrix M, deriving bounds in which both the convergence rate and the stochastic noise floor are governed by M-dependent quantities: the rate through an effective condition number in the M-metric, and the floor through the product of that condition number and the preconditioned noise level. For nonconvex objectives, we establish a preconditioner-dependent basin-stability guarantee: when smoothness and basin size are measured in the M-norm, the probability that the iterates remain in a well-behaved local region admits an explicit lower bound. This perspective is particularly relevant in Scientific Machine Learning (SciML), where…
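The rate and floor behavior described in the abstract can be illustrated on a toy problem. The sketch below is an assumption-laden illustration rather than the authors' experiment: it uses a diagonal quadratic objective f(x) = 0.5 x^T H x, additive Gaussian gradient noise, the update x <- x - eta * M^{-1} g with a diagonal M, and a step size set to half the inverse of the largest eigenvalue of M^{-1} H. The function name and all parameter values are hypothetical.

```python
import numpy as np

def run_psgd(h, m, x0, sigma, steps, rng):
    """Preconditioned SGD on f(x) = 0.5 * sum(h * x**2) with diagonal H and M.

    h, m: diagonals of the Hessian H and the preconditioner M (all positive).
    Returns the loss after each step.
    """
    h = np.asarray(h, dtype=float)
    m = np.asarray(m, dtype=float)
    eta = 0.5 / np.max(h / m)  # step size from the preconditioned smoothness constant
    x = np.array(x0, dtype=float)
    losses = []
    for _ in range(steps):
        grad = h * x + sigma * rng.standard_normal(x.shape)  # noisy gradient
        x = x - eta * grad / m                               # apply M^{-1} to the gradient
        losses.append(0.5 * np.sum(h * x**2))
    return np.array(losses)

rng = np.random.default_rng(0)
h = np.array([1.0, 100.0])  # anisotropic curvature, condition number 100
x0 = [10.0, 10.0]

plain = run_psgd(h, np.ones(2), x0, sigma=0.1, steps=200, rng=rng)  # M = I
precond = run_psgd(h, h, x0, sigma=0.1, steps=200, rng=rng)         # M = H (idealized)

print("mean loss over last 50 steps, M = I:", plain[-50:].mean())
print("mean loss over last 50 steps, M = H:", precond[-50:].mean())
```

With M = I the step size is limited by the stiff direction, so the flat direction is still far from its minimum after 200 steps. With the idealized choice M = H every direction contracts at the same rate and the loss settles at a level set by the gradient noise.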
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Markov Chains and Monte Carlo Methods
