Preconditioning for Accelerated Gradient Descent Optimization and Regularization
Qiang Ye

TL;DR
This paper provides a unified theoretical framework explaining how preconditioning, normalization, and regularization techniques accelerate training in gradient descent, and explores their interactions for improved optimization.
Contribution
It offers a comprehensive analysis of acceleration methods and their interplay with regularization, including new insights into AdamW and normalization effects.
Findings
AdamW effectively selects intrinsic parameters for regularization.
Normalization methods improve Hessian conditioning, accelerating training.
The framework unifies the analysis of acceleration and regularization techniques.
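The conditioning claim in the findings above can be illustrated on a toy quadratic. The sketch below is not taken from the paper: it uses Jacobi (diagonal) scaling with the exact Hessian diagonal as a stand-in for the preconditioners that AdaGrad, RMSProp, and Adam build from gradient statistics, and the matrix `H` and the helper names are hypothetical.

```python
import numpy as np

def condition_number(H):
    """Ratio of largest to smallest eigenvalue of a symmetric positive definite matrix."""
    eig = np.linalg.eigvalsh(H)  # eigenvalues in ascending order
    return eig[-1] / eig[0]

def jacobi_precondition(H):
    """Return D^{-1/2} H D^{-1/2}, where D is the diagonal of H.

    Assumption: the exact Hessian diagonal is available. Adaptive optimizers
    only estimate a diagonal scaling from gradients; this is an idealized stand-in.
    """
    d = 1.0 / np.sqrt(np.diag(H))
    return d[:, None] * H * d[None, :]

# Hypothetical badly scaled Hessian of a two-parameter quadratic loss.
H = np.array([[1000.0, 10.0],
              [10.0, 1.0]])

H_pre = jacobi_precondition(H)
kappa_before = condition_number(H)
kappa_after = condition_number(H_pre)

print(f"condition number before: {kappa_before:.1f}")
print(f"condition number after:  {kappa_after:.3f}")
```

Gradient descent on a quadratic converges at a rate governed by the condition number, so the smaller value after scaling corresponds to faster convergence in the rescaled parameters.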
Abstract
Accelerated training algorithms, such as adaptive learning rates (or preconditioning) and various normalization methods, are widely used but not fully understood. When regularization is introduced, standard adaptive learning rate optimizers may not perform effectively. This raises the need for alternative regularization approaches such as AdamW and the question of how to properly combine regularization with preconditioning. In this paper, we address these challenges using the theory of preconditioning as follows: (1) We explain how AdaGrad, RMSProp, and Adam accelerate training by improving Hessian conditioning; (2) We explore the interaction between L2-regularization and preconditioning, demonstrating that AdamW amounts to selecting the underlying intrinsic parameters for regularization, and we derive a generalization for L1-regularization; and (3) We demonstrate…
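The interaction between L2-regularization and preconditioning mentioned in point (2) can be seen in a single update step. The following is a minimal sketch, assuming the standard Adam update and the decoupled weight decay form used by AdamW; it is not the paper's derivation, and the function names, hyperparameter values, and test vector are illustrative assumptions.

```python
import numpy as np

def adam_direction(grad, m, v, t, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam moment update; returns the preconditioned direction and new moments."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    return m_hat / (np.sqrt(v_hat) + eps), m, v

def adam_l2_step(w, grad_loss, m, v, t, lr=1e-3, lam=1e-2):
    """Adam with an L2 penalty: lam * w is added to the gradient, then preconditioned."""
    d, m, v = adam_direction(grad_loss + lam * w, m, v, t)
    return w - lr * d, m, v

def adamw_step(w, grad_loss, m, v, t, lr=1e-3, lam=1e-2):
    """AdamW: the decay term lam * w bypasses the preconditioner."""
    d, m, v = adam_direction(grad_loss, m, v, t)
    return w - lr * (d + lam * w), m, v

# Illustrative case: zero loss gradient, so only the regularizer acts.
w = np.array([2.0, -0.5])
zeros = np.zeros_like(w)
lr, lam = 1e-3, 1e-2

w_l2, _, _ = adam_l2_step(w, zeros, zeros, zeros, t=1, lr=lr, lam=lam)
w_adamw, _, _ = adamw_step(w, zeros, zeros, zeros, t=1, lr=lr, lam=lam)

print("Adam + L2:", w_l2)     # each weight moves by about lr, whatever its size
print("AdamW:    ", w_adamw)  # each weight shrinks in proportion to itself
```

In this toy case the preconditioner rescales the L2 gradient to roughly unit magnitude, so the penalty strength `lam` has almost no effect on the first step, whereas the decoupled form shrinks each weight by the factor `1 - lr * lam`.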
Taxonomy
Topics: Radiative Heat Transfer Studies · Numerical methods in inverse problems · Sparse and Compressive Sensing Techniques
Methods: AdaGrad · Adam · RMSProp
