Faster Adaptive Optimization via Expected Gradient Outer Product Reparameterization
Adela DePavia, Jose Cruzado, Jiayou Liang, Vasileios Charisopoulos, Rebecca Willett

TL;DR
This paper introduces an orthonormal reparameterization method based on the expected gradient outer product (EGOP) matrix to improve the convergence of adaptive optimization algorithms like Adam, especially when the EGOP spectrum decays.
Contribution
It proposes a novel EGOP-based reparameterization technique that enhances adaptive optimizers' performance by addressing their sensitivity to parameterization.
Findings
EGOP reparameterization improves convergence in practice.
Decay of the EGOP spectrum correlates with how sensitive adaptive optimizers are to the choice of basis.
Theoretical analysis supports empirical results.
Abstract
Adaptive optimization algorithms -- such as Adagrad, Adam, and their variants -- have found widespread use in machine learning, signal processing and many other settings. Several methods in this family are not rotationally equivariant, meaning that simple reparameterizations (i.e. change of basis) can drastically affect their convergence. However, their sensitivity to the choice of parameterization has not been systematically studied; it is not clear how to identify a "favorable" change of basis in which these methods perform best. In this paper we propose a reparameterization method and demonstrate both theoretically and empirically its potential to improve their convergence behavior. Our method is an orthonormal transformation based on the expected gradient outer product (EGOP) matrix, which can be approximated using either full-batch or stochastic gradient oracles. We show that for a…
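To make the abstract's description concrete, here is a minimal NumPy sketch of one way an EGOP-based orthonormal change of basis could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names (`estimate_egop`, `egop_basis`), the toy quadratic objective, and the choice to sample points from a standard normal distribution are all hypothetical. The EGOP is estimated as the average of gradient outer products over sampled points, and its eigenvectors supply the orthonormal matrix `V` used to reparameterize the objective.

```python
import numpy as np

def estimate_egop(grad_fn, sample_points):
    """Monte Carlo estimate of E[grad f(x) grad f(x)^T] over the given points."""
    grads = np.stack([grad_fn(x) for x in sample_points])  # shape (n, d)
    return grads.T @ grads / len(sample_points)

def egop_basis(egop):
    """Eigenvalues and orthonormal eigenvectors of the symmetric EGOP estimate,
    ordered from largest to smallest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(egop)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Hypothetical toy objective: a quadratic whose curvature directions are
# rotated away from the coordinate axes.
rng = np.random.default_rng(0)
d = 5
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
H = Q @ np.diag([100.0, 10.0, 1.0, 0.1, 0.01]) @ Q.T

def f(x):
    return 0.5 * x @ H @ x

def grad_f(x):
    return H @ x

# Assumption: gradients are sampled at standard normal points.
samples = rng.standard_normal((2000, d))
eigvals, V = egop_basis(estimate_egop(grad_f, samples))

# Reparameterized objective: optimize over z, then map back with x = V @ z.
def f_reparam(z):
    return f(V @ z)

def grad_f_reparam(z):
    # Chain rule for the orthonormal change of variables x = V z.
    return V.T @ grad_f(V @ z)
```

In this sketch an adaptive optimizer such as Adam would be run on `f_reparam` using `grad_f_reparam`, and the final iterate would be mapped back to the original coordinates via `V @ z`. Because `V` is orthonormal, the change of basis leaves the objective values unchanged and only alters the coordinate system the optimizer sees.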
Taxonomy
Topics: Advanced Data Compression Techniques · Image Processing Techniques and Applications · Advanced Computing and Algorithms
Methods: Adam
