Eva: A General Vectorized Approximation Framework for Second-order Optimization
Lin Zhang, Shaohuai Shi, Bo Li

TL;DR
Eva is a memory- and time-efficient second-order optimizer that builds curvature information from Kronecker-factored vectors and uses the Sherman-Morrison formula to avoid explicit matrix inversion, accelerating deep learning training while preserving convergence.
Contribution
The paper proposes Eva and extends it into a general vectorized approximation framework that improves the compute and memory efficiency of existing second-order algorithms (FOOF and Shampoo) without sacrificing convergence.
Findings
Eva reduces training time by up to 2.05x compared to SGD.
Eva reduces training time by up to 2.42x compared to K-FAC and Shampoo.
Eva maintains convergence performance comparable to these baselines.
Abstract
Second-order optimization algorithms exhibit excellent convergence properties for training deep learning models, but often incur significant computation and memory overheads. This can result in lower training efficiency than the first-order counterparts such as stochastic gradient descent (SGD). In this work, we present a memory- and time-efficient second-order algorithm named Eva with two novel techniques: 1) we construct the second-order information with the Kronecker factorization of small stochastic vectors over a mini-batch of training data to reduce memory consumption, and 2) we derive an efficient update formula without explicitly computing the inverse of matrices using the Sherman-Morrison formula. We further extend Eva to a general vectorized approximation framework to improve the compute and memory efficiency of two existing second-order algorithms (FOOF and Shampoo) without…
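The two techniques named in the abstract can be illustrated with a minimal sketch. It assumes each Kronecker factor is a damped rank-one matrix built from a single vector; the names `a_bar`, `g_bar`, `damping`, `sm_solve`, and `precondition` are hypothetical and do not come from the paper, and this is not the authors' implementation. Under that assumption, the Sherman-Morrison formula applies the inverse of each factor using only dot products, so no matrix is ever formed or inverted.

```python
# Sketch only: damped rank-one Kronecker factors, applied via Sherman-Morrison.
# All names are illustrative assumptions, not the paper's API.

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def sm_solve(v, x, damping):
    """Return (v v^T + damping * I)^{-1} x without building the matrix.

    Sherman-Morrison gives
    (damping * I + v v^T)^{-1} = (I - v v^T / (damping + v^T v)) / damping.
    """
    scale = dot(v, x) / (damping + dot(v, v))
    return [(xi - scale * vi) / damping for xi, vi in zip(x, v)]


def precondition(grad, a_bar, g_bar, damping):
    """Compute (g g^T + d I)^{-1} grad (a a^T + d I)^{-1}.

    grad is a list of rows with shape (len(g_bar), len(a_bar)).
    """
    # Right factor: it is symmetric, so apply the inverse to each row.
    rows = [sm_solve(a_bar, row, damping) for row in grad]
    # Left factor: apply the inverse to each column, then transpose back.
    cols = [sm_solve(g_bar, list(col), damping) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# Hypothetical 2x2 layer gradient with made-up vectors.
update = precondition([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [0.0, 1.0], 1.0)
```

Storage in this sketch is two vectors per layer rather than two square matrices, and each solve costs time linear in the vector length, which is the kind of saving the abstract attributes to vectorized factors.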
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications
Methods: Stochastic Gradient Descent
