Eva: A General Vectorized Approximation Framework for Second-order Optimization

Lin Zhang; Shaohuai Shi; Bo Li

arXiv:2308.02123 · cs.LG · August 7, 2023

TL;DR

Eva is a memory- and time-efficient second-order optimization framework. It builds second-order information from Kronecker products of small vectors and uses the Sherman-Morrison formula to avoid explicit matrix inversion, reducing training time by up to 2.05x over SGD and up to 2.42x over K-FAC and Shampoo while maintaining convergence.

Contribution

The paper proposes a vectorized approximation framework for second-order optimization that improves the compute and memory efficiency of existing second-order algorithms (FOOF and Shampoo) without sacrificing convergence.

Findings

01 Eva reduces training time by up to 2.05x compared with SGD.

02 Eva reduces training time by up to 2.42x compared with K-FAC and Shampoo.

03 The vectorized framework retains convergence performance comparable to the second-order algorithms it approximates.

Abstract

Second-order optimization algorithms exhibit excellent convergence properties for training deep learning models, but often incur significant computation and memory overheads. This can result in lower training efficiency than the first-order counterparts such as stochastic gradient descent (SGD). In this work, we present a memory- and time-efficient second-order algorithm named Eva with two novel techniques: 1) we construct the second-order information with the Kronecker factorization of small stochastic vectors over a mini-batch of training data to reduce memory consumption, and 2) we derive an efficient update formula without explicitly computing the inverse of matrices using the Sherman-Morrison formula. We further extend Eva to a general vectorized approximation framework to improve the compute and memory efficiency of two existing second-order algorithms (FOOF and Shampoo) without…
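The second technique in the abstract, avoiding explicit matrix inversion via the Sherman-Morrison formula, can be illustrated with a minimal NumPy sketch. This is not the paper's exact update rule. It only assumes that each curvature factor is a damped rank-one matrix of the form v v^T + damping * I, built from a single small vector. The function names, the `damping` parameter, and its default value are hypothetical and chosen for illustration.

```python
import numpy as np


def rank_one_inverse_apply(v, G, damping):
    """Return (v v^T + damping * I)^{-1} @ G without forming the matrix.

    Sherman-Morrison for a damped rank-one matrix gives
    (damping * I + v v^T)^{-1} = (I - v v^T / (damping + v^T v)) / damping.
    """
    # (v^T G) / (damping + v^T v): one coefficient per column of G.
    coeff = (v @ G) / (damping + v @ v)
    return (G - np.outer(v, coeff)) / damping


def precondition(G, a, g, damping=0.03):
    """Illustrative two-sided preconditioning (an assumption, not Eva's API):
    (g g^T + damping * I)^{-1} @ G @ (a a^T + damping * I)^{-1}.

    G: gradient matrix of shape (d_out, d_in)
    a: vector of length d_in; g: vector of length d_out
    """
    left = rank_one_inverse_apply(g, G, damping)
    # The right factor is symmetric, so right-multiplying by its inverse
    # equals applying the inverse to the transpose and transposing back.
    return rank_one_inverse_apply(a, left.T, damping).T
```

Under these assumptions, only the vectors `a` and `g` are stored, and applying both inverses costs a few matrix-vector products instead of the cubic cost of inverting d_in x d_in and d_out x d_out matrices.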

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications

Methods: Stochastic Gradient Descent