TL;DR
This paper introduces a second-order optimization algorithm for training overparametrized neural networks whose running time is near-linear in mn (network width m times training batch size n), obtained with randomized linear algebra techniques.
Contribution
It develops a fast, near-linear time second-order training method for neural networks by reformulating Gauss-Newton iterations and applying dimension reduction techniques.
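To make the reformulation concrete, here is a minimal NumPy sketch, assuming a generic Jacobian `J` of shape (n, p) with n < p (the overparametrized case) and a residual vector `r`. The function names and shapes are illustrative assumptions, not taken from the paper. It shows that the Gauss-Newton step computed through the Gram matrix `J @ J.T` coincides with the minimum-norm solution of the least-squares problem min ||J d - r||, which is the sense in which the iteration can be treated as a regression problem.

```python
import numpy as np

def gauss_newton_step_direct(J, r):
    """Gauss-Newton step via the n x n Gram matrix G = J J^T (hypothetical helper)."""
    G = J @ J.T
    return J.T @ np.linalg.solve(G, r)

def gauss_newton_step_regression(J, r):
    """Same step as the minimum-norm least-squares solution of J d = r."""
    d, *_ = np.linalg.lstsq(J, r, rcond=None)
    return d

# Toy data: n = 5 samples, p = 40 parameters (illustrative sizes only).
rng = np.random.default_rng(0)
J = rng.standard_normal((5, 40))
r = rng.standard_normal(5)

d_direct = gauss_newton_step_direct(J, r)
d_regression = gauss_newton_step_regression(J, r)
agree = np.allclose(d_direct, d_regression)
```

The direct route forms and solves the Gram system explicitly; the regression view is what opens the door to approximate solvers that avoid doing so.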
Findings
Reduces training time from O(mn^2) to near-linear O(mn), up to logarithmic factors
Reformulates the Gauss-Newton iteration as an ℓ2-regression problem for efficient computation
Demonstrates applicability of randomized linear algebra in deep learning
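The findings above can be illustrated with a small sketch-and-precondition example. This is a simplified sketch under stated assumptions, not the paper's algorithm: it uses a dense Gaussian sketch in place of a fast Johnson-Lindenstrauss transform (so it does not attain the near-linear running time), and the names `sketched_gram_solve`, `conjugate_gradient`, and `sketch_factor` are hypothetical. The idea shown is that a QR factorization of the sketched matrix `S @ J.T` yields a triangular factor `R` that preconditions the Gram system `(J J^T) g = r`, so that conjugate gradient converges in a few iterations using only matrix-vector products with `J`.

```python
import numpy as np

def conjugate_gradient(matvec, b, iters=50, tol=1e-12):
    """Plain conjugate gradient for a symmetric positive definite operator."""
    x = np.zeros_like(b)
    res = b.copy()              # residual b - A x with x = 0
    p = res.copy()
    rs = res @ res
    for _ in range(iters):
        if np.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        res -= alpha * Ap
        rs_new = res @ res
        p = res + (rs_new / rs) * p
        rs = rs_new
    return x

def sketched_gram_solve(J, r, sketch_factor=4, iters=50, seed=0):
    """Approximately solve (J J^T) g = r with a sketch-based preconditioner."""
    n, p = J.shape
    s = sketch_factor * n                          # sketch size (assumed heuristic)
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((s, p)) / np.sqrt(s)   # Gaussian sketch, stand-in for Fast-JL
    _, R = np.linalg.qr(S @ J.T)                   # R is n x n upper triangular

    def matvec(z):
        # Applies R^{-T} (J J^T) R^{-1} without forming J J^T.
        u = np.linalg.solve(R, z)
        v = J @ (J.T @ u)
        return np.linalg.solve(R.T, v)

    z = conjugate_gradient(matvec, np.linalg.solve(R.T, r), iters=iters)
    return np.linalg.solve(R, z)

# Toy data with badly scaled rows so the unpreconditioned Gram matrix is ill-conditioned.
rng = np.random.default_rng(0)
J = rng.standard_normal((20, 500)) * np.logspace(0, 2, 20)[:, None]
r = rng.standard_normal(20)
g = sketched_gram_solve(J, r)
```

The resulting `g` can be compared against `np.linalg.solve(J @ J.T, r)` on small problems; the Gauss-Newton step is then `J.T @ g`.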
Abstract
The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks initiated an ongoing effort to develop faster second-order optimization algorithms beyond SGD, without compromising the generalization error. Despite their remarkable convergence rate (independent of the training batch size n), second-order algorithms incur a daunting slowdown in the cost per iteration (inverting the Hessian matrix of the loss function), which renders them impractical. Very recently, this computational overhead was mitigated by the works of [ZMG19, CGH+19], yielding an O(mn^2)-time second-order algorithm for training two-layer overparametrized neural networks of polynomial width m. We show how to speed up the algorithm of [CGH+19], achieving an Õ(mn)-time…
Videos
Training (Overparametrized) Neural Networks in Near-Linear Time · YouTube
Taxonomy
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Stochastic Gradient Descent
