Training (Overparametrized) Neural Networks in Near-Linear Time

Jan van den Brand; Binghui Peng; Zhao Song; Omri Weinstein

arXiv:2006.11648·cs.LG·December 10, 2020

TL;DR

This paper introduces a near-linear-time second-order optimization algorithm for training overparametrized two-layer neural networks. It cuts the running time of prior second-order methods from $O(mn^{2})$ to $\tilde{O}(mn)$ by using randomized linear algebra techniques.

Contribution

The paper develops a near-linear-time second-order training method for neural networks by reformulating the Gauss-Newton iteration and applying dimension reduction techniques.
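To make the reformulation concrete, here is a minimal sketch, not the paper's algorithm. It assumes a squared loss and a toy two-layer ReLU network, and writes one Gauss-Newton step as the minimum-norm solution of the least-squares problem $\min_d \|J d - r\|_2$, where $J$ is the Jacobian of the network outputs and $r$ is the residual. The function names (`model`, `jacobian`, `gauss_newton_step`), the $1/\sqrt{m}$ scaling, and the fixed output weights `a` are assumptions made for illustration.

```python
import numpy as np

def model(W, a, X):
    """Two-layer ReLU net f(x) = a^T relu(W x) / sqrt(m); W has shape (m, d)."""
    m = W.shape[0]
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

def jacobian(W, a, X):
    """Jacobian of the n outputs with respect to W, flattened to (n, m*d)."""
    n, d = X.shape
    m = W.shape[0]
    act = (X @ W.T > 0).astype(float)  # (n, m) ReLU derivative
    # d f_i / d W[r, :] = a[r] * act[i, r] * x_i / sqrt(m)
    J = (act * a)[:, :, None] * X[:, None, :] / np.sqrt(m)  # (n, m, d)
    return J.reshape(n, m * d)

def gauss_newton_step(W, a, X, y):
    """One Gauss-Newton step for the squared loss, posed as least squares."""
    r = model(W, a, X) - y                      # residual, shape (n,)
    J = jacobian(W, a, X)                       # shape (n, m*d), wide when m*d >= n
    # Minimum-norm solution of J d = r (exact when J has full row rank).
    d, *_ = np.linalg.lstsq(J, r, rcond=None)
    return W - d.reshape(W.shape)
```

The dense `lstsq` call is the expensive part of this sketch; replacing an exact solve like this with a cheaper approximate regression solve is the kind of saving the summary describes.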

Findings

1. Reduces training time from $O(mn^{2})$ to $\tilde{O}(mn)$

2. Reformulates the Gauss-Newton iteration as an $\ell_2$-regression problem for efficient computation

3. Demonstrates the applicability of randomized linear algebra in deep learning
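One standard pattern from randomized linear algebra that fits these findings is sketch-and-precondition: compress a tall regression matrix with a random projection, use the QR factor of the compressed matrix as a preconditioner, and then run conjugate gradient on the well-conditioned system. The sketch below illustrates that pattern under stated assumptions. It uses a dense Gaussian projection for brevity rather than a fast structured transform, and the names `sketch_preconditioner` and `preconditioned_cg_lstsq` are hypothetical; it is not taken from the paper.

```python
import numpy as np

def sketch_preconditioner(A, sketch_rows, rng):
    """Return the triangular factor R from a QR of the sketched matrix S @ A."""
    N, k = A.shape
    # Dense Gaussian sketch (assumption: chosen for simplicity, costs O(sketch_rows * N * k)).
    S = rng.standard_normal((sketch_rows, N)) / np.sqrt(sketch_rows)
    _, R = np.linalg.qr(S @ A)  # R has shape (k, k)
    return R

def preconditioned_cg_lstsq(A, b, R, iters=50, tol=1e-12):
    """Solve min_x ||A x - b||_2 by CG on the normal equations of A R^{-1}."""
    def apply_M(z):
        # M = R^{-T} A^T A R^{-1}, applied without forming A^T A explicitly.
        u = np.linalg.solve(R, z)
        return np.linalg.solve(R.T, A.T @ (A @ u))

    rhs = np.linalg.solve(R.T, A.T @ b)
    z = np.zeros_like(rhs)
    res = rhs.copy()            # residual rhs - M z for z = 0
    p = res.copy()
    rs = res @ res
    stop = (tol * np.linalg.norm(rhs)) ** 2
    for _ in range(iters):
        if rs <= stop:
            break
        Mp = apply_M(p)
        alpha = rs / (p @ Mp)
        z += alpha * p
        res -= alpha * Mp
        rs_new = res @ res
        p = res + (rs_new / rs) * p
        rs = rs_new
    return np.linalg.solve(R, z)  # map back: x = R^{-1} z
```

With a sketch of a few times `k` rows, the preconditioned matrix `A @ inv(R)` typically has a small constant condition number, so the number of CG iterations does not grow with the conditioning of `A`. A fast structured transform in place of the Gaussian matrix would lower the cost of forming `S @ A`.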

Abstract

The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks initiated an ongoing effort to develop faster second-order optimization algorithms beyond SGD, without compromising the generalization error. Despite their remarkable convergence rate (independent of the training batch size $n$), second-order algorithms incur a daunting slowdown in the cost per iteration (inverting the Hessian matrix of the loss function), which renders them impractical. Very recently, this computational overhead was mitigated by the works of [ZMG19, CGH+19], yielding an $O(mn^{2})$-time second-order algorithm for training two-layer overparametrized neural networks of polynomial width $m$. We show how to speed up the algorithm of [CGH+19], achieving an $\tilde{O}(mn)$-time…

Taxonomy

Methods: Stochastic Gradient Descent