Accelerating SGD for Distributed Deep-Learning Using Approximated Hessian Matrix

Sébastien M. R. Arnold; Chunming Wang

arXiv:1709.05069·cs.LG·September 18, 2017

TL;DR

This paper proposes a distributed method that approximates the inverse Hessian matrix from differences in gradients and parameters across workers, enabling second-order optimization techniques to accelerate stochastic gradient descent in deep learning.

Contribution

It introduces a distributed approach for computing a rank-m approximation of the inverse Hessian, allowing an efficient distributed approximation of the Newton-Raphson method in large-scale deep learning.

Findings

1. Preliminary results suggest potential benefits of second-order methods for large stochastic optimization problems.

2. Novel strategies for combining gradients may provide further information about the loss surface.

3. The results also highlight challenges of applying second-order methods at this scale.

Abstract

We introduce a novel method to compute a rank $m$ approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters of multiple Workers, we are able to efficiently implement a distributed approximation of the Newton-Raphson method. We also present preliminary results which underline advantages and challenges of second-order methods for large stochastic optimization problems. In particular, our work suggests that novel strategies for combining gradients provide further information on the loss surface.
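
The abstract does not spell out the update rule, so the following is only a minimal sketch under stated assumptions. It assumes each Worker's parameter offset s_i from a shared master point, together with the matching gradient difference y_i, forms a secant pair (on a quadratic loss, y_i = H s_i exactly), and it combines the m pairs with the standard L-BFGS two-loop recursion. That recursion is a swapped-in, well-known technique and not necessarily the authors' construction. The toy quadratic loss, the matrix `A`, the offsets, and helper names such as `two_loop` are hypothetical illustrations.

```python
# Minimal sketch (assumption): secant pairs built from Worker offsets are
# combined with the standard L-BFGS two-loop recursion to approximate H^{-1} g.
# The toy quadratic loss and all names below are hypothetical illustrations.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(alpha, x, y):
    """Return alpha * x + y elementwise."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def matvec(A, v):
    return [dot(row, v) for row in A]

def grad(A, b, x):
    # Gradient of f(x) = 0.5 * x^T A x - b^T x, whose Hessian is A.
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]

def two_loop(pairs, g):
    """Approximate H^{-1} g from secant pairs (s_i, y_i), ordered oldest first."""
    q = list(g)
    history = []
    for s, y in reversed(pairs):              # newest to oldest
        rho = 1.0 / dot(y, s)                 # needs positive curvature y.s > 0
        alpha = rho * dot(s, q)
        q = axpy(-alpha, y, q)
        history.append((rho, alpha))
    s_new, y_new = pairs[-1]
    gamma = dot(s_new, y_new) / dot(y_new, y_new)  # initial guess H_0 = gamma * I
    r = [gamma * qi for qi in q]
    for (s, y), (rho, alpha) in zip(pairs, reversed(history)):  # oldest to newest
        beta = rho * dot(y, r)
        r = axpy(alpha - beta, s, r)
    return r

# Hypothetical setup: a 2-D quadratic and two Workers offset from the master.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, -1.0]
x_master = [0.5, 0.5]
offsets = [[0.1, -0.2], [0.05, 0.3]]

g_master = grad(A, b, x_master)
pairs = []
for s in offsets:
    x_worker = [xm + si for xm, si in zip(x_master, s)]
    y = [gw - gm for gw, gm in zip(grad(A, b, x_worker), g_master)]  # equals A s here
    pairs.append((s, y))

direction = two_loop(pairs, g_master)                      # approx. H^{-1} g
x_next = [xm - di for xm, di in zip(x_master, direction)]  # Newton-like step
print(x_next)
```

The pairs are only touched through dot products and vector updates, so one step costs O(mn) for m pairs and n parameters, which is why a low-rank approximation is attractive when n is large. By construction the implied inverse-Hessian estimate maps the newest y_i back to its s_i, and with positive curvature (y_i^T s_i > 0) the resulting direction is a descent direction.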

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Neural Network Applications