Accelerating SGD for Distributed Deep-Learning Using Approximated Hessian Matrix
Sébastien M. R. Arnold, Chunming Wang

TL;DR
This paper proposes a distributed method to approximate the inverse Hessian matrix using gradient differences, enabling second-order optimization techniques to accelerate stochastic gradient descent in deep learning.
Contribution
It introduces a novel distributed approach for Hessian inverse approximation, enhancing second-order optimization in large-scale deep learning.
Findings
Preliminary results show potential benefits of second-order methods.
Gradient combination strategies reveal additional information about the loss surface.
Challenges in implementing second-order methods at scale are identified.
Abstract
We introduce a novel method to compute a rank-m approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters of multiple Workers, we are able to efficiently implement a distributed approximation of the Newton-Raphson method. We also present preliminary results which underline advantages and challenges of second-order methods for large stochastic optimization problems. In particular, our work suggests that novel strategies for combining gradients provide further information on the loss surface.
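To make the idea of building a low-rank inverse-Hessian estimate from worker differences concrete, here is a minimal NumPy sketch. It is not the authors' algorithm: it assumes a generic secant-style construction, B = S (SᵀY)⁻¹ Sᵀ, where the columns of S are parameter differences and the columns of Y are gradient differences relative to a reference worker. The function names `low_rank_inverse_hessian` and `newton_like_step`, and the choice of worker 0 as the reference, are hypothetical and chosen for illustration only.

```python
import numpy as np

def low_rank_inverse_hessian(params, grads):
    """Secant-style estimate of the inverse Hessian (illustrative assumption).

    params, grads: arrays of shape (k, d), one row per worker.
    Returns a (d, d) matrix B of rank at most k - 1 that satisfies B @ Y = S
    whenever S.T @ Y is invertible.
    """
    params = np.asarray(params, dtype=float)
    grads = np.asarray(grads, dtype=float)
    # Differences relative to worker 0 (hypothetical choice of reference).
    S = (params[1:] - params[0]).T  # (d, m) parameter differences
    Y = (grads[1:] - grads[0]).T    # (d, m) gradient differences
    # Solve (S^T Y) X = S^T in the least-squares sense, then B = S X.
    X = np.linalg.lstsq(S.T @ Y, S.T, rcond=None)[0]
    return S @ X

def newton_like_step(params, grads, lr=1.0):
    """One approximate Newton-Raphson update from the averaged worker state."""
    params = np.asarray(params, dtype=float)
    grads = np.asarray(grads, dtype=float)
    B = low_rank_inverse_hessian(params, grads)
    return params.mean(axis=0) - lr * B @ grads.mean(axis=0)

# Usage on a toy quadratic loss 0.5 * w^T A w - b^T w, whose gradient is A w - b.
rng = np.random.default_rng(0)
d = 4
M = rng.normal(size=(d, d))
A = M @ M.T + d * np.eye(d)           # symmetric positive definite Hessian
b = rng.normal(size=d)
worker_params = rng.normal(size=(d + 1, d))
worker_grads = worker_params @ A - b  # exact gradients, one row per worker
w_next = newton_like_step(worker_params, worker_grads)
```

On an exactly quadratic loss with d + 1 workers in general position, this estimate coincides with the true inverse Hessian, so a single step lands on the minimizer. With fewer workers the estimate only captures curvature in the subspace spanned by the parameter differences, and with stochastic gradients the matrix SᵀY can be ill-conditioned, which is one of the practical difficulties second-order methods face at scale.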
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Neural Network Applications
