HesScale: Scalable Computation of Hessian Diagonals
Mohamed Elsayed, A. Rupam Mahmood

TL;DR
HesScale is a scalable method for approximating the diagonal of the Hessian matrix. It has the same computational complexity as backpropagation, which makes second-order information practical to use in optimization.
Contribution
HesScale approximates Hessian diagonals at the computational complexity of backpropagation, so that curvature information can be incorporated into optimization in a scalable way.
Findings
HesScale achieves high approximation accuracy on supervised classification.
HesScale has the same computational complexity as backpropagation.
Together, these properties allow for scalable and efficient second-order optimization.
Abstract
Second-order optimization uses curvature information about the objective function, which can help in faster convergence. However, such methods typically require expensive computation of the Hessian matrix, preventing their usage in a scalable way. The absence of efficient ways of computation drove the most widely used methods to focus on first-order approximations that do not capture the curvature information. In this paper, we develop HesScale, a scalable approach to approximating the diagonal of the Hessian matrix, to incorporate second-order information in a computationally efficient manner. We show that HesScale has the same computational complexity as backpropagation. Our results on supervised classification show that HesScale achieves high approximation accuracy, allowing for scalable and efficient second-order optimization.
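To make the quantity in the abstract concrete, the following is a minimal sketch of what "the diagonal of the Hessian" is and how a diagonally scaled update might use it. It is not the HesScale algorithm: it computes the diagonal by brute-force central differences, which costs two extra loss evaluations per parameter and is exactly the kind of expense a backpropagation-cost method is meant to avoid. The toy objective `loss`, its gradient `grad_loss`, and the helpers `hessian_diagonal_fd` and `diagonal_scaled_step` are hypothetical names introduced only for illustration.

```python
# Illustrative only: a brute-force reference for the Hessian diagonal,
# NOT the HesScale method described in the paper.

def loss(w):
    # Hypothetical toy objective with a known Hessian [[6, 1], [1, 1]].
    x, y = w
    return 3.0 * x**2 + 0.5 * y**2 + x * y

def grad_loss(w):
    # Analytic gradient of the toy objective above.
    x, y = w
    return [6.0 * x + y, y + x]

def hessian_diagonal_fd(f, w, eps=1e-4):
    """Approximate d^2 f / d w_i^2 for each i with central second differences.

    Needs 2 * len(w) + 1 evaluations of f, so it does not scale to large models.
    """
    f0 = f(w)
    diag = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        diag.append((f(w_plus) - 2.0 * f0 + f(w_minus)) / eps**2)
    return diag

def diagonal_scaled_step(w, grad, hdiag, lr=1.0, damping=1e-8):
    """One update that divides each gradient entry by its curvature estimate.

    abs() and damping are assumptions added here to keep the step well defined
    when a curvature estimate is negative or near zero.
    """
    return [wi - lr * gi / (abs(hi) + damping)
            for wi, gi, hi in zip(w, grad, hdiag)]

if __name__ == "__main__":
    w = [1.0, 2.0]
    hdiag = hessian_diagonal_fd(loss, w)
    print("Hessian diagonal estimate:", hdiag)
    print("Updated parameters:", diagonal_scaled_step(w, grad_loss(w), hdiag))
```

The update ignores the off-diagonal entry of the Hessian, which is the trade-off any diagonal approximation makes: per-parameter curvature is kept, while interactions between parameters are dropped.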
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Neural Networks and Applications
