Estimating the Hessian by Back-propagating Curvature
James Martens (University of Toronto), Ilya Sutskever (University of Toronto), Kevin Swersky (University of Toronto)

TL;DR
This paper introduces Curvature Propagation (CP), a method for efficiently computing unbiased estimates of the Hessian of any function defined by a computational graph, such as a neural network. The estimates, in particular of the Hessian diagonal, are useful for optimization and for Score Matching.
Contribution
The paper presents a novel unbiased Hessian approximation technique called Curvature Propagation that is efficient, scalable, and applicable to complex models like neural networks.
Findings
A single CP pass costs roughly two gradient evaluations and yields an unbiased rank-1 approximation of the whole Hessian; repeating it gives increasingly precise estimates.
CP effectively estimates the Hessian diagonal in neural networks.
CP improves scalability of Hessian computations in score matching.
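To make the idea of "repeated unbiased rank-1 estimates" concrete, here is a minimal sketch in plain Python. It is not Curvature Propagation itself, whose details are not given on this page. It instead uses a simpler, well-known estimator based on Hessian-vector products: for a random sign vector v with E[v vᵀ] = I, the outer product (Hv) vᵀ is an unbiased rank-1 estimate of H, and its diagonal is v ⊙ (Hv). The matrix `H`, the function names, and the sample counts are all illustrative assumptions.

```python
import random

def hessian_vector_product(H, v):
    """Return H @ v for a dense matrix stored as a list of rows.

    In a real model this product would come from automatic differentiation;
    here the Hessian is an explicit toy matrix (an assumption for clarity).
    """
    return [sum(h_ij * v_j for h_ij, v_j in zip(row, v)) for row in H]

def estimate_hessian_diagonal(H, num_samples, seed=0):
    """Average v * (H v) over random sign vectors v.

    Each sample is the diagonal of the rank-1 matrix (H v) v^T, which is an
    unbiased estimate of H because E[v v^T] = I for independent +/-1 entries.
    """
    rng = random.Random(seed)
    n = len(H)
    totals = [0.0] * n
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        Hv = hessian_vector_product(H, v)
        for i in range(n):
            totals[i] += v[i] * Hv[i]
    return [t / num_samples for t in totals]

# Toy symmetric "Hessian" (hypothetical values chosen for illustration).
H = [[2.0, 0.5, -1.0],
     [0.5, 3.0, 0.25],
     [-1.0, 0.25, 1.5]]

few = estimate_hessian_diagonal(H, num_samples=10)
many = estimate_hessian_diagonal(H, num_samples=20000)
print("true diagonal:", [H[i][i] for i in range(len(H))])
print("10 samples:   ", few)
print("20000 samples:", many)
```

The per-sample noise in entry i comes only from the off-diagonal entries of row i, so averaging more samples tightens the estimate, and a purely diagonal matrix is recovered exactly from a single sample.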
Abstract
In this work we develop Curvature Propagation (CP), a general technique for efficiently computing unbiased approximations of the Hessian of any function that is computed using a computational graph. At the cost of roughly two gradient evaluations, CP can give a rank-1 approximation of the whole Hessian, and can be repeatedly applied to give increasingly precise unbiased estimates of any or all of the entries of the Hessian. Of particular interest is the diagonal of the Hessian, for which no general approach is known to exist that is both efficient and accurate. We show in experiments that CP turns out to work well in practice, giving very accurate estimates of the Hessian of neural networks, for example, with a relatively small amount of work. We also apply CP to Score Matching, where a diagonal of a Hessian plays an integral role in the Score Matching objective, and where it is usually…
Taxonomy
Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
