Kronecker-factored Quasi-Newton Methods for Deep Learning
Yi Ren, Achraf Bahamou, Donald Goldfarb

TL;DR
This paper extends the Kronecker-factored quasi-Newton methods of Goldfarb et al. (2020) from multilayer perceptrons to convolutional neural networks, reduces their memory and per-iteration time cost, and reports better performance than first-order methods on CNN and MLP autoencoder tasks.
Contribution
The paper introduces a Kronecker-factored Hessian approximation for convolutional layers, improves the memory and time efficiency of the underlying methods for both MLPs and CNNs, and provides a convergence analysis and empirical validation.
Findings
Outperforms first-order SOTA methods in CNN and MLP autoencoder tasks.
Achieves comparable results to second-order SOTA methods.
Reduces memory and per-iteration time complexity.
Abstract
Second-order methods have the capability of accelerating optimization by using much richer curvature information than first-order methods. However, most are impractical for deep learning, where the number of training parameters is huge. In Goldfarb et al. (2020), practical quasi-Newton methods were proposed that approximate the Hessian of a multilayer perceptron (MLP) model by a layer-wise block diagonal matrix where each layer's block is further approximated by a Kronecker product corresponding to the structure of the Hessian restricted to that layer. Here, we extend these methods to enable them to be applied to convolutional neural networks (CNNs), by analyzing the Kronecker-factored structure of the Hessian matrix of convolutional layers. Several improvements to the methods in Goldfarb et al. (2020) are also proposed that can be applied to both MLPs and CNNs. These new methods have…
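The practical benefit of approximating a layer's Hessian block by a Kronecker product can be illustrated with a small sketch. It does not reproduce the quasi-Newton updates from the paper; it only checks the standard identity that a system with matrix A ⊗ B can be solved using the two small factors alone, without ever forming the large matrix. The names `random_spd`, `kron_solve`, `d_out`, and `d_in` are hypothetical and chosen for illustration, and the factors here are random symmetric positive definite matrices rather than curvature estimates.

```python
import numpy as np


def random_spd(n, rng):
    """Build a random symmetric positive definite n x n matrix (illustrative only)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)


def kron_solve(A, B, G):
    """Solve (A kron B) vec(X) = vec(G) for X, using row-major vec.

    With NumPy's row-major flattening, (A kron B) vec(X) = vec(A X B^T),
    so X = A^{-1} G B^{-T}. Only the small factors are ever factorized.
    """
    left = np.linalg.solve(A, G)         # A^{-1} G
    return np.linalg.solve(B, left.T).T  # (B^{-1} left^T)^T = left B^{-T}


rng = np.random.default_rng(0)
d_out, d_in = 4, 3                       # assumed toy layer dimensions
A = random_spd(d_out, rng)               # d_out x d_out factor
B = random_spd(d_in, rng)                # d_in x d_in factor
G = rng.standard_normal((d_out, d_in))   # stand-in for a layer gradient

# Reference: form the full (d_out*d_in) x (d_out*d_in) matrix and solve directly.
dense = np.linalg.solve(np.kron(A, B), G.reshape(-1)).reshape(d_out, d_in)

# Factored: two small solves, no large matrix.
factored = kron_solve(A, B, G)

max_abs_diff = float(np.max(np.abs(dense - factored)))
print("max abs difference:", max_abs_diff)
```

For a layer with weight shape d_out × d_in, the dense solve works with a matrix of (d_out·d_in)² entries, while the factored solve stores only d_out² + d_in² entries, which is why this structure makes curvature information affordable when the number of parameters is large.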
Taxonomy
Topics: Advanced Numerical Analysis Techniques · Neural Networks and Applications · Matrix Theory and Algorithms
