An iterative K-FAC algorithm for Deep Learning

Yingshi Chen

arXiv:2101.00218 · cs.LG · January 5, 2021 · 1 citation

Open Access

TL;DR

This paper introduces CG-FAC, an iterative, matrix-free K-FAC algorithm that uses conjugate gradient to approximate the natural gradient, reducing time and memory complexity compared to standard K-FAC.

Contribution

It proposes a novel iterative K-FAC method that never explicitly forms the Fisher information matrix or its Kronecker factors, which improves efficiency.

Findings

1. CG-FAC is matrix-free and does not require the FIM or the Kronecker factors.
2. The method has lower time and memory complexity than standard K-FAC.
3. CG-FAC maintains comparable accuracy in training deep neural networks.
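The matrix-free claim can be illustrated with a standard Kronecker-product identity. The notation is an assumption for illustration and is not taken from the paper: for a fully connected layer with weight W of size d_out × d_in, the K-FAC block is A ⊗ G, built from layer inputs a_i and back-propagated output gradients g_i over a mini-batch of size N; 𝒜 (N × d_in) and 𝒢 (N × d_out) stack these vectors as rows, vec(·) stacks columns, and λ > 0 is a damping term. Evaluating the last line from the innermost parentheses outward touches only 𝒜 and 𝒢, so A and G never have to be formed, and the damped natural-gradient system in the second line can be handed to an iterative solver such as conjugate gradient.

```latex
\begin{aligned}
A &= \frac{1}{N}\sum_{i=1}^{N} a_i a_i^\top, \qquad
G = \frac{1}{N}\sum_{i=1}^{N} g_i g_i^\top,
\qquad (A \otimes G)\,\operatorname{vec}(V) = \operatorname{vec}(G V A), \\
(A \otimes G + \lambda I)\,\operatorname{vec}(U) &= \operatorname{vec}(\nabla_W \mathcal{L})
\iff G U A + \lambda U = \nabla_W \mathcal{L}, \\
G V A &= \frac{1}{N^2}\Bigl(\bigl(\mathcal{G}^\top(\mathcal{G} V)\bigr)\mathcal{A}^\top\Bigr)\mathcal{A}.
\end{aligned}
```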

Abstract

The Kronecker-factored Approximate Curvature (K-FAC) method is a highly efficient second-order optimizer for deep learning. In many large-scale problems it reaches the same accuracy as SGD (or other first-order methods) in less training time. The key idea of K-FAC is to approximate the Fisher information matrix (FIM) as a block-diagonal matrix in which each block is the Kronecker product of two small factors, so that inverting a block only requires inverting these small Kronecker factors. In this short note, we present CG-FAC, a new iterative K-FAC algorithm. It uses the conjugate gradient method to approximate the natural gradient. CG-FAC is matrix-free: it needs to form neither the FIM nor the Kronecker factors A and G. We prove that the time and memory complexity of iterative CG-FAC is much lower than that of the standard K-FAC algorithm.
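The abstract does not spell out the iteration, so the following is only a minimal sketch under stated assumptions, not the author's implementation. It assumes a single fully connected layer whose K-FAC block is A ⊗ G, with A and G estimated from a mini-batch of layer inputs (acts) and back-propagated output gradients (grads), plus a damping term. It relies on the identity (A ⊗ G) vec(V) = vec(G V A), so conjugate gradient only needs products with acts and grads and never builds A or G. All names (fisher_block_matvec, cg_natural_gradient, acts, grads, damping and the small matrix helpers) are hypothetical, and plain Python lists are used to keep the sketch dependency-free.

```python
from typing import List

Matrix = List[List[float]]


def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def add(x: Matrix, y: Matrix, alpha: float = 1.0) -> Matrix:
    """Return x + alpha * y, elementwise."""
    return [[a + alpha * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def dot(x: Matrix, y: Matrix) -> float:
    """Frobenius inner product <x, y>."""
    return sum(a * b for rx, ry in zip(x, y) for a, b in zip(rx, ry))


def fisher_block_matvec(v: Matrix, acts: Matrix, grads: Matrix,
                        damping: float) -> Matrix:
    """Hypothetical helper: return G v A + damping * v, i.e. (A kron G + damping*I) vec(v).

    acts:  N x d_in,  rows are layer inputs a_i,  so A = acts^T acts / N
    grads: N x d_out, rows are output grads g_i, so G = grads^T grads / N
    Neither A (d_in x d_in) nor G (d_out x d_out) is ever formed.
    """
    n = len(acts)
    gv = matmul(transpose(grads), matmul(grads, v))   # N * G v,      d_out x d_in
    gva = matmul(matmul(gv, transpose(acts)), acts)   # N^2 * G v A,  d_out x d_in
    return [[x / (n * n) + damping * y for x, y in zip(rg, rv)]
            for rg, rv in zip(gva, v)]


def cg_natural_gradient(grad_w: Matrix, acts: Matrix, grads: Matrix,
                        damping: float = 1e-3, max_iter: int = 50,
                        tol: float = 1e-10) -> Matrix:
    """Hypothetical helper: solve G U A + damping * U = grad_w for U with conjugate gradient."""
    x = [[0.0] * len(row) for row in grad_w]  # initial guess U = 0
    r = [row[:] for row in grad_w]            # residual b - M x equals b when x = 0
    p = [row[:] for row in r]                 # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:                   # residual norm small enough
            break
        mp = fisher_block_matvec(p, acts, grads, damping)
        alpha = rs / dot(p, mp)               # exact step length along p
        x = add(x, p, alpha)
        r = add(r, mp, -alpha)
        rs_new = dot(r, r)
        p = add(r, p, rs_new / rs)            # next M-conjugate direction
        rs = rs_new
    return x
```

Under these assumptions each CG iteration costs O(N·d_in·d_out) time and never stores a d_in × d_in or d_out × d_out matrix. The returned U would take the place of the raw gradient in an update such as W ← W − ηU; one could also stop CG after a few iterations to trade accuracy of the natural gradient for speed.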

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Blind Source Separation Techniques · Face and Expression Recognition