A Novel Structured Natural Gradient Descent for Deep Learning

Weihua Liu; Xiabi Liu

arXiv:2109.10100·cs.LG·September 22, 2021

TL;DR

This paper introduces a new structured natural gradient descent method that reconstructs neural networks to approximate natural gradient optimization, improving convergence and performance while maintaining computational efficiency.

Contribution

The paper proposes restructuring a deep neural network so that training the new network with ordinary gradient descent emulates natural gradient descent, giving a practical alternative that improves training speed and accuracy.

Findings

01. Accelerates convergence of deep networks

02. Achieves better performance than traditional gradient descent

03. Maintains computational simplicity

Abstract

Natural gradient descent (NGD) has provided deep insights and powerful tools for training deep neural networks. However, computing the Fisher information matrix becomes increasingly difficult as the network grows larger and more complex. This paper proposes a new optimization method whose main idea is to accurately replace natural gradient optimization by reconstructing the network. More specifically, we reconstruct the structure of the deep neural network and optimize the new network using traditional gradient descent (GD). The reconstructed network achieves the effect of optimization with natural gradient descent. Experimental results show that our optimization method can accelerate the convergence of deep network models and achieve better performance than GD while sharing its computational simplicity.
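
The idea of trading a natural gradient step for a plain gradient step on a re-expressed model can be illustrated on a toy problem. The sketch below is not the paper's construction: it assumes a logistic regression model, a damped Fisher matrix F, and a linear reparameterization w = A·phi with A = F^(-1/2), all chosen here for illustration. Under those assumptions, one ordinary GD step on phi maps back to the same weights as one NGD step on w. The function names and the `damping` parameter are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_logistic(X, y, w):
    """Gradient of the mean logistic loss with respect to w."""
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]

def fisher_logistic(X, w, damping=1e-3):
    """Fisher information of logistic regression averaged over the inputs,
    plus a small damping term (an assumption here) to keep it invertible."""
    p = sigmoid(X @ w)
    F = (X * (p * (1.0 - p))[:, None]).T @ X / X.shape[0]
    return F + damping * np.eye(w.size)

def natural_gradient_step(X, y, w, lr):
    """One NGD step: w - lr * F^{-1} * grad."""
    F = fisher_logistic(X, w)
    return w - lr * np.linalg.solve(F, grad_logistic(X, y, w))

def reparameterized_gd_step(X, y, w, lr):
    """One plain GD step in coordinates phi, where w = A @ phi and A = F^{-1/2}."""
    F = fisher_logistic(X, w)
    evals, evecs = np.linalg.eigh(F)
    A = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric inverse square root of F
    phi = np.linalg.solve(A, w)                    # current point in the new coordinates
    grad_phi = A.T @ grad_logistic(X, y, w)        # chain rule: dL/dphi = A^T dL/dw
    return A @ (phi - lr * grad_phi)               # map the GD step back to w

# Toy data (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (rng.random(200) < 0.5).astype(float)
w0 = 0.1 * rng.normal(size=5)

w_ngd = natural_gradient_step(X, y, w0, lr=0.5)
w_gd = reparameterized_gd_step(X, y, w0, lr=0.5)
```

Here `w_ngd` and `w_gd` agree up to floating-point error, because A·Aᵀ = F⁻¹. The sketch still forms F explicitly, so it only illustrates the equivalence on a small model; it does not show how the paper avoids the cost of computing F for large networks.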

Taxonomy

Topics: Advanced Neural Network Applications · Blind Source Separation Techniques · Domain Adaptation and Few-Shot Learning