Revisiting Natural Gradient for Deep Networks

Razvan Pascanu; Yoshua Bengio

arXiv:1301.3584·cs.LG·February 18, 2014·ICLR·121 cites

TL;DR

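This paper reevaluates natural gradient for training deep models: it connects the method to other recently proposed second-order methods, shows how unlabeled data can improve generalization, tests robustness to training-set ordering against stochastic gradient descent, and extends natural gradient with second-order information.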

Contribution

It establishes connections between natural gradient and recent second-order methods, introduces an extension that incorporates second-order information, and benchmarks the extended algorithm using a truncated Newton approach to invert the metric matrix.

Findings

01

Natural gradient is connected to Hessian-Free, Krylov Subspace Descent, and TONGA methods.

02

Using unlabeled data can enhance generalization in natural gradient training.

03

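Natural gradient is extended to incorporate second-order information alongside the manifold information, and the extension is benchmarked with a truncated Newton approach for inverting the metric matrix rather than a diagonal approximation.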

Abstract

We evaluate natural gradient, an algorithm originally proposed in Amari (1997), for learning deep models. The contributions of this paper are as follows. We show the connection between natural gradient and three other recently proposed methods for training deep models: Hessian-Free (Martens, 2010), Krylov Subspace Descent (Vinyals and Povey, 2012) and TONGA (Le Roux et al., 2008). We describe how one can use unlabeled data to improve the generalization error obtained by natural gradient, and we empirically evaluate the robustness of the algorithm to the ordering of the training set compared to stochastic gradient descent. Finally, we extend natural gradient to incorporate second-order information alongside the manifold information, and we provide a benchmark of the new algorithm that uses a truncated Newton approach for inverting the metric matrix instead of a diagonal approximation of it.
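To make the abstract's "truncated Newton approach for inverting the metric matrix" concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a toy logistic regression model rather than a deep network, and every function name, the damping value, and the iteration cap are hypothetical choices for illustration. The metric (the Fisher information) is never formed explicitly: a few conjugate gradient iterations solve the linear system using only metric-vector products. Because the Fisher expectation is taken over the model's own predictions, the metric can be estimated from inputs alone, which is why `X_metric` may hold unlabeled data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(w, X, y):
    """Gradient of the mean negative log-likelihood of logistic regression."""
    p = sigmoid(X @ w)
    return X.T @ (p - y) / len(y)

def fisher_vector_product(w, X, v, damping=1e-3):
    """Return (F + damping * I) v without building the matrix F.

    F = mean_i p_i (1 - p_i) x_i x_i^T is the Fisher information of the
    Bernoulli output. It depends on inputs only, not on labels.
    The damping term is an assumption added for numerical stability.
    """
    p = sigmoid(X @ w)
    return X.T @ (p * (1.0 - p) * (X @ v)) / X.shape[0] + damping * v

def conjugate_gradient(matvec, b, max_iters=20, tol=1e-10):
    """Approximately solve matvec(x) = b with a capped number of iterations."""
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x, with x = 0
    d = r.copy()          # search direction
    rs = r @ r
    for _ in range(max_iters):
        if rs < tol:
            break
        Ad = matvec(d)
        alpha = rs / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x

def natural_gradient_step(w, X, y, X_metric, lr=1.0):
    """One step w <- w - lr * F^{-1} g, with F estimated on X_metric."""
    g = gradient(w, X, y)
    direction = conjugate_gradient(
        lambda v: fisher_vector_product(w, X_metric, v), g
    )
    return w - lr * direction
```

A call such as `natural_gradient_step(w, X, y, X_metric=X_unlabeled)` uses labeled data for the gradient and a separate pool of unlabeled inputs for the metric. For a deep network, the explicit matrix-vector product above would be replaced by an automatic-differentiation routine that computes the same product.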

Taxonomy

Topics: Stochastic Gradient Optimization Techniques