Diagonal Rescaling For Neural Networks

Jean Lafond; Nicolas Vasilache; Léon Bottou

arXiv:1705.09319·cs.LG·May 29, 2017·6 cites

TL;DR

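This paper defines a second-order stochastic gradient training algorithm for neural networks whose block-diagonal structure effectively amounts to normalizing the unit activations. Investigating why the algorithm lacks robustness yields two insights: a new way to scale stepsizes, and the practical importance of handling fast changes in the curvature of the cost.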

Contribution

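It defines a second-order stochastic gradient algorithm whose block-diagonal structure effectively normalizes the unit activations, and it uses an analysis of that algorithm's lack of robustness to clarify stepsize scaling in RMSProp and in older tricks such as fan-in stepsize scaling.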

Findings

01 Clarifies the role of stepsize scaling in popular algorithms

02 Highlights the importance of handling rapid curvature changes

03 Connects old tricks with modern normalization techniques

Abstract

We define a second-order neural network stochastic gradient training algorithm whose block-diagonal structure effectively amounts to normalizing the unit activations. Investigating why this algorithm lacks in robustness then reveals two interesting insights. The first insight suggests a new way to scale the stepsizes, clarifying popular algorithms such as RMSProp as well as old neural network tricks such as fanin stepsize scaling. The second insight stresses the practical importance of dealing with fast changes of the curvature of the cost.
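For readers unfamiliar with the two stepsize schemes named in the abstract, the following is a minimal sketch of the textbook RMSProp update and of a fan-in stepsize heuristic. It is not the paper's algorithm. The function names `rmsprop_step` and `fanin_stepsize`, the default hyperparameters, and the choice to divide by the square root of the fan-in (rather than some other power) are assumptions made for illustration.

```python
import math

def rmsprop_step(w, g, v, lr=0.01, decay=0.9, eps=1e-8):
    """One textbook RMSProp update on lists of floats.

    w: weights, g: gradients, v: running average of squared gradients.
    Returns (new_w, new_v). Hyperparameter defaults are illustrative.
    """
    # Exponential moving average of the squared gradient, per weight.
    new_v = [decay * vi + (1.0 - decay) * gi * gi for vi, gi in zip(v, g)]
    # Each weight gets its own effective stepsize lr / (sqrt(v) + eps).
    new_w = [wi - lr * gi / (math.sqrt(vi) + eps)
             for wi, gi, vi in zip(w, g, new_v)]
    return new_w, new_v

def fanin_stepsize(base_lr, fan_in):
    """Fan-in stepsize heuristic (assumed variant): shrink the stepsize
    of a unit's incoming weights by the square root of its fan-in."""
    return base_lr / math.sqrt(fan_in)

# The first RMSProp step is insensitive to the gradient's overall scale:
w_small, _ = rmsprop_step([1.0], [2.0], [0.0])
w_large, _ = rmsprop_step([1.0], [20.0], [0.0])
```

In this sketch `w_small` and `w_large` nearly coincide, which illustrates how RMSProp acts as a per-weight rescaling of the stepsize, the kind of diagonal scaling the abstract sets out to clarify.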

Code & Models

Repositories

Thrandis/EKFAC-pytorch
pytorch

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and Algorithms

Methods: RMSProp