Adaptive Learning Rate via Covariance Matrix Based Preconditioning for Deep Neural Networks

Yasutoshi Ida; Yasuhiro Fujiwara; Sotetsu Iwamura

arXiv:1605.09593 · cs.LG · September 29, 2017

TL;DR

This paper introduces SDProp, an adaptive learning rate method that uses covariance matrix preconditioning to better handle stochastic gradient noise, improving training efficiency and effectiveness for deep neural networks.

Contribution

The paper proposes SDProp, a novel adaptive learning rate algorithm that leverages covariance matrix preconditioning to reduce noise impact in stochastic optimization.

Findings

01

SDProp outperforms RMSProp in training efficiency.

02

SDProp achieves higher accuracy than RMSProp on various neural networks.

03

SDProp effectively handles gradient noise in stochastic training.

Abstract

Adaptive learning rate algorithms such as RMSProp are widely used for training deep neural networks. RMSProp offers efficient training since it uses first-order gradients to approximate Hessian-based preconditioning. However, since the first-order gradients include noise caused by stochastic optimization, the approximation may be inaccurate. In this paper, we propose a novel adaptive learning rate algorithm called SDProp. Its key idea is effective handling of the noise by preconditioning based on the covariance matrix. For various neural networks, our approach is more efficient and effective than RMSProp and its variant.
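To make the contrast in the abstract concrete, the following is a minimal sketch of the two kinds of preconditioning, not the paper's reference implementation. It assumes a diagonal approximation: RMSProp divides each gradient coordinate by the root of an exponential moving average of squared gradients, while the covariance-style step divides by the root of an exponential moving estimate of the gradient's variance around its running mean. The function names (`rmsprop_step`, `centered_step`), the state layout, the hyperparameter defaults, and the exact form of the variance recursion are all assumptions made for illustration; the abstract itself does not state the update equations.

```python
import math

def rmsprop_step(theta, grad, state, lr=0.001, beta=0.99, eps=1e-8):
    """One RMSProp update: scale by the root of the running mean of g^2."""
    # Running average of squared (uncentered) gradients.
    state["v"] = [beta * v + (1 - beta) * g * g
                  for v, g in zip(state["v"], grad)]
    return [t - lr * g / (math.sqrt(v) + eps)
            for t, g, v in zip(theta, grad, state["v"])]

def centered_step(theta, grad, state, lr=0.001, gamma=0.99, eps=1e-8):
    """One covariance-style update (assumed diagonal form, for illustration).

    Scales each coordinate by the root of a running estimate of the
    gradient's variance around its running mean.
    """
    prev_mean = state["mean"]
    # Running variance, measured against the previous running mean.
    state["var"] = [gamma * (c + (1 - gamma) * (g - m) ** 2)
                    for c, g, m in zip(state["var"], grad, prev_mean)]
    # Running mean of the gradient.
    state["mean"] = [gamma * m + (1 - gamma) * g
                     for m, g in zip(prev_mean, grad)]
    return [t - lr * g / (math.sqrt(c) + eps)
            for t, g, c in zip(theta, grad, state["var"])]

# Usage: one step of each rule on a single parameter.
rms_state = {"v": [0.0]}
cen_state = {"mean": [0.0], "var": [0.0]}
theta_rms = rmsprop_step([1.0], [2.0], rms_state, lr=0.1, beta=0.9)
theta_cen = centered_step([1.0], [2.0], cen_state, lr=0.1, gamma=0.9)
```

The design difference in this sketch is only the statistic in the denominator: the uncentered second moment in `rmsprop_step` mixes the squared mean of the gradient with its variance, whereas `centered_step` tracks the variance alone, so coordinates with noisier gradients receive smaller steps.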

Taxonomy

Methods: RMSProp