A Robust Adaptive Stochastic Gradient Method for Deep Learning

Caglar Gulcehre; Jose Sotelo; Marcin Moczulski; Yoshua Bengio

arXiv:1703.00788 · cs.LG · March 3, 2017 · 2 cites

TL;DR

This paper introduces a new adaptive stochastic gradient method that uses curvature information and variance reduction to improve convergence and performance in deep learning models.

Contribution

The paper proposes an adaptive learning rate algorithm that tunes learning rates automatically from stochastic, element-wise curvature estimates of the loss function, together with a new variance reduction technique intended to speed up convergence.

Findings

01 Achieved better performance than popular SGD variants in neural network training.

02 Demonstrated faster convergence with the proposed adaptive method.

03 Improved robustness to noise in stochastic gradient estimates.

Abstract

Stochastic gradient algorithms have been the main focus of research on large-scale optimization problems and have led to important successes in recent advances in deep learning. The convergence of SGD depends on a careful choice of the learning rate and on the amount of noise in the stochastic estimates of the gradients. In this paper, we propose an adaptive learning rate algorithm that uses stochastic curvature information about the loss function to tune the learning rates automatically. The element-wise curvature of the loss function is estimated from the local statistics of the stochastic first-order gradients. We further propose a new variance reduction technique to speed up convergence. In our experiments with deep neural networks, we obtained better performance than with popular stochastic gradient algorithms.
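The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea it describes: estimating element-wise curvature from differences of successive first-order gradients (a secant-style estimate) and using it to set a per-parameter learning rate. It is not the paper's algorithm. The function names `secant_sgd` and `make_quadratic_grad`, the running-average `decay`, the `max_lr` cap, and the toy quadratic loss are all assumptions made for illustration, and the paper's variance reduction technique is not modeled.

```python
import random


def secant_sgd(grad_fn, theta, steps=200, init_lr=0.01, decay=0.9,
               max_lr=1.0, eps=1e-8):
    """Illustrative SGD with element-wise, secant-style learning rates.

    For each parameter i, the curvature is approximated by the ratio of the
    gradient change (alpha) to the parameter change (delta); the learning
    rate is the inverse of that ratio, computed from running averages.
    This is a hypothetical sketch, not the method from the paper.
    """
    n = len(theta)
    theta = list(theta)
    prev_grad = grad_fn(theta)
    # Bootstrap with one small plain-SGD step so a first delta exists.
    delta = [-init_lr * g for g in prev_grad]
    avg_delta_sq = [0.0] * n  # running mean of squared parameter changes
    avg_alpha_sq = [0.0] * n  # running mean of squared gradient changes

    for _ in range(steps):
        theta = [t + d for t, d in zip(theta, delta)]
        grad = grad_fn(theta)
        for i in range(n):
            alpha = grad[i] - prev_grad[i]  # change in the i-th gradient
            avg_delta_sq[i] = decay * avg_delta_sq[i] + (1 - decay) * delta[i] ** 2
            avg_alpha_sq[i] = decay * avg_alpha_sq[i] + (1 - decay) * alpha ** 2
            # Inverse secant curvature estimate, capped for safety.
            lr = min(max_lr,
                     avg_delta_sq[i] ** 0.5 / (avg_alpha_sq[i] ** 0.5 + eps))
            delta[i] = -lr * grad[i]
        prev_grad = grad
    return theta


def make_quadratic_grad(curvatures, noise_std=0.0, seed=0):
    """Gradient of the toy loss 0.5 * sum(h_i * theta_i**2), plus optional noise."""
    rng = random.Random(seed)

    def grad_fn(theta):
        return [h * t + rng.gauss(0.0, noise_std)
                for h, t in zip(curvatures, theta)]

    return grad_fn


if __name__ == "__main__":
    # Curvatures spanning two orders of magnitude; no hand-tuned per-parameter rate.
    grad_fn = make_quadratic_grad([1.0, 10.0, 100.0])
    print(secant_sgd(grad_fn, [1.0, 1.0, 1.0]))
```

On this noise-free quadratic, each per-parameter rate settles near the inverse of that parameter's curvature, so a single global learning rate never has to be chosen to fit all three scales. Setting `noise_std` above zero inflates the measured gradient differences, which in this sketch lowers the learning rates; that is one simple way gradient noise can feed into the step size, and it should not be read as the paper's mechanism.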

Code & Models

Repositories

sotelo/scribe (official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Machine Learning and ELM

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Stochastic Gradient Descent