Second-order step-size tuning of SGD for non-convex optimization

Camille Castera; Jérôme Bolte; Cédric Févotte; Edouard Pauwels

arXiv:2103.03570 · cs.LG · February 10, 2022

TL;DR

This paper introduces Step-Tuned SGD, a second-order step-size adaptation method for non-convex optimization that improves training efficiency and accuracy in deep learning by estimating curvature with local quadratic models.

Contribution

The paper proposes a second-order step-size tuning method for mini-batch SGD that estimates local curvature from noisy gradients, comes with almost sure convergence guarantees, and is evaluated on deep residual network training.

Findings

01 Almost sure convergence to the critical set, with convergence rates.

02 Better test accuracy than SGD, RMSprop, or ADAM when training deep residual networks.

03 A sudden drop of the loss at medium stages of training.

Abstract

In view of a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case. For doing so, one estimates curvature, based on a local quadratic model and using only noisy gradient approximations. One obtains a new stochastic first-order method (Step-Tuned SGD), enhanced by second-order information, which can be seen as a stochastic version of the classical Barzilai-Borwein method. Our theoretical results ensure almost sure convergence to the critical set and we provide convergence rates. Experiments on deep residual network training illustrate the favorable properties of our approach. For such networks we observe, during training, both a sudden drop of the loss and an improvement of test accuracy at medium stages, yielding better results than SGD, RMSprop, or ADAM.
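The abstract describes Step-Tuned SGD as a stochastic version of the classical Barzilai-Borwein method, with curvature estimated from a local quadratic model. As a rough illustration of that classical idea only, the sketch below computes a Barzilai-Borwein step from two consecutive iterates and gradients on a toy diagonal quadratic. It is not the authors' algorithm: the names `bb_step_size` and `run_demo`, the `fallback` and `max_step` safeguards, the toy objective, and the optional Gaussian gradient noise are all assumptions made for this example, and the way the paper handles mini-batch noise is not reproduced here.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def bb_step_size(x_prev, x_curr, g_prev, g_curr, fallback=0.1, max_step=10.0):
    """Barzilai-Borwein style step from a secant curvature estimate.

    `fallback` and `max_step` are illustrative safeguards, not values
    taken from the paper.
    """
    s = [c - p for c, p in zip(x_curr, x_prev)]  # change in iterate
    y = [c - p for c, p in zip(g_curr, g_prev)]  # change in gradient
    ss = dot(s, s)
    sy = dot(s, y)
    # <s, y> / <s, s> estimates the curvature along s. If it is not
    # positive, the quadratic model has no minimizer along s, so we
    # fall back to a fixed step.
    if ss == 0.0 or sy <= 0.0:
        return fallback
    return min(ss / sy, max_step)  # step = 1 / estimated curvature


# Toy objective (assumption): f(x) = 0.5 * sum_i EIGS[i] * x[i]**2
EIGS = [1.0, 4.0, 10.0]


def loss(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(EIGS, x))


def grad(x, rng, noise=0.0):
    """Exact gradient plus optional Gaussian noise (a crude stand-in
    for mini-batch gradient noise)."""
    return [a * xi + noise * rng.gauss(0.0, 1.0) for a, xi in zip(EIGS, x)]


def run_demo(num_iters=50, noise=0.0, seed=0):
    rng = random.Random(seed)
    x_prev = [1.0, 1.0, 1.0]
    g_prev = grad(x_prev, rng, noise)
    # One plain gradient step to obtain a second iterate.
    x = [xi - 0.05 * gi for xi, gi in zip(x_prev, g_prev)]
    for _ in range(num_iters):
        g = grad(x, rng, noise)
        step = bb_step_size(x_prev, x, g_prev, g)
        x_prev, g_prev = x, g
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, loss(x)
```

Calling `run_demo()` with the default `noise=0.0` runs the deterministic case, where the step is recomputed at every iteration from the latest secant pair. Setting `noise` above zero makes the curvature estimate itself noisy, which is the difficulty a stochastic variant has to address.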

Code & Models

Repositories

Abdoulaye-Koroko/Second-order-step-size-tuning-of-SGD-for-non-convex-optimization (PyTorch)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent