Painless step size adaptation for SGD

Ilona Kulikovskikh; Tarzan Legović

arXiv:2102.00853 · cs.LG · October 28, 2024

TL;DR

This paper introduces the LIGHT function for step size adaptation in SGD, which aims to improve both the convergence and the generalization of neural networks without requiring stability guarantees or overparameterization.

Contribution

The paper proposes the LIGHT function, whose four configurations explicitly regulate improvements in convergence and generalization, simplifying step size adaptation.

Findings

1. Improves convergence and generalization without stability guarantees
2. Enables more reliable and explainable network architectures
3. Reduces need for overparameterization

Abstract

Convergence and generalization are two crucial aspects of performance in neural networks. When analyzed separately, these properties may lead to contradictory results. Optimizing the convergence rate yields fast training, but does not guarantee the best generalization error. To avoid the conflict, recent studies suggest adopting a moderately large step size for optimizers, but the added value for performance remains unclear. We propose the LIGHT function with four configurations that explicitly regulate the improvement in convergence and in generalization at test time. This contribution makes it possible to: 1) improve both convergence and generalization of neural networks with no need to guarantee their stability; 2) build more reliable and explainable network architectures with no need for overparameterization. We refer to it as "painless" step size adaptation.
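The abstract does not define the LIGHT function, so the sketch below does not attempt to reproduce it. As a generic illustration of why the choice of step size matters, it runs plain (deterministic) gradient descent on a one-dimensional quadratic. The function name `gradient_descent`, the objective, and the step size values are all assumptions chosen for illustration and are not taken from the paper.

```python
def gradient_descent(step_size, curvature=1.0, w0=1.0, num_steps=50):
    """Run plain gradient descent on f(w) = 0.5 * curvature * w**2.

    Illustrative only: this is NOT the LIGHT function from the paper.
    Returns the list of |w| values recorded after each step.
    """
    w = w0
    history = []
    for _ in range(num_steps):
        grad = curvature * w        # f'(w) = curvature * w
        w = w - step_size * grad    # standard gradient step
        history.append(abs(w))
    return history


if __name__ == "__main__":
    # Hypothetical step sizes: small, moderate, moderately large, too large.
    for eta in (0.01, 0.5, 1.5, 2.5):
        final = gradient_descent(eta)[-1]
        print(f"step size {eta}: |w| after 50 steps = {final:.3e}")
```

On this toy objective each step multiplies `w` by `1 - step_size * curvature`, so the iterates shrink only when that factor has magnitude below one. A very small step converges slowly, while a step that is too large diverges. The sketch says nothing about generalization, which is the other half of the trade-off the abstract describes.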

Code & Models

Repositories

yukinoi/light-diagnostic-function (official)

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Machine Learning and ELM