Using a one dimensional parabolic model of the full-batch loss to estimate learning rates during training

Maximus Mutschler; Kevin Laube; Andreas Zell

arXiv:2108.13880 · cs.LG · February 22, 2022

TL;DR

This paper proposes a novel line search method for deep learning that uses a one-dimensional parabolic model of the full-batch loss to adaptively estimate learning rates during training, addressing the challenge of automatic step size selection.

Contribution

The paper introduces a line search method that approximates the full-batch loss with a parabola estimated from mini-batches and derives learning rates from it, often outperforming other line search approaches across models and datasets.

Findings

01. The method matches the performance of tuned SGD with Momentum.

02. It often outperforms other line search methods across models and datasets.

03. It is the first line search approach in deep learning to sample larger batch sizes over multiple inferences.

Abstract

A fundamental challenge in Deep Learning is to find optimal step sizes for stochastic gradient descent automatically. In traditional optimization, line searches are a commonly used method to determine step sizes. One problem in Deep Learning is that finding appropriate step sizes on the full-batch loss is unfeasibly expensive. Therefore, classical line search approaches, designed for losses without inherent noise, are usually not applicable. Recent empirical findings suggest, inter alia, that the full-batch loss behaves locally parabolically in the direction of noisy update step directions. Furthermore, the trend of the optimal update step size changes slowly. By exploiting these and more findings, this work introduces a line-search method that approximates the full-batch loss with a parabola estimated over several mini-batches. Learning rates are derived from such parabolas during…
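The core idea in the abstract, fitting a parabola to losses measured along an update direction and reading a step size off its minimum, can be illustrated with a toy sketch. This is not the authors' implementation: the function names `fit_parabola` and `estimate_step_size`, the least-squares fit, the measuring positions, and the synthetic mini-batch losses are all assumptions made for illustration only.

```python
import random


def fit_parabola(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c. Returns (a, b, c)."""
    # Power sums for the normal equations: s[k] = sum(x^k), t[k] = sum(y * x^k).
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(3):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [vr - f * vi for vr, vi in zip(m[r], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def estimate_step_size(batch_losses, positions):
    """Average several mini-batch losses along one line and return the
    step size at the minimum of the fitted parabola.

    batch_losses: callables mapping a step size to a loss (hypothetical
    stand-ins for evaluating a network on one mini-batch).
    Returns None if the fitted parabola has no minimum (a <= 0).
    """
    mean_losses = [
        sum(loss(x) for loss in batch_losses) / len(batch_losses)
        for x in positions
    ]
    a, b, _ = fit_parabola(positions, mean_losses)
    if a <= 0:
        return None
    return -b / (2 * a)


# Toy data (assumption): each mini-batch loss is a parabola along the
# line whose minimum is scattered around 0.3.
random.seed(0)
minima = [random.gauss(0.3, 0.1) for _ in range(8)]
batch_losses = [lambda x, m=m: 2.0 * (x - m) ** 2 + 1.0 for m in minima]

step = estimate_step_size(batch_losses, [0.0, 0.1, 0.2, 0.4, 0.8])
print(step, sum(minima) / len(minima))
```

In this toy setting the averaged loss is itself an exact parabola, so the estimated step coincides with the mean of the per-batch minima. Real mini-batch losses are only approximately parabolic, which is why averaging over several mini-batches matters.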

Code & Models

Repositories

cogsys-tuebingen/labpal (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications

Methods: SGD with Momentum · Stochastic Gradient Descent