A straightforward line search approach on the expected empirical loss for stochastic deep learning problems

Maximus Mutschler; Andreas Zell

arXiv:2010.00921 · cs.LG · October 5, 2020

TL;DR

This paper introduces a simple line search method for stochastic deep learning that approximates the expected empirical loss using function fitting, enabling automatic step size selection without hyperparameter tuning.

Contribution

It proposes a novel, computationally efficient line search technique based on function fitting to noisy loss measurements, improving optimization robustness in deep learning.

Findings

1. Performs well across different datasets and architectures

2. Eliminates the need for hyperparameter tuning

3. Offers a robust alternative to traditional step size selection

Abstract

A fundamental challenge in deep learning is that the optimal step sizes for the update steps of stochastic gradient descent are unknown. In traditional optimization, line searches are used to determine good step sizes; in deep learning, however, searching for good step sizes on the expected empirical loss is too costly because the measured losses are noisy. This empirical work shows that, for common deep learning tasks, the expected empirical loss on vertical cross sections can be approximated cheaply. This is achieved by applying traditional one-dimensional function fitting to measured noisy losses of such cross sections. The step to a minimum of the resulting approximation is then used as the step size for the optimization. This approach leads to a robust and straightforward optimization method which performs well across datasets and architectures without the need of hyperparameter…
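The idea described in the abstract (fit a one-dimensional function to noisy losses measured along a search direction, then step to the minimum of the fit) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which function is fitted or how the losses are sampled, so the example assumes a least-squares parabola, and `noisy_loss`, `fit_step_size`, and `max_step` are hypothetical names introduced only for illustration. The synthetic loss stands in for mini-batch losses of a real network.

```python
import numpy as np


def noisy_loss(step, rng):
    """Hypothetical stand-in for a mini-batch loss measured at `step` along
    the search direction: a smooth curve with its minimum at step = 2.0,
    plus Gaussian noise that mimics mini-batch variation."""
    expected = 0.5 * (step - 2.0) ** 2 + 1.0
    return expected + rng.normal(scale=0.3)


def fit_step_size(steps, losses, max_step):
    """Fit a parabola to the (step, loss) samples by least squares and
    return the step to its minimum, clipped to [0, max_step].

    Assumption: a parabola is a good enough model of the cross section.
    If the fit is not convex, fall back to the largest allowed step."""
    a, b, _ = np.polyfit(steps, losses, deg=2)  # coefficients, highest degree first
    if a <= 0:
        return float(max_step)
    return float(np.clip(-b / (2.0 * a), 0.0, max_step))


rng = np.random.default_rng(0)
# Sample 9 positions on the line, 20 noisy loss measurements at each.
steps = np.repeat(np.linspace(0.0, 4.0, 9), 20)
losses = np.array([noisy_loss(s, rng) for s in steps])

step_size = fit_step_size(steps, losses, max_step=4.0)
print(f"estimated step size: {step_size:.3f}")
```

In a training loop, the returned value would take the place of a hand-tuned learning rate for the current update direction. Averaging many noisy measurements through the fit is what makes the estimate of the minimum usable even though each individual loss is unreliable.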

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning