GOALS: Gradient-Only Approximations for Line Searches Towards Robust and Consistent Training of Deep Neural Networks

Younghwan Chae; Daniel N. Wilke; Dominic Kafka

arXiv:2105.10915 · stat.ML · May 25, 2021 · 1 citation

Open Access

TL;DR

This paper introduces GOALS, a gradient-only line search method for deep neural network training that effectively handles the discontinuities caused by dynamic mini-batch sampling, improving robustness and convergence.

Contribution

The study extends the gradient-only surrogate (GOS) to loss functions under dynamic mini-batch sub-sampling (MBSS). It proposes GOALS, which comes with strong convergence guarantees, and demonstrates its effectiveness across various optimizers and models.

Findings

01 GOALS outperforms existing learning rate methods in robustness.

02 Training with GOALS reduces model errors in multimodal loss landscapes.

03 GOALS provides a reliable line search approach for dynamic mini-batch sampling.

Abstract

Mini-batch sub-sampling (MBSS) is favored in deep neural network training because it reduces computational cost. However, it introduces an inherent sampling error, which makes selecting appropriate learning rates challenging. In a line search, this sampling error can manifest as either bias or variance. Dynamic MBSS re-samples a mini-batch at every function evaluation. As a result, it produces point-wise discontinuous loss functions with smaller bias but larger variance than statically sampled loss functions. Dynamic MBSS also offers larger data throughput during training, but the complexity introduced by the discontinuities must be resolved. This study extends the gradient-only surrogate (GOS), a line search method that uses quadratic approximation models built from directional derivative information only, to dynamic MBSS loss functions. We propose a gradient-only…
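To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' GOALS algorithm. It assumes a least-squares loss and uses hypothetical helper names (`directional_derivative`, `gradient_only_step`, `alpha_max`). It shows dynamic MBSS, where a fresh mini-batch is drawn at every evaluation, and a quadratic model built from two directional derivatives only: the model's derivative is linear in the step size, so the step is taken where that line crosses zero. The fallback when no sign change is predicted is an assumption made for this example.

```python
import numpy as np


def directional_derivative(w, direction, X, y, batch_size, rng):
    """Directional derivative of a least-squares loss on a freshly drawn mini-batch.

    Drawing a new batch on every call mimics dynamic MBSS (illustrative only).
    """
    idx = rng.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    grad = 2.0 * Xb.T @ (Xb @ w - yb) / batch_size
    return float(grad @ direction)


def gradient_only_step(d0, d1, alpha1, alpha_max=10.0):
    """Step size where the linear model of the directional derivative is zero.

    d0 and d1 are directional derivatives at step sizes 0 and alpha1.
    A quadratic loss model has a linear derivative d0 + curvature * alpha.
    """
    curvature = (d1 - d0) / alpha1
    if curvature <= 0.0:
        # Assumed fallback: no sign change predicted, take the largest step.
        return alpha_max
    # Clip so the step is never negative and never exceeds alpha_max.
    return float(np.clip(-d0 / curvature, 0.0, alpha_max))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 5))
    y = X @ np.ones(5) + 0.1 * rng.normal(size=256)
    w = np.zeros(5)

    # Search direction: negative gradient of one mini-batch.
    idx = rng.choice(len(X), size=32, replace=False)
    direction = -2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / 32

    alpha1 = 0.1
    d0 = directional_derivative(w, direction, X, y, 32, rng)
    d1 = directional_derivative(w + alpha1 * direction, direction, X, y, 32, rng)
    alpha = gradient_only_step(d0, d1, alpha1)
    print("step size:", alpha)
```

Because only derivative values enter the model, the jumps in the loss value between differently sampled mini-batches do not appear in it directly. This is one reading of why a gradient-only model suits discontinuous loss functions, and the sketch makes no claim about the paper's convergence conditions.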

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and Algorithms

Methods: RMSProp · Stochastic Gradient Descent · Adam