GOALS: Gradient-Only Approximations for Line Searches Towards Robust and Consistent Training of Deep Neural Networks
Younghwan Chae, Daniel N. Wilke, Dominic Kafka

TL;DR
This paper introduces GOALS, a gradient-only line search method for deep neural network training that effectively handles the discontinuities caused by dynamic mini-batch sampling, improving robustness and convergence.
Contribution
The study extends the gradient-only surrogate (GOS) to loss functions under dynamic mini-batch sub-sampling (MBSS), proposing GOALS with strong convergence guarantees and demonstrating its effectiveness across various optimizers and models.
Findings
GOALS outperforms existing learning rate methods in robustness.
Training with GOALS reduces model errors in multimodal loss landscapes.
GOALS provides a reliable line search approach for dynamic mini-batch sampling.
Abstract
Mini-batch sub-sampling (MBSS) is favored in deep neural network training because it reduces computational cost. However, it introduces an inherent sampling error, which makes selecting appropriate learning rates challenging. In a line search, the sampling error can manifest as either bias or variance. Dynamic MBSS re-samples a mini-batch at every function evaluation. Hence, dynamic MBSS results in point-wise discontinuous loss functions with smaller bias but larger variance than statically sampled loss functions. Dynamic MBSS has the advantage of larger data throughput during training, but the complexity introduced by the discontinuities must be resolved. This study extends the gradient-only surrogate (GOS), a line search method using quadratic approximation models built with only directional derivative information, to dynamic MBSS loss functions. We propose a gradient-only…
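The idea of a quadratic model built from directional derivatives alone can be illustrated with a minimal sketch. This is not the GOALS algorithm from the paper. It assumes a least-squares loss, and the names `make_dynamic_dir_deriv` and `gradient_only_step`, the step-doubling fallback, and the `alpha_max` cap are all hypothetical choices made for illustration. The sketch draws a fresh mini-batch at every evaluation (dynamic MBSS) and picks the step at which the quadratic model's derivative, a line through two sampled directional derivatives, crosses zero.

```python
import numpy as np


def make_dynamic_dir_deriv(X, y, w, d, batch_size, rng):
    """Return f'(alpha) along direction d for a least-squares loss.

    A new mini-batch is drawn on every call (dynamic MBSS), so repeated
    calls at the same alpha generally return different values.
    """
    def dir_deriv(alpha):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        residual = Xb @ (w + alpha * d) - yb
        grad = Xb.T @ residual / batch_size
        return float(grad @ d)
    return dir_deriv


def gradient_only_step(dir_deriv, alpha_init=1.0, alpha_max=1e3):
    """Step size from a quadratic model fitted to two directional derivatives.

    Illustrative assumption: the model q(a) = f0 + d0*a + 0.5*c*a^2 uses
    c = (d1 - d0) / alpha_init, so its minimizer is a* = -d0 / c.
    No function values are used, only derivatives.
    """
    d0 = dir_deriv(0.0)
    d1 = dir_deriv(alpha_init)
    if d0 >= 0.0:
        return 0.0  # sampled direction is not a descent direction
    curvature = (d1 - d0) / alpha_init
    if curvature <= 0.0:
        # Model has no minimizer; fall back to growing the step (assumed rule).
        return min(2.0 * alpha_init, alpha_max)
    return min(-d0 / curvature, alpha_max)
```

With a full batch the least-squares directional derivative is exactly linear in the step size, so this sketch recovers the exact line minimizer. With small batches the two samples are noisy, which is the difficulty the abstract attributes to dynamic MBSS.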
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and Algorithms
Methods: RMSProp · Stochastic Gradient Descent · Adam
