Training Neural Networks for and by Interpolation
Leonard Berrada, Andrew Zisserman, M. Pawan Kumar

TL;DR
This paper introduces ALI-G, an adaptive optimization algorithm that exploits the interpolation property of neural networks to set the learning rate automatically. It achieves state-of-the-art results among adaptive methods with minimal tuning across a range of architectures and datasets.
Contribution
The paper proposes ALI-G, a novel interpolation-based adaptive optimizer that simplifies tuning and matches or surpasses existing methods' performance in deep learning tasks.
Findings
ALI-G achieves state-of-the-art results among adaptive methods.
It performs comparably to SGD without requiring learning-rate decay schedules.
ALI-G is simple to implement and versatile across architectures and datasets.
Abstract
In modern supervised learning, many deep neural networks are able to interpolate the data: the empirical loss can be driven to near zero on all samples simultaneously. In this work, we explicitly exploit this interpolation property for the design of a new optimization algorithm for deep learning, which we term Adaptive Learning-rates for Interpolation with Gradients (ALI-G). ALI-G retains the two main advantages of Stochastic Gradient Descent (SGD), which are (i) a low computational cost per iteration and (ii) good generalization performance in practice. At each iteration, ALI-G exploits the interpolation property to compute an adaptive learning-rate in closed form. In addition, ALI-G clips the learning-rate to a maximal value, which we prove to be helpful for non-convex problems. Crucially, in contrast to the learning-rate of SGD, the maximal learning-rate of ALI-G does not require a…
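The closed-form, clipped learning-rate described in the abstract can be illustrated with a minimal sketch. It assumes the Polyak-style step size given in the ALI-G paper, min(loss / (||grad||² + delta), max_lr). That formula is not spelled out in this summary, so treat it as an assumption here. The names `alig_step` and `toy_loss_and_grad`, the value of `delta`, and the one-parameter least-squares problem are all hypothetical and chosen only for illustration.

```python
def alig_step(weights, loss, grad, max_lr, delta=1e-5):
    """One ALI-G-style update (illustrative sketch, not the authors' code).

    Assumes the minimal achievable loss is zero (interpolation), so the
    current loss value can be used directly to size the step.
    """
    sq_norm = sum(g * g for g in grad)
    # Closed-form adaptive learning-rate, clipped to the maximal value.
    # delta is an assumed small constant that keeps the division stable.
    lr = min(loss / (sq_norm + delta), max_lr)
    new_weights = [w - lr * g for w, g in zip(weights, grad)]
    return new_weights, lr


def toy_loss_and_grad(weights, x=2.0, y=4.0):
    """Hypothetical one-sample least-squares problem; w = 2 interpolates it."""
    (w,) = weights
    residual = w * x - y
    return 0.5 * residual ** 2, [residual * x]


weights = [0.0]
for _ in range(20):
    loss, grad = toy_loss_and_grad(weights)
    weights, lr = alig_step(weights, loss, grad, max_lr=1.0)
final_loss, _ = toy_loss_and_grad(weights)
```

In this toy run no decay schedule is supplied: the step size shrinks on its own as the loss approaches zero, and `max_lr` only acts as an upper bound.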
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks
Methods: Adam · Stochastic Gradient Descent
