Multilayer Lookahead: a Nested Version of Lookahead

Denys Pushkin; Luis Barba

arXiv:2110.14254·cs.LG·October 28, 2021

TL;DR

This paper introduces Multilayer Lookahead, an optimizer that recursively wraps Lookahead around itself. It proves convergence for the two-layer case, argues that the method amplifies the implicit regularization effect of SGD, and reports better results than Lookahead on image classification and GAN training tasks.

Contribution

The paper proposes Multilayer Lookahead, a recursive extension of Lookahead, proves convergence of its two-layer variant, and explains its improved generalization through amplified implicit regularization, supported by empirical results.

Findings

01

Multilayer Lookahead with two layers converges to a stationary point of smooth non-convex functions at an O(1/√T) rate.

02

Multilayer Lookahead outperforms Lookahead on CIFAR-10 and CIFAR-100 classification and on GAN training on MNIST.

03

The method improves generalization over Lookahead by amplifying the implicit regularization effect of SGD.

Abstract

In recent years, SGD and its variants have become the standard tool to train Deep Neural Networks. In this paper, we focus on the recently proposed variant Lookahead, which improves upon SGD in a wide range of applications. Following this success, we study an extension of this algorithm, the \emph{Multilayer Lookahead} optimizer, which recursively wraps Lookahead around itself. We prove the convergence of Multilayer Lookahead with two layers to a stationary point of smooth non-convex functions with $O(\frac{1}{\sqrt{T}})$ rate. We also justify the improved generalization of both Lookahead over SGD, and of Multilayer Lookahead over Lookahead, by showing how they amplify the implicit regularization effect of SGD. We empirically verify our results and show that Multilayer Lookahead outperforms Lookahead on CIFAR-10 and CIFAR-100 classification tasks, and on GANs training on the MNIST…
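The phrase "recursively wraps Lookahead around itself" can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the authors' implementation: it uses the standard Lookahead rule (run k steps of an inner optimizer from the slow weights, then move the slow weights a fraction alpha toward the result), a deterministic gradient of a toy quadratic in place of stochastic gradients, and hypothetical names (`multilayer_lookahead_step`, `sync_periods`, `alphas`) and hyperparameter values chosen only for the example.

```python
from typing import Callable, List

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def sgd_step(params: Vector, grad_fn: GradFn, lr: float) -> Vector:
    """One plain gradient step (a stand-in for a stochastic gradient step)."""
    grads = grad_fn(params)
    return [p - lr * g for p, g in zip(params, grads)]


def multilayer_lookahead_step(
    params: Vector,
    grad_fn: GradFn,
    lr: float,
    sync_periods: List[int],  # assumed layout: index 0 is the innermost layer
    alphas: List[float],      # one interpolation factor per layer
) -> Vector:
    """One step of an optimizer with len(sync_periods) Lookahead layers.

    With zero layers this is plain SGD. Otherwise the outermost layer runs
    k steps of the optimizer that has one layer fewer, then moves the slow
    weights a fraction alpha toward the resulting fast weights.
    """
    if not sync_periods:
        return sgd_step(params, grad_fn, lr)
    k, alpha = sync_periods[-1], alphas[-1]
    fast = list(params)
    for _ in range(k):
        fast = multilayer_lookahead_step(
            fast, grad_fn, lr, sync_periods[:-1], alphas[:-1]
        )
    # Lookahead interpolation: slow + alpha * (fast - slow)
    return [s + alpha * (f - s) for s, f in zip(params, fast)]


def quadratic_grad(x: Vector) -> Vector:
    """Gradient of the toy objective f(x) = 0.5 * sum(x_i ** 2)."""
    return list(x)


# Two-layer example on the toy quadratic (illustrative hyperparameters).
final = [4.0, -2.0]
for _ in range(20):
    final = multilayer_lookahead_step(
        final, quadratic_grad, lr=0.1, sync_periods=[5, 5], alphas=[0.5, 0.5]
    )
```

Passing empty lists recovers plain SGD and a single entry recovers ordinary Lookahead, so each extra entry adds one more wrapping layer. Note that one outer step with sync periods k_1 and k_2 costs k_1 * k_2 gradient evaluations in this sketch.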

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Stochastic Gradient Descent · Lookahead