Two-Tailed Averaging: Anytime, Adaptive, Once-in-a-While Optimal Weight Averaging for Better Generalization

Gábor Melis

arXiv:2209.12581 · stat.ML · April 18, 2023

Open Access

TL;DR

This paper introduces Two-Tailed Averaging, an adaptive, hyperparameter-free method that improves generalization in stochastic optimization by approximating the optimal tail as training proceeds, at the cost of only two running averages and periodic loss evaluations.

Contribution

The paper proposes an anytime, hyperparameter-free Tail Averaging variant that adaptively approximates the optimal tail, enhancing generalization in stochastic optimization.

Findings

01. Achieves better generalization than standard averaging methods.

02. Requires only two running averages and periodic loss evaluation.

03. No hyperparameters needed for the averaging process.
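To make the second finding concrete, here is a minimal sketch of how a scheme built on two running averages and periodic loss evaluation could look. The swap rule (promote the short average to long when its loss is lower, then restart the short one), the choice to start the short average at the first evaluation, and the names `RunningAverage` and `TwoAverages` are all assumptions made for illustration; they are not taken from this summary, and the paper should be consulted for the actual algorithm.

```python
class RunningAverage:
    """Incremental mean of weight vectors (plain lists of floats)."""

    def __init__(self):
        self.mean = None
        self.count = 0

    def add(self, weights):
        self.count += 1
        if self.mean is None:
            self.mean = list(weights)
        else:
            # Standard incremental mean update: m += (x - m) / n.
            self.mean = [m + (x - m) / self.count
                         for m, x in zip(self.mean, weights)]


class TwoAverages:
    """Hypothetical two-average scheme; the swap rule is an assumption."""

    def __init__(self):
        self.long = RunningAverage()
        self.short = None  # assumption: started at the first evaluation

    def update(self, weights):
        """Call once per optimization step with the current iterate."""
        self.long.add(weights)
        if self.short is not None:
            self.short.add(weights)

    def evaluate(self, loss_fn):
        """Call once in a while; returns the currently preferred average."""
        if self.long.count == 0:
            raise ValueError("no iterates have been averaged yet")
        if self.short is None:
            self.short = RunningAverage()
        elif (self.short.count > 0
              and loss_fn(self.short.mean) < loss_fn(self.long.mean)):
            # The shorter tail is better: promote it and restart the short one.
            self.long = self.short
            self.short = RunningAverage()
        return self.long.mean


def toy_loss(weights):
    """Toy stand-in for a held-out loss, minimized at the origin."""
    return sum(w * w for w in weights)


averager = TwoAverages()
for w in ([10.0], [10.0]):
    averager.update(w)
averager.evaluate(toy_loss)          # first check: the short average starts here
for w in ([0.0], [0.0]):
    averager.update(w)
print(averager.evaluate(toy_loss))   # the short average excludes the early iterates
```

In this sketch the only extra state is the two averages and their counts, and the loss is queried only inside `evaluate`, so how often to call it is left to the training loop.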

Abstract

Tail Averaging improves on Polyak averaging's non-asymptotic behaviour by excluding a number of leading iterates of stochastic optimization from its calculations. In practice, with a finite number of optimization steps and a learning rate that cannot be annealed to zero, Tail Averaging can get much closer to a local minimum point of the training loss than either the individual iterates or the Polyak average. However, the number of leading iterates to ignore is an important hyperparameter, and starting averaging too early or too late leads to inefficient use of resources or suboptimal solutions. Our work focusses on improving generalization, which makes setting this hyperparameter even more difficult, especially in the presence of other hyperparameters and overfitting. Furthermore, before averaging starts, the loss is only weakly informative of the final performance, which makes early…
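The difference between Polyak averaging and Tail Averaging described at the start of the abstract can be illustrated with a small sketch. The toy loss `0.5 * w**2`, the noise model, and the helper names `polyak_average`, `tail_average`, and `noisy_sgd` are assumptions chosen for illustration; `skip` plays the role of the number of leading iterates to ignore.

```python
import random


def polyak_average(iterates):
    """Mean of all iterates."""
    return sum(iterates) / len(iterates)


def tail_average(iterates, skip):
    """Mean of the iterates after dropping the first `skip` of them."""
    if not 0 <= skip < len(iterates):
        raise ValueError("skip must leave at least one iterate")
    tail = iterates[skip:]
    return sum(tail) / len(tail)


def noisy_sgd(w0, lr, steps, noise, seed=0):
    """SGD on the toy loss 0.5 * w**2 with Gaussian gradient noise.

    The learning rate is held constant, so the iterates keep fluctuating
    around the minimum at w = 0 instead of converging to it.
    """
    rng = random.Random(seed)
    w, iterates = w0, []
    for _ in range(steps):
        grad = w + rng.gauss(0.0, noise)
        w -= lr * grad
        iterates.append(w)
    return iterates


iterates = noisy_sgd(w0=10.0, lr=0.1, steps=200, noise=1.0)
# Distance from the minimum at w = 0 for each candidate solution.
print("last iterate:", abs(iterates[-1]))
print("Polyak avg  :", abs(polyak_average(iterates)))
print("tail avg    :", abs(tail_average(iterates, skip=100)))
```

The Polyak average includes the early iterates near the starting point `w0`, while the tail average discards them; varying `skip` in this toy setting shows why its value matters.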

Taxonomy

Topics: Metaheuristic Optimization Algorithms Research · Neural Networks and Applications · Machine Learning and Algorithms

Methods: Early Stopping