Comment on Stochastic Polyak Step-Size: Performance of ALI-G
Leonard Berrada, Andrew Zisserman, M. Pawan Kumar

TL;DR
This note shows that, with proper hyper-parameter tuning and momentum, ALI-G performs significantly better at training neural networks and remains competitive on models that can interpolate the training data.
Contribution
The note demonstrates that well-tuned hyper-parameters and momentum substantially improve ALI-G's performance when training neural networks.
Findings
ALI-G achieves 93.5% accuracy on CIFAR-10 with tuning.
ALI-G reaches 76% accuracy on CIFAR-100 with tuning.
Proper hyper-parameter tuning improves ALI-G's performance significantly.
Abstract
This is a short note on the performance of the ALI-G algorithm (Berrada et al., 2020) as reported in (Loizou et al., 2021). ALI-G (Berrada et al., 2020) and SPS (Loizou et al., 2021) are both adaptations of the Polyak step-size to optimize machine learning models that can interpolate the training data. The main algorithmic differences are that (1) SPS employs a multiplicative constant in the denominator of the learning-rate while ALI-G uses an additive constant, and (2) SPS uses an iteration-dependent maximal learning-rate while ALI-G uses a constant one. There are also differences in the analysis provided by the two works, with less restrictive assumptions proposed in (Loizou et al., 2021). In their experiments, Loizou et al. (2021) did not use momentum for ALI-G (which is a standard part of the algorithm) or standard hyper-parameter tuning (e.g., for the learning-rate and regularization).…
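The two algorithmic differences described in the abstract can be sketched as a pair of step-size functions. This is a minimal illustration, not the authors' implementation: the function names, the default value of delta, the assumption that the minimal loss is zero under interpolation, and the zero-gradient guard are all assumptions made for this example. Momentum and the exact schedule SPS uses for its maximal learning-rate are left out; the cap is simply passed in per call.

```python
def alig_step_size(loss, grad_sq_norm, max_lr, delta=1e-5):
    """Polyak-style step with an ADDITIVE constant in the denominator.

    max_lr is a constant cap, the same at every iteration.
    delta's default value is an assumption made for this sketch.
    """
    return min(loss / (grad_sq_norm + delta), max_lr)


def sps_step_size(loss, grad_sq_norm, c, max_lr_t, loss_min=0.0):
    """Polyak-style step with a MULTIPLICATIVE constant in the denominator.

    max_lr_t is the cap for the current iteration and may change from one
    call to the next. loss_min=0.0 assumes the model can interpolate.
    """
    if grad_sq_norm == 0.0:
        # Guard added for this sketch: a zero gradient gives a zero update anyway.
        return 0.0
    return min((loss - loss_min) / (c * grad_sq_norm), max_lr_t)


if __name__ == "__main__":
    # Same loss and squared gradient norm, different denominators and caps.
    print(alig_step_size(loss=1.0, grad_sq_norm=4.0, max_lr=0.1))
    print(sps_step_size(loss=1.0, grad_sq_norm=4.0, c=0.5, max_lr_t=10.0))
```

In both cases the step shrinks as the loss approaches zero, which is what makes a Polyak-style rule attractive when the model can fit the training data exactly.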
Taxonomy
Topics: Neural Networks and Applications · Advanced Neural Network Applications · Model Reduction and Neural Networks
