Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models
Leonardo Galli, Holger Rauhut, Mark Schmidt

TL;DR
This paper introduces nonmonotone line search methods for stochastic optimization. By dropping the requirement that the mini-batch loss decrease at every step, they can accept larger step sizes while keeping the convergence rates of monotone line searches, and they improve training speed and generalization in over-parameterized models.
Contribution
The paper proposes PoNoS (POlyak NOnmonotone Stochastic), which combines a nonmonotone line search with a Polyak initial step size, and introduces a resetting technique that reduces the number of backtracks.
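The combination described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the reference value is a maximum over a window of recent mini-batch losses (a Grippo-style rule chosen here for simplicity, which may differ from the rule used in the paper), the Polyak step assumes the optimal mini-batch loss is zero (the interpolation setting), the resetting technique is omitted, and all function names and default constants are hypothetical.

```python
from collections import deque

import numpy as np


def polyak_initial_step(loss, grad_sq, c=0.5, f_star=0.0, eta_max=10.0):
    """Polyak-style initial step (loss - f_star) / (c * ||g||^2), capped at eta_max.

    f_star=0.0 is an assumption that fits interpolating models.
    """
    if grad_sq == 0.0:
        return 0.0
    return min((loss - f_star) / (c * grad_sq), eta_max)


def nonmonotone_backtrack(f, w, g, ref_value, eta0, c=0.5, delta=0.5,
                          max_backtracks=30):
    """Shrink eta until f(w - eta*g) <= ref_value - c*eta*||g||^2.

    Passing ref_value = f(w) recovers the usual monotone Armijo test;
    any ref_value >= f(w) gives a relaxed, nonmonotone test.
    Returns the accepted step and the number of backtracks used.
    """
    g_sq = float(g @ g)
    eta = eta0
    for n_back in range(max_backtracks):
        if f(w - eta * g) <= ref_value - c * eta * g_sq:
            return eta, n_back
        eta *= delta
    return eta, max_backtracks


def sgd_nonmonotone(A, b, n_steps=200, window=5, seed=0):
    """SGD on single-sample least squares with the two pieces above."""
    rng = np.random.default_rng(seed)
    w = np.zeros(A.shape[1])
    recent = deque(maxlen=window)  # recent mini-batch losses
    for _ in range(n_steps):
        i = rng.integers(A.shape[0])
        a_i, b_i = A[i], b[i]
        f_i = lambda v: 0.5 * (a_i @ v - b_i) ** 2
        loss = f_i(w)
        g = (a_i @ w - b_i) * a_i
        recent.append(loss)
        ref = max(recent)  # assumed Grippo-style reference value
        eta0 = polyak_initial_step(loss, float(g @ g))
        eta, _ = nonmonotone_backtrack(f_i, w, g, ref, eta0)
        w = w - eta * g
    return w
```

As a small check of the relaxation, take f(w) = 0.5 w^2 in one dimension with w = 1, gradient 1, and an initial step of 2.5. With the monotone reference value f(w) = 0.5, `nonmonotone_backtrack` accepts 0.625 after two backtracks; with a larger reference value of 2.0, it accepts 1.25 after one.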
Findings
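Nonmonotone line searches are proven to achieve the same convergence rates as monotone ones.
Experiments show faster convergence and better generalization for SGD/Adam than with previous monotone line searches.
The resetting technique reduces the number of backtracks, which lowers the computational cost of the line search.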
Abstract
Recent works have shown that line search methods can speed up Stochastic Gradient Descent (SGD) and Adam in modern over-parameterized settings. However, existing line searches may take steps that are smaller than necessary since they require a monotone decrease of the (mini-)batch objective function. We explore nonmonotone line search methods to relax this condition and possibly accept larger step sizes. Despite the lack of a monotonic decrease, we prove the same fast rates of convergence as in the monotone case. Our experiments show that nonmonotone methods improve the speed of convergence and generalization properties of SGD/Adam even beyond the previous monotone line searches. We propose a POlyak NOnmonotone Stochastic (PoNoS) method, obtained by combining a nonmonotone line search with a Polyak initial step size. Furthermore, we develop a new resetting technique that in the majority…
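The relaxation described in the abstract can be written as a pair of conditions. The notation below is assumed for illustration and is not taken from the paper: $w_k$ is the iterate, $f_{i_k}$ the mini-batch loss at step $k$, $\eta_k$ the step size, $c \in (0,1)$ a sufficient-decrease constant, and $C_k$ a reference value. A monotone (Armijo-type) stochastic line search accepts $\eta_k$ only if the first condition holds; a nonmonotone one replaces the current loss on the right-hand side with some $C_k \ge f_{i_k}(w_k)$, so every step accepted by the first test is also accepted by the second, and larger steps may be accepted as well.

```latex
% Monotone sufficient decrease on the mini-batch loss
f_{i_k}\bigl(w_k - \eta_k \nabla f_{i_k}(w_k)\bigr)
  \le f_{i_k}(w_k) - c\,\eta_k \,\bigl\|\nabla f_{i_k}(w_k)\bigr\|^2

% Nonmonotone relaxation with a reference value C_k >= f_{i_k}(w_k)
f_{i_k}\bigl(w_k - \eta_k \nabla f_{i_k}(w_k)\bigr)
  \le C_k - c\,\eta_k \,\bigl\|\nabla f_{i_k}(w_k)\bigr\|^2
```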
Taxonomy
Topics: Metaheuristic Optimization Algorithms Research · Advanced Image and Video Retrieval Techniques · Diffusion and Search Dynamics
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Adam
