A note on $R$-linear convergence of nonmonotone gradient methods
Xinrui Li, Yakui Huang

TL;DR
This paper introduces a stepsize property that yields sharper $R$-linear convergence rates for nonmonotone gradient methods. This brings their theory closer to their observed practical performance, especially on unconstrained quadratic optimization.
Contribution
It introduces a new property of stepsizes that guarantees $R$-linear convergence for a broad class of gradient methods and improves their known convergence rates.
Findings
Gradient methods whose stepsizes satisfy the new property converge $R$-linearly at rate $1-\lambda_1/M_1$, where $\lambda_1$ is the smallest eigenvalue of the Hessian matrix and $M_1$ is the upper bound of the inverse stepsize.
The known convergence rates of many existing nonmonotone methods can be improved to $1-1/\kappa$, where $\kappa$ is the condition number.
The results narrow the gap between the theoretical convergence rates and the practical performance of nonmonotone gradient methods.
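As a worked reading of the two rate statements above, here is a sketch. It assumes the standard quadratic setting, which the summary does not spell out, and it does not state the property itself. The second implication applies to Rayleigh-quotient stepsizes such as exact steepest descent and Barzilai-Borwein, whose inverse stepsizes always lie in $[\lambda_1,\lambda_n]$. For these, one may take $M_1=\lambda_n$.

```latex
% Assumed setting: f(x) = \tfrac12 x^\top A x - b^\top x, with A symmetric positive definite,
% eigenvalues 0 < \lambda_1 \le \dots \le \lambda_n, and \kappa = \lambda_n / \lambda_1.
\begin{gather*}
  x_{k+1} = x_k - \alpha_k g_k, \qquad g_k = A x_k - b, \\
  \frac{1}{\alpha_k} \le M_1 \ \ \forall k
    \;\Longrightarrow\; \theta = 1 - \frac{\lambda_1}{M_1}
    \quad \text{(for stepsizes satisfying the property)}, \\
  \lambda_1 \le \frac{1}{\alpha_k} \le \lambda_n \ \ \forall k
    \;\Longrightarrow\; M_1 = \lambda_n, \quad
    \theta = 1 - \frac{\lambda_1}{\lambda_n} = 1 - \frac{1}{\kappa}.
\end{gather*}
% Example: \kappa = 100 gives \theta = 1 - 1/100 = 0.99.
```

Here an $R$-linear rate $\theta$ means the error sequence is bounded by $C\theta^k$ for some constant $C$.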
Abstract
Nonmonotone gradient methods generally perform better than their monotone counterparts, especially on unconstrained quadratic optimization. However, the known convergence rate of a monotone method is often much better than that of its nonmonotone variant. With the aim of shrinking the gap between the theory and practice of nonmonotone gradient methods, we introduce a property for the convergence analysis of a large collection of gradient methods. We prove that any gradient method using stepsizes that satisfy the property converges $R$-linearly at a rate of $1-\lambda_1/M_1$, where $\lambda_1$ is the smallest eigenvalue of the Hessian matrix and $M_1$ is the upper bound of the inverse stepsize. Our results indicate that the existing convergence rates of many nonmonotone methods can be improved to $1-1/\kappa$, with $\kappa$ being the associated condition number.
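The abstract does not state the property, so the following is only a sketch under explicit assumptions. It runs the Barzilai-Borwein (BB1) gradient method, a standard nonmonotone method chosen here purely for illustration, on a diagonal quadratic. It records the inverse stepsizes $1/\alpha_k$ and then evaluates the rate $1-\lambda_1/M_1$, taking $M_1$ as the largest inverse stepsize actually used. The function name `bb1_gradient_method`, the first stepsize $1/\lambda_n$, the test problem, and the stopping rule are hypothetical choices, not taken from the paper.

```python
import math


def bb1_gradient_method(eigs, b, x0, max_iter=5000, rtol=1e-10):
    """Hypothetical sketch: gradient method with BB1 stepsizes on
    f(x) = 0.5 * x^T A x - b^T x, where A = diag(eigs) is assumed SPD."""
    x = list(x0)
    # Gradient g = A x - b (A is diagonal, so A x is elementwise).
    g = [lam * xi - bi for lam, xi, bi in zip(eigs, x, b)]
    norms = [math.sqrt(sum(gi * gi for gi in g))]
    # Assumed first step alpha_0 = 1 / lambda_n, i.e. 1/alpha_0 = lambda_n.
    inv_alpha = max(eigs)
    inv_steps = []  # inverse stepsizes 1/alpha_k actually used
    for _ in range(max_iter):
        # Stop once the gradient norm has dropped by the factor rtol.
        if norms[-1] <= rtol * norms[0]:
            break
        inv_steps.append(inv_alpha)
        s = [-gi / inv_alpha for gi in g]  # s_k = -alpha_k * g_k
        x = [xi + si for xi, si in zip(x, s)]
        g = [lam * xi - bi for lam, xi, bi in zip(eigs, x, b)]
        norms.append(math.sqrt(sum(gi * gi for gi in g)))
        # For a quadratic, y_k = g_{k+1} - g_k = A s_k; computing A s_k
        # directly avoids cancellation. BB1: 1/alpha_{k+1} = s^T A s / s^T s.
        sts = sum(si * si for si in s)
        sas = sum(lam * si * si for lam, si in zip(eigs, s))
        inv_alpha = sas / sts
    return x, norms, inv_steps


# Illustrative test problem: eigenvalues evenly spaced in [1, 100], so kappa = 100.
n = 20
eigs = [1.0 + 99.0 * i / (n - 1) for i in range(n)]
b = [1.0] * n
x, norms, inv_steps = bb1_gradient_method(eigs, b, [0.0] * n)

lam1, lamn = min(eigs), max(eigs)
M1 = max(inv_steps)          # observed upper bound on 1/alpha_k
theta = 1.0 - lam1 / M1      # rate 1 - lambda_1 / M_1 from the abstract
print("iterations:", len(inv_steps), "M1:", M1, "theta:", theta)
```

Each BB1 inverse stepsize is a Rayleigh quotient of $A$, so it stays in $[\lambda_1,\lambda_n]$. That makes $M_1=\lambda_n$ on this problem, and the bound evaluates to $1-1/\kappa$. Comparing the printed iteration count with the count implied by $\theta^k$ gives a rough sense of how this worst-case rate relates to observed behavior.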
Taxonomy
Topics: Advanced Optimization Algorithms Research · Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques
