The Power of Factorial Powers: New Parameter settings for (Stochastic) Optimization
Aaron Defazio, Robert M. Gower

TL;DR
This paper introduces factorial powers as a versatile tool for setting constants in convergence proofs, improving and simplifying convergence rate analyses for various optimization algorithms.
Contribution
It proposes using factorial powers to define constants in convergence proofs, enhancing the analysis of momentum, accelerated gradient, and SVRG methods.
Findings
Factorial powers have useful mathematical properties.
Applying factorial powers simplifies convergence proofs.
Factorial powers yield simpler or improved convergence rate bounds for the momentum method, accelerated gradient, and SVRG.
Abstract
The convergence rates for convex and non-convex optimization methods depend on the choice of a host of constants, including step sizes, Lyapunov function constants and momentum constants. In this work we propose the use of factorial powers as a flexible tool for defining constants that appear in convergence proofs. We list a number of remarkable properties that these sequences enjoy, and show how they can be applied to convergence proofs to simplify or improve the convergence rates of the momentum method, accelerated gradient and the stochastic variance reduced method (SVRG).
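The abstract does not spell out what a factorial power is. The sketch below assumes the standard rising factorial, k^(r) = k(k+1)...(k+r-1), extended to real arguments as Gamma(k+r)/Gamma(k). The function names are hypothetical and not taken from the paper. It checks one well-known property of these sequences, the closed-form sum 1^(r) + 2^(r) + ... + n^(r) = n^(r+1) / (r+1), which is the kind of identity that makes such sequences convenient as weights or step-size constants in a telescoping argument.

```python
from math import gamma


def rising_factorial(k, r):
    """Rising factorial via the Gamma function (assumes k > 0 and k + r > 0)."""
    return gamma(k + r) / gamma(k)


def rising_factorial_int(k, r):
    """Exact rising factorial k(k+1)...(k+r-1) for integer k and integer r >= 0."""
    result = 1
    for j in range(r):
        result *= k + j
    return result


# Summation identity: sum_{i=1}^{n} i^(r) == n^(r+1) / (r + 1)
n, r = 10, 2
lhs = sum(rising_factorial_int(i, r) for i in range(1, n + 1))
rhs = rising_factorial_int(n, r + 1) // (r + 1)
print(lhs, rhs)
```

For r = 1 the identity reduces to the familiar 1 + 2 + ... + n = n(n+1)/2.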
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
