Stochastic Steffensen method
Minda Zhao, Zehua Lai, and Lek-Heng Lim

TL;DR
This paper introduces a stochastic Steffensen method that achieves super-quadratic convergence without second derivatives, suitable for large-scale optimization, and demonstrates its effectiveness through extensive experiments.
Contribution
It proposes a novel stochastic optimization method based on Steffensen's approach, achieving high convergence orders without hyperparameter tuning and generalizing the randomized Kaczmarz method.
Findings
Compares favorably with several existing first-order methods in extensive experiments.
Achieves a convergence order of 1 + √2 ≈ 2.414 when an optimal step size is incorporated.
Reduces to the randomized Kaczmarz method for quadratic objectives.
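For readers unfamiliar with the method named in the last finding, the following is a minimal sketch of the standard randomized Kaczmarz iteration for a consistent linear system Ax = b, with rows sampled in proportion to their squared norms. It is not the paper's algorithm. The function name, the example system, and the iteration count are all illustrative assumptions.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Standard randomized Kaczmarz sketch (illustrative, not the paper's code).

    Assumes A is a list of nonzero rows and that Ax = b is consistent.
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    # Squared row norms double as sampling weights and projection scales.
    sq_norms = [sum(a * a for a in row) for row in A]
    for _ in range(iters):
        # Pick row i with probability proportional to its squared norm.
        i = rng.choices(range(len(A)), weights=sq_norms)[0]
        row = A[i]
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        # Project x onto the hyperplane {z : row . z = b[i]}.
        scale = residual / sq_norms[i]
        x = [xj + scale * a for a, xj in zip(row, x)]
    return x


# Hypothetical consistent system whose exact solution is (1, 2).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
b = [4.0, 7.0, -1.0]
x_hat = randomized_kaczmarz(A, b)
```

Each update touches a single row of A, which is why the method suits very large systems.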
Abstract
Is it possible for a first-order method, i.e., only first derivatives allowed, to be quadratically convergent? For univariate loss functions, the answer is yes: the Steffensen method avoids second derivatives and is still quadratically convergent like Newton's method. By incorporating an optimal step size we can even push its convergence order beyond quadratic to 1 + √2 ≈ 2.414. While such high convergence orders are a pointless overkill for a deterministic algorithm, they become rewarding when the algorithm is randomized for problems of massive sizes, as randomization invariably compromises convergence speed. We will introduce two adaptive learning rates inspired by the Steffensen method, intended for use in a stochastic optimization setting and requiring no hyperparameter tuning aside from batch size. Extensive experiments show that they compare favorably with several…
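To make the univariate claim concrete, here is a minimal sketch of the classical deterministic Steffensen iteration applied to the derivative g = f', which locates a stationary point using first derivatives only. It is the textbook scheme, not the stochastic method or the adaptive learning rates proposed in the paper. The function name, tolerance, and test objective f(x) = exp(x) - 2x are assumptions chosen for illustration.

```python
import math


def steffensen_minimize(grad, x0, tol=1e-12, max_iter=50):
    """Classical Steffensen iteration on g = f' (illustrative sketch).

    Update: x <- x - g(x)^2 / (g(x + g(x)) - g(x)).
    The quotient (g(x + g(x)) - g(x)) / g(x) stands in for f''(x),
    so no second derivative is ever evaluated.
    """
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        denom = grad(x + g) - g
        if denom == 0.0:
            # Finite-difference slope vanished; stop rather than divide by zero.
            break
        x -= g * g / denom
    return x


# Hypothetical objective f(x) = exp(x) - 2x, so f'(x) = exp(x) - 2,
# whose unique minimizer is x = ln 2.
x_star = steffensen_minimize(lambda x: math.exp(x) - 2.0, x0=1.0)
```

Each step costs two gradient evaluations. Like Newton's method, the iteration is only locally convergent, so the starting point matters.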
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
Methods: Stochastic Gradient Descent
