Block Acceleration Without Momentum: On Optimal Stepsizes of Block Gradient Descent for Least-Squares
Liangzu Peng, Wotao Yin

TL;DR
This paper derives optimal stepsizes for block gradient descent (BGD) on least-squares problems and shows that, under an orthogonality assumption, stepsize tuning alone makes BGD converge twice as fast as gradient descent with momentum, with no momentum term added.
Contribution
It provides the first theoretical justification that stepsize tuning alone can accelerate BGD: under orthogonality assumptions on the data blocks, the resulting rates improve on existing BGD bounds, whose constants are worse than those of GD.
Findings
Optimal stepsizes for BGD are derived in closed form.
Under orthogonality, BGD converges twice as fast as GD with momentum.
Applying these stepsizes improves convergence rates in subspace projection problems.
Abstract
Block coordinate descent is a powerful algorithmic template suitable for big data optimization. This template admits a lot of variants including block gradient descent (BGD), which performs gradient descent on a selected block of variables, while keeping other variables fixed. For a very long time, the stepsize for each block has tacitly been set to one divided by the block-wise Lipschitz smoothness constant, imitating the vanilla stepsize rule for gradient descent (GD). However, such a choice for BGD has not yet been able to theoretically justify its empirical superiority over GD, as existing convergence rates for BGD have worse constants than GD in the deterministic cases. To discover such theoretical justification, we set up a simple environment where we consider BGD applied to least-squares with two blocks of variables. Assuming the data matrix corresponding to each block is…
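The setting described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's method: it assumes a cyclic update order, the objective 0.5 * ||A1 x1 + A2 x2 - y||^2, and the vanilla stepsize of one over the block-wise Lipschitz constant. The function name `block_gradient_descent`, the `gamma1`/`gamma2` arguments, and the random test data are hypothetical; the closed-form optimal stepsizes from the paper are not reproduced here.

```python
import numpy as np

def block_gradient_descent(A1, A2, y, steps, gamma1=None, gamma2=None):
    """Cyclic two-block gradient descent on 0.5 * ||A1 x1 + A2 x2 - y||^2.

    gamma1, gamma2 are per-block stepsizes (hypothetical parameters); when
    omitted they default to the vanilla rule 1 / L_i, where L_i is the
    block-wise Lipschitz constant ||A_i||_2^2.
    """
    L1 = np.linalg.norm(A1, 2) ** 2  # largest eigenvalue of A1^T A1
    L2 = np.linalg.norm(A2, 2) ** 2  # largest eigenvalue of A2^T A2
    gamma1 = 1.0 / L1 if gamma1 is None else gamma1
    gamma2 = 1.0 / L2 if gamma2 is None else gamma2

    x1 = np.zeros(A1.shape[1])
    x2 = np.zeros(A2.shape[1])
    history = []
    for _ in range(steps):
        # Gradient step on block 1 with block 2 held fixed.
        residual = A1 @ x1 + A2 @ x2 - y
        x1 = x1 - gamma1 * (A1.T @ residual)
        # Gradient step on block 2 using the freshly updated block 1.
        residual = A1 @ x1 + A2 @ x2 - y
        x2 = x2 - gamma2 * (A2.T @ residual)
        history.append(0.5 * np.linalg.norm(A1 @ x1 + A2 @ x2 - y) ** 2)
    return x1, x2, history

# Illustrative data (an assumption): each block has orthonormal columns,
# so both block-wise Lipschitz constants equal 1.
rng = np.random.default_rng(0)
A1 = np.linalg.qr(rng.standard_normal((20, 5)))[0]
A2 = np.linalg.qr(rng.standard_normal((20, 5)))[0]
y = rng.standard_normal(20)

x1, x2, history = block_gradient_descent(A1, A2, y, steps=500)

# Reference least-squares solution for the stacked matrix [A1, A2].
x_star = np.linalg.lstsq(np.hstack([A1, A2]), y, rcond=None)[0]
```

Passing explicit `gamma1` and `gamma2` values makes it possible to compare other stepsize choices against the vanilla rule on the same data.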
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Statistical and numerical algorithms · Field-Flow Fractionation Techniques
