The Optimality of (Accelerated) SGD for High-Dimensional Quadratic Optimization
Haihan Zhang, Yuanshi Liu, Qianwen Chen, Cong Fang

TL;DR
This paper analyzes the conditions under which stochastic gradient descent (SGD) and its accelerated variants are optimal for high-dimensional quadratic problems, providing convergence bounds and insights into their effectiveness based on problem complexity.
Contribution
The paper establishes convergence bounds for momentum-accelerated SGD and characterizes problem classes where SGD and ASGD are min-max optimal, revealing new insights into their learning biases.
Findings
SGD is effective for dense features with norm constraints.
SGD performs well on easy problems without saturation.
Momentum accelerates convergence in harder problems.
Abstract
Stochastic gradient descent (SGD) is a widely used algorithm in machine learning, particularly for neural network training. Recent studies on SGD for canonical quadratic optimization or linear regression show that it generalizes well under suitable high-dimensional settings. However, a fundamental question remains understudied: for what kinds of high-dimensional learning problems can SGD and its accelerated variants achieve optimality? This paper investigates SGD with two components that are essential in practice: an exponentially decaying step size schedule and momentum. We establish a convergence upper bound for momentum-accelerated SGD (ASGD) and propose concrete classes of learning problems under which SGD or ASGD achieves min-max optimal convergence rates. The characterization of the target function is based on standard power-law decays in (functional) linear regression. Our…
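The two ingredients named in the abstract, an exponentially decaying step size schedule and momentum, can be illustrated with a minimal sketch on a synthetic linear regression problem with power-law structure. Everything specific here is an assumption made for illustration and is not taken from the paper: the exponents `a` and `b`, the Gaussian features, the stage-wise halving schedule, and the heavy-ball form of momentum, which stands in for whatever ASGD variant the authors analyze. All function names are hypothetical.

```python
import numpy as np


def make_problem(d=200, a=2.0, b=1.5):
    """Diagonal covariance with eigenvalues i^{-a}; target coefficients i^{-b}.

    Both power-law exponents are illustrative assumptions.
    """
    idx = np.arange(1, d + 1, dtype=float)
    return idx ** (-a), idx ** (-b)


def excess_risk(w, w_star, eigvals):
    # 0.5 * (w - w*)^T H (w - w*) for a diagonal covariance H.
    return 0.5 * float(np.sum(eigvals * (w - w_star) ** 2))


def momentum_sgd(eigvals, w_star, n_steps=4000, eta0=0.1, beta=0.5,
                 noise_std=0.1, seed=0):
    """One-sample SGD with heavy-ball momentum (assumed form) and a
    step size that is halved after each of roughly log2(n_steps) stages."""
    rng = np.random.default_rng(seed)
    d = eigvals.size
    w = np.zeros(d)
    v = np.zeros(d)
    stage_len = max(1, n_steps // int(np.ceil(np.log2(n_steps))))
    for t in range(n_steps):
        eta = eta0 / 2.0 ** (t // stage_len)            # exponential decay
        x = np.sqrt(eigvals) * rng.standard_normal(d)   # x ~ N(0, H)
        y = x @ w_star + noise_std * rng.standard_normal()
        grad = (x @ w - y) * x                          # grad of 0.5*(x.w - y)^2
        v = beta * v - eta * grad                       # momentum buffer
        w = w + v
    return w


eigvals, w_star = make_problem()
w = momentum_sgd(eigvals, w_star)
print("initial excess risk:", excess_risk(np.zeros_like(w_star), w_star, eigvals))
print("final excess risk:  ", excess_risk(w, w_star, eigvals))
```

Setting `beta=0.0` recovers plain SGD with the same schedule, so the two can be compared on the same synthetic problem by varying `a` and `b`.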
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Advanced Numerical Analysis Techniques
Methods: Linear Regression · Stochastic Gradient Descent
