A Gradient Complexity Analysis for Minimizing the Sum of Strongly Convex Functions with Varying Condition Numbers
Nuozhou Wang, Shuzhong Zhang

TL;DR
This paper analyzes the gradient complexity of stochastic gradient descent (SGD) for minimizing sums of strongly convex functions with varying condition numbers, proposing optimal algorithms and bounds.
Contribution
It introduces an SGD method tailored for sums of strongly convex functions with different condition numbers and establishes its optimality in gradient computations.
Findings
Proposed an SGD algorithm whose gradient complexity is optimal up to a logarithmic factor when the component functions have varying condition numbers.
Derived lower and upper bounds for gradient computation complexity in constrained block optimization.
Showed that solving the Fenchel dual can be more efficient than direct methods like ADMM.
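To make the setting concrete, here is a minimal sketch of SGD on a sum of strongly convex functions whose condition numbers differ. It is not the paper's algorithm: it uses a standard, swapped-in technique (sampling each function with probability proportional to its smoothness constant and reweighting the gradient to keep it unbiased). The toy data `D` and `C`, the helper names, and the step-size rule are all assumptions made for illustration.

```python
import random

# Hypothetical toy problem (not from the paper):
# f_i(x) = 0.5 * sum_j D[i][j] * (x[j] - C[i][j])**2, so each f_i has a
# diagonal Hessian with mu_i = min(D[i]) and L_i = max(D[i]).
D = [[1.0, 1.0], [1.0, 10.0], [100.0, 1.0]]
C = [[0.0, 0.0], [1.0, -1.0], [2.0, 3.0]]


def condition_numbers(D):
    """Condition number kappa_i = L_i / mu_i of each component function."""
    return [max(row) / min(row) for row in D]


def exact_minimizer(D, C):
    """Closed-form minimizer of sum_i f_i for the diagonal quadratics above."""
    n, dim = len(D), len(D[0])
    return [
        sum(D[i][j] * C[i][j] for i in range(n)) / sum(D[i][j] for i in range(n))
        for j in range(dim)
    ]


def sgd_importance(D, C, x0, steps, seed=0):
    """SGD on F = sum_i f_i, sampling index i with probability L_i / sum(L)."""
    rng = random.Random(seed)
    n, dim = len(D), len(x0)
    L = [max(row) for row in D]
    L_sum = sum(L)
    probs = [Li / L_sum for Li in L]
    # Strong convexity constant of the sum F (smallest diagonal Hessian entry).
    mu = min(sum(D[i][j] for i in range(n)) for j in range(dim))
    t0 = L_sum / mu  # chosen so the first step size equals 1 / sum(L)
    x = list(x0)
    for t in range(steps):
        i = rng.choices(range(n), weights=probs)[0]
        eta = 1.0 / (mu * (t + t0))  # assumed decaying step-size schedule
        for j in range(dim):
            grad_ij = D[i][j] * (x[j] - C[i][j])
            # Dividing by probs[i] makes the estimate unbiased for grad F.
            x[j] -= eta * grad_ij / probs[i]
    return x


def distance(a, b):
    """Euclidean distance between two points given as lists."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
```

Calling `sgd_importance(D, C, [10.0, 10.0], 20000)` and comparing the result with `exact_minimizer(D, C)` via `distance` shows how close the iterate gets. Each iteration costs one gradient-oracle call, so `steps` is the total gradient count that complexity bounds of this kind measure.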
Abstract
A popular approach to minimizing a finite sum of convex functions is stochastic gradient descent (SGD) and its variants. Fundamental research questions associated with SGD include: (i) finding a lower bound on the number of times that the gradient oracle of each individual function must be accessed in order to find an ε-minimizer of the overall objective; (ii) designing algorithms that are guaranteed to find an ε-minimizer of the overall objective in expectation while accessing the gradient oracle of each function no more than a certain number of times (in terms of ε), i.e., an upper bound. If these two bounds are of the same order of magnitude, then the algorithms may be called optimal. Most existing results along this line of research assume that the functions in the objective share the same condition number. In this paper, the first model…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
