Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity
Sally Dong, Haotian Jiang, Yin Tat Lee, Swati Padmanabhan, and Guanghao Ye

Abstract
Many fundamental problems in machine learning can be formulated by the convex program \[ \min_{\theta\in\mathbb{R}^d}\ \sum_{i=1}^{n} f_{i}(\theta), \] where each \(f_i\) is a convex, Lipschitz function supported on a subset of \(d_i\) coordinates of \(\theta\). One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one \(f_i\) term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the \(f_i\)'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to \(\epsilon\)-accuracy in \(\widetilde{O}(\sum_{i=1}^{n} d_i \log(1/\epsilon))\) gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires gradient…
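To make the formulation concrete, here is a minimal Python sketch of the sampling-based baseline the abstract describes: a stochastic subgradient method that queries one term per iteration. It is not the paper's algorithm. The term family |⟨a, θ_S⟩ − b|, the step-size schedule, and all names (`make_term`, `objective`, `sampled_subgradient_method`) are illustrative assumptions, chosen only because each term is convex, Lipschitz, and reads only the coordinates in its support.

```python
import random


def make_term(support, coeffs, offset):
    """Build a hypothetical term f_i(theta) = |sum_k coeffs[k] * theta[support[k]] - offset|.

    The term is convex and Lipschitz and depends only on coordinates in `support`.
    Returns a (value, subgradient) pair of functions.
    """
    def residual(theta):
        return sum(c * theta[j] for c, j in zip(coeffs, support)) - offset

    def value(theta):
        return abs(residual(theta))

    def subgradient(theta):
        r = residual(theta)
        sign = (r > 0) - (r < 0)  # 0 is a valid subgradient choice at the kink
        # Sparse result: only coordinates in the support appear.
        return {j: sign * c for c, j in zip(coeffs, support)}

    return value, subgradient


def objective(terms, theta):
    """Evaluate the full decomposable objective sum_i f_i(theta)."""
    return sum(value(theta) for value, _ in terms)


def sampled_subgradient_method(terms, theta0, steps, step_size, seed=0):
    """Sample one term per iteration and step along its scaled subgradient."""
    rng = random.Random(seed)
    n = len(terms)
    theta = list(theta0)
    best, best_value = list(theta), objective(terms, theta)
    for t in range(steps):
        _, subgradient = terms[rng.randrange(n)]  # one gradient computation
        eta = step_size / (t + 1) ** 0.5          # assumed diminishing step size
        for j, g in subgradient(theta).items():
            # n * g is an unbiased estimate of a subgradient of the full sum.
            theta[j] -= eta * n * g
        # Full evaluation is for illustration only; it is not counted as an
        # oracle call in this sketch and would be too costly in practice.
        current = objective(terms, theta)
        if current < best_value:
            best, best_value = list(theta), current
    return best, best_value


# Three terms on theta in R^3, each touching at most two coordinates.
terms = [
    make_term([0, 1], [1.0, -1.0], 0.0),  # |theta_0 - theta_1|
    make_term([1, 2], [1.0, 1.0], 2.0),   # |theta_1 + theta_2 - 2|
    make_term([2], [1.0], 1.0),           # |theta_2 - 1|
]
best, best_value = sampled_subgradient_method(
    terms, [0.0, 0.0, 0.0], steps=200, step_size=0.5
)
```

Each subgradient is returned as a sparse dictionary, so one query to term \(i\) touches only its \(d_i\) supported coordinates. The number of iterations such a sampling scheme needs depends on how uniform the terms are, which is the dependence on the condition number that the abstract says the paper's algorithm avoids.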
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Complexity and Algorithms in Graphs
