Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity

Sally Dong, Haotian Jiang, Yin Tat Lee, Swati Padmanabhan, and Guanghao Ye

arXiv:2208.03811 · math.OC · August 9, 2022 · 1 cites

Abstract

Many fundamental problems in machine learning can be formulated by the convex program \[ \min_{\theta \in \mathbb{R}^d} \sum_{i=1}^{n} f_i(\theta), \] where each $f_i$ is a convex, Lipschitz function supported on a subset of $d_i$ coordinates of $\theta$. One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one $f_i$ term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the $f_i$'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to $\epsilon$-accuracy in $O(\sum_{i=1}^{n} d_i \log(1/\epsilon))$ gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires $O(nd \log(1/\epsilon))$ gradient…
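To make the problem structure concrete, the following is a minimal sketch of a decomposable objective and of the two oracle-count expressions quoted in the abstract. It is not the paper's algorithm. The instance is an assumption chosen for illustration: each term is taken to be $f_i(\theta) = |a_i^\top \theta_{S_i} - b_i|$, which is convex, Lipschitz, and non-smooth, and depends only on the $d_i$ coordinates in $S_i$. The names `make_terms`, `objective`, `subgradient`, and `oracle_bounds` are hypothetical, and the bound comparison drops all constants.

```python
import math
import numpy as np


def make_terms(n, d, d_i, seed=0):
    """Build a hypothetical instance: n terms, each touching d_i of the d coordinates."""
    rng = np.random.default_rng(seed)
    terms = []
    for _ in range(n):
        support = rng.choice(d, size=d_i, replace=False)  # coordinate subset S_i
        a = rng.standard_normal(d_i)
        b = float(rng.standard_normal())
        terms.append((support, a, b))
    return terms


def objective(theta, terms):
    """Evaluate sum_i f_i(theta) with f_i(theta) = |a_i . theta[S_i] - b_i|."""
    return sum(abs(a @ theta[s] - b) for s, a, b in terms)


def subgradient(theta, terms):
    """Return one subgradient of the sum; term i only writes to coordinates in S_i."""
    g = np.zeros_like(theta)
    for s, a, b in terms:
        residual = a @ theta[s] - b
        # sign(0) = 0 is a valid subgradient choice at the kink.
        g[s] += np.sign(residual) * a
    return g


def oracle_bounds(n, d, support_sizes, eps):
    """Compare sum_i d_i * log(1/eps) with n * d * log(1/eps), constants dropped."""
    log_term = math.log(1.0 / eps)
    decomposable = sum(support_sizes) * log_term
    cutting_plane = n * d * log_term
    return decomposable, cutting_plane


if __name__ == "__main__":
    n, d, d_i = 100, 1000, 5
    terms = make_terms(n, d, d_i)
    theta = np.zeros(d)
    print("objective at 0:", objective(theta, terms))
    new, old = oracle_bounds(n, d, [d_i] * n, eps=1e-3)
    print("ratio of bounds (cutting plane / decomposable):", old / new)
```

When every term touches the same number of coordinates, the ratio of the two expressions is $d / d_i$, so the gap grows as the terms become sparser relative to the ambient dimension.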

Videos

Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Complexity and Algorithms in Graphs