Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates

Alp Yurtsever; Alex Gu; Suvrit Sra

arXiv:2110.03274·math.OC·February 21, 2022·NeurIPS

TL;DR

This paper extends Three Operator Splitting (TOS) to handle stochastic gradients, subgradients, and adaptive learning rates, enabling more practical and efficient optimization in machine learning applications with complex or unknown structures.

Contribution

The paper introduces three extensions of TOS: handling subgradients, stochastic gradients, and adaptive step-sizes, with convergence guarantees and practical advantages.

Findings

01

The subgradient and stochastic-gradient extensions ensure an $\mathcal{O}(1/\sqrt{t})$ convergence rate.

02

AdapTOS adapts to unknown smoothness, achieving universal convergence.

03

Empirical results show improved performance over competing methods.

Abstract

Three Operator Splitting (TOS) (Davis & Yin, 2017) can minimize the sum of multiple convex functions effectively when an efficient gradient oracle or proximal operator is available for each term. This requirement often fails in machine learning applications: (i) instead of full gradients, only stochastic gradients may be available; and (ii) instead of proximal operators, using subgradients to handle complex penalty functions may be more efficient and realistic. Motivated by these concerns, we analyze three potentially valuable extensions of TOS. The first two permit using subgradients and stochastic gradients, and are shown to ensure an $\mathcal{O}(1/\sqrt{t})$ convergence rate. The third extension, AdapTOS, endows TOS with adaptive step-sizes. For the important setting of optimizing a convex loss over the intersection of convex sets, AdapTOS attains universal convergence rates, i.e., the…
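
For readers unfamiliar with the baseline method, the following is a minimal sketch of the standard Davis & Yin TOS iteration, not the extensions analyzed in the paper. It assumes a toy problem chosen purely for illustration: minimize the smooth loss $h(x) = \tfrac{1}{2}\|x - b\|^2$ over the intersection of the box $[0,1]^3$ and the hyperplane $\{x : \sum_i x_i = 1\}$. The function names (`tos`, `project_box`, `project_hyperplane`), the vector `b`, the step size, and the iteration count are all hypothetical choices, not taken from the paper.

```python
import numpy as np

def project_box(z, lo=0.0, hi=1.0):
    # Proximal operator of the indicator of the box [lo, hi]^n (a projection).
    return np.clip(z, lo, hi)

def project_hyperplane(y, total=1.0):
    # Projection onto the hyperplane {x : sum(x) = total}.
    return y - (y.sum() - total) / y.size

def tos(grad_h, prox_g, prox_f, z0, step, iters=200):
    # Standard three operator splitting for min f(x) + g(x) + h(x),
    # where h is smooth and f, g are accessed through their proximal operators.
    # For convergence, step is assumed to lie in (0, 2/L), with L the
    # Lipschitz constant of grad_h.
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(iters):
        x_g = prox_g(z)
        x_f = prox_f(2.0 * x_g - z - step * grad_h(x_g))
        z = z + x_f - x_g
    return prox_g(z)

# Illustrative data (hypothetical): grad h(x) = x - b, so L = 1.
b = np.array([0.9, 0.8, -0.5])
x = tos(lambda v: v - b, project_box, project_hyperplane, np.zeros(3), step=1.0)
print(x)
```

On this toy instance the iterates should approach the Euclidean projection of `b` onto the constraint set. The extensions described in the abstract would replace the call to `grad_h` with a subgradient or a stochastic gradient, or choose the step size adaptively; those variants are not shown here.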

Videos

Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates (SlidesLive)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms