A Unified Theory of Stochastic Proximal Point Methods without Smoothness

Peter Richtárik; Abdurakhmon Sadiev; Yury Demidovich

arXiv:2405.15941·math.OC·May 28, 2024

Open Access · 3 Reviews

TL;DR

This paper provides a unified theoretical framework for stochastic proximal point methods, demonstrating linear convergence without requiring smoothness, and introduces new variants with empirical validation.

Contribution

It offers a general convergence theorem for SPPM under broad assumptions, including non-smooth settings, and develops three novel SPPM variants.

Findings

01. Linear convergence established without smoothness assumptions

02. Unified analysis encompasses variance reduction and arbitrary sampling

03. New SPPM variants show promising empirical performance

Abstract

This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM). Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning, a trait not shared by the dominant stochastic gradient descent (SGD) algorithm. A framework of assumptions that we introduce encompasses methods employing techniques such as variance reduction and arbitrary sampling. A cornerstone of our general theoretical approach is a parametric assumption on the iterates, correction and control vectors. We establish a single theorem that ensures linear convergence under this assumption and the $\mu$-strong convexity of the loss function, and without the need to invoke smoothness. This integral theorem reinstates best known complexity and convergence guarantees for several existing methods which…
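To make the basic SPPM iteration concrete, here is a minimal sketch of the plain (non-variance-reduced) method on a toy problem that is strongly convex but not smooth. Everything in it is an illustrative assumption rather than the paper's setup: the losses f_i(x) = |a_i·x − b_i| + (μ/2)‖x − center‖², the helper names `prox_step` and `sppm`, the uniform sampling, and the choice of data so that every f_i shares the same minimizer `x_star` (an interpolation regime, in which the iterates contract toward `x_star` at rate 1/(1 + γμ) per step).

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def distance(u, v):
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def prox_step(x, a, b, center, mu, gamma):
    """Closed-form prox of f(z) = |a.z - b| + (mu/2)||z - center||^2 at x.

    Solves argmin_z f(z) + ||z - x||^2 / (2 * gamma). Hypothetical helper.
    """
    c = mu + 1.0 / gamma
    # Merge the two quadratic terms into (c/2)||z - y||^2 + const.
    y = [(mu * ci + xi / gamma) / c for ci, xi in zip(center, x)]
    residual = dot(a, y) - b
    # Subgradient of |.| at the solution, clipped to [-1, 1].
    s = max(-1.0, min(1.0, c * residual / dot(a, a)))
    return [yi - (s / c) * ai for yi, ai in zip(y, a)]


def sppm(rows, targets, center, mu, gamma, num_steps, seed=0):
    """Plain SPPM: sample one loss uniformly, take its proximal step."""
    rng = random.Random(seed)
    x = [0.0] * len(center)
    for _ in range(num_steps):
        i = rng.randrange(len(rows))
        x = prox_step(x, rows[i], targets[i], center, mu, gamma)
    return x


# Toy data (assumption): b_i = a_i . x_star, so each f_i is minimized at x_star.
data_rng = random.Random(1)
dim, n = 5, 20
x_star = [data_rng.gauss(0, 1) for _ in range(dim)]
rows = [[data_rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
targets = [dot(a, x_star) for a in rows]

mu, gamma, num_steps = 0.1, 1.0, 200
x_final = sppm(rows, targets, x_star, mu, gamma, num_steps)
initial_error = distance([0.0] * dim, x_star)
final_error = distance(x_final, x_star)
print(f"initial error {initial_error:.3e}, final error {final_error:.3e}")
```

No gradient or Lipschitz constant appears anywhere in the loop; the step size `gamma` only enters through the proximal subproblem. When the losses do not share a minimizer, plain SPPM with a fixed step is generally expected to settle in a neighborhood of the solution rather than reach it exactly.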

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 5

Strengths

The main technical contribution of the paper is a generalization of an existing framework from functions with a Lipschitz continuous gradient to the case of differentiable functions. A non-negligible contribution, in my opinion, is the very clear presentation with several interesting remarks. The assumptions and the statements are clear and the proofs technically sound.

Weaknesses

The main novelty of the paper is the analysis of a unifying algorithm that makes it possible to handle variance-reduced stochastic proximal point methods for an objective function which is only differentiable and strongly convex. Though the main proofs are partially different from the ones used in the related literature, the idea is not new and it is a generalization of the approach proposed in the papers: 1) E. Gorbunov, F. Hanzely, and P. Richtárik. A unified theory of SGD: Variance reduction, s

Reviewer 02 · Rating 5 · Confidence 4

Strengths

The presentation is good. The authors provide solid theoretical analysis to support the proposed framework.

Weaknesses

The motivation of this paper is unclear. The smoothness-free setting assumed in this paper does not appear to be widely used.

Reviewer 03 · Rating 5 · Confidence 3

Strengths

**S1:** This paper is generally well-written and easy to follow. It is well-grounded in theoretical analysis, establishing convergence guarantees without relying on smoothness assumptions. **S2:** It unifies multiple SPPM variants under a single theoretical framework, making it easier to understand the relationships and convergence behavior across methods. **S3:** The development of new SPPM variants, such as SPPM with Nonuniform Sampling and SPPM with Arbitrary Sampling, enriches the field by

Weaknesses

**W1.** This paper appears more like a review article, which may not align well with the scope of ICLR. The authors propose seven algorithms for solving differentiable convex problems. However, these algorithms may lack sufficient novelty or clear performance advantages. Readers might still be uncertain about which algorithm is the best choice. **W2.** The framework assumes strong convexity, limiting the applicability of the results to problems that are not strongly convex or to non-convex sett

Taxonomy

Topics: Advanced Optimization Algorithms Research · Risk and Portfolio Optimization · Optimization and Variational Analysis