A Unified Theory of Stochastic Proximal Point Methods without Smoothness
Peter Richtárik, Abdurakhmon Sadiev, Yury Demidovich

TL;DR
This paper provides a unified theoretical framework for stochastic proximal point methods, demonstrating linear convergence without requiring smoothness, and introduces new variants with empirical validation.
Contribution
It offers a general convergence theorem for SPPM under broad assumptions, including non-smooth settings, and develops three novel SPPM variants.
Findings
Linear convergence established without smoothness assumptions
Unified analysis encompasses variance reduction and arbitrary sampling
New SPPM variants show promising empirical performance
Abstract
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM). Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning, a trait not shared by the dominant stochastic gradient descent (SGD) algorithm. A framework of assumptions that we introduce encompasses methods employing techniques such as variance reduction and arbitrary sampling. A cornerstone of our general theoretical approach is a parametric assumption on the iterates, correction and control vectors. We establish a single theorem that ensures linear convergence under this assumption and the μ-strong convexity of the loss function, and without the need to invoke smoothness. This integral theorem recovers the best known complexity and convergence guarantees for several existing methods which…
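To make the basic SPPM step concrete, here is a minimal sketch of the update x_{k+1} = prox_{γ f_i}(x_k), where i is an index sampled at each step. It is not one of the paper's algorithms or experiments. The least-squares loss, the noiseless data, and the names `sppm_least_squares`, `make_problem`, and `distance` are all assumptions made for illustration. The per-sample loss f_i(x) = 0.5 (<a_i, x> - b_i)^2 was chosen only because its proximal operator has a closed form. It happens to be smooth, so the sketch shows the mechanics of the step and does not exercise the non-smooth setting.

```python
import math
import random


def sppm_least_squares(rows, targets, gamma, num_steps, seed=0):
    """Illustrative SPPM for f(x) = (1/n) * sum_i 0.5 * (<a_i, x> - b_i)^2.

    Each step samples one index i uniformly and applies the exact proximal
    operator of gamma * f_i, which for this loss is
        x - gamma * (<a_i, x> - b_i) / (1 + gamma * ||a_i||^2) * a_i.
    """
    rng = random.Random(seed)
    x = [0.0] * len(rows[0])
    for _ in range(num_steps):
        i = rng.randrange(len(rows))
        a, b = rows[i], targets[i]
        residual = sum(aj * xj for aj, xj in zip(a, x)) - b
        sq_norm = sum(aj * aj for aj in a)
        # Closed-form prox step; the denominator keeps it stable for any gamma > 0.
        scale = gamma * residual / (1.0 + gamma * sq_norm)
        x = [xj - scale * aj for xj, aj in zip(x, a)]
    return x


def make_problem(n=40, dim=4, seed=1):
    """Hypothetical synthetic data: a consistent linear system with known solution."""
    rng = random.Random(seed)
    x_star = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    rows = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    targets = [sum(aj * xj for aj, xj in zip(a, x_star)) for a in rows]
    return rows, targets, x_star


def distance(u, v):
    """Euclidean distance between two vectors given as lists."""
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


if __name__ == "__main__":
    rows, targets, x_star = make_problem()
    # Step sizes spanning three orders of magnitude, to probe tuning sensitivity.
    for gamma in (0.1, 1.0, 100.0):
        x = sppm_least_squares(rows, targets, gamma, num_steps=3000)
        print(f"gamma={gamma}: distance to solution = {distance(x, x_star):.3e}")
```

In this noiseless toy problem the step stays well defined for every positive γ, because the factor 1 / (1 + γ ||a_i||^2) damps the update. This is one way to see the robustness to step-size tuning that the abstract attributes to proximal point methods. An SGD step with γ = 100 on the same data would be expected to diverge.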
Peer Reviews
Decision·Submitted to ICLR 2025
The main technical contribution of the paper is a generalization of an existing framework from functions with a Lipschitz continuous gradient to the case of differentiable functions. A non-negligible contribution, in my opinion, is the very clear presentation with several interesting remarks. The assumptions and the statements are clear and the proofs are technically sound.
The main novelty of the paper is the analysis of a unifying algorithm that can handle variance-reduced stochastic proximal point methods for an objective function which is only differentiable and strongly convex. Though the main proofs are partially different from the ones used in the related literature, the idea is not new and it is a generalization of the approach proposed in the papers: 1) E. Gorbunov, F. Hanzely, and P. Richtarik. A unified theory of sgd: Variance reduction, s
The presentation is good. The authors provide solid theoretical analysis to support the proposed framework.
The motivation of this paper is unclear. The smoothness-free assumption used in this paper does not appear to be widely adopted.
**S1:** This paper is generally well-written and easy to follow. It is well-grounded in theoretical analysis, establishing convergence guarantees without relying on smoothness assumptions. **S2:** It unifies multiple SPPM variants under a single theoretical framework, making it easier to understand the relationships and convergence behavior across methods. **S3:** The development of new SPPM variants, such as SPPM with Nonuniform Sampling and SPPM with Arbitrary Sampling, enriches the field by
**W1.** This paper appears more like a review article, which may not align well with the scope of ICLR. The authors propose seven algorithms for solving differentiable convex problems. However, these algorithms may lack sufficient novelty or clear performance advantages. Readers might still be uncertain about which algorithm is the best choice. **W2.** The framework assumes strong convexity, limiting the applicability of the results to problems that are not strongly convex or to non-convex sett
Taxonomy
Topics: Advanced Optimization Algorithms Research · Risk and Portfolio Optimization · Optimization and Variational Analysis
