Speeding up Stochastic Proximal Optimization in the High Hessian Dissimilarity Setting
Elnur Gasanov, Peter Richtárik

TL;DR
This paper tightens the convergence rate analysis of the Loopless Stochastic Variance Reduced Proximal Point Method (L-SVRP) when the component functions have high Hessian dissimilarity. The analysis requires no smoothness assumptions and shows lower communication complexity than standard stochastic gradient descent.
Contribution
It provides a theoretical analysis that enhances convergence rate bounds for L-SVRP in high Hessian dissimilarity scenarios, without requiring smoothness.
Findings
Improved convergence rate in high Hessian dissimilarity settings.
Reduced communication complexity compared to standard SGD.
Analysis does not depend on smoothness assumptions.
Abstract
Stochastic proximal point methods have recently garnered renewed attention within the optimization community, primarily due to their desirable theoretical properties. Notably, these methods exhibit a convergence rate that is independent of the Lipschitz smoothness constants of the loss function, a feature often missing in the loss functions of modern ML applications. In this paper, we revisit the analysis of the Loopless Stochastic Variance Reduced Proximal Point Method (L-SVRP). Building on existing work, we establish a theoretical improvement in the convergence rate in scenarios characterized by high Hessian dissimilarity among the functions. Our concise analysis, which does not require smoothness assumptions, demonstrates a significant improvement in communication complexity compared to standard stochastic gradient descent.
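To make the method named in the abstract concrete, here is a minimal sketch of a loopless variance-reduced proximal point iteration on a toy problem. This page does not state the update rule, so the form used below (a proximal step on one sampled function at a gradient-corrected point, plus a reference point refreshed with probability p) is an assumption based on how L-SVRP is commonly written in prior work, not the authors' exact algorithm. The scalar quadratics f_i(x) = 0.5 * a[i] * (x - c[i])**2, the function name l_svrp, and all parameter values are hypothetical and chosen only so that the proximal operator has a closed form.

```python
import random


def l_svrp(a, c, gamma, p, steps, seed=0):
    """Sketch of a loopless SVRP-style loop on f_i(x) = 0.5*a[i]*(x - c[i])**2.

    Assumed update (not taken from this page):
        g = grad f(w) - grad f_i(w)
        x = prox_{gamma * f_i}(x - gamma * g)
        w = x with probability p, otherwise unchanged
    """
    rng = random.Random(seed)
    n = len(a)

    def grad_i(i, x):
        # Gradient of the i-th scalar quadratic.
        return a[i] * (x - c[i])

    def full_grad(x):
        # Gradient of the average f = (1/n) * sum_i f_i.
        return sum(grad_i(i, x) for i in range(n)) / n

    x = w = 0.0
    gw = full_grad(w)  # full gradient at the reference point
    for _ in range(steps):
        i = rng.randrange(n)
        g = gw - grad_i(i, w)  # variance-reduction correction
        z = x - gamma * g
        # Closed-form prox of gamma * f_i at z for a scalar quadratic:
        # argmin_y 0.5*a_i*(y - c_i)^2 + (y - z)^2 / (2*gamma)
        x = (z + gamma * a[i] * c[i]) / (1.0 + gamma * a[i])
        if rng.random() < p:
            # Loopless refresh: the only step that needs a full gradient.
            w = x
            gw = full_grad(w)
    return x


# Toy data (hypothetical): the curvatures a[i] differ across functions.
a = [2.0, 3.0, 4.0, 5.0]
c = [1.0, -2.0, 0.5, 3.0]
x_hat = l_svrp(a, c, gamma=0.5, p=0.25, steps=2000)
# Minimizer of the average of the quadratics, for comparison.
x_star = sum(ai * ci for ai, ci in zip(a, c)) / sum(a)
```

In this sketch a full gradient is computed only when the reference point is refreshed, which in a distributed reading is the step that requires communication with every function. The step size gamma is not tied to any smoothness constant; the values above are illustrative and are not the step sizes analyzed in the paper.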
Taxonomy
Topics: Stochastic processes and financial applications
