Relating Checkpoint Update Probabilities to Momentum Parameters in Single-Loop Variance Reduction Methods

Hai Liu; Tiande Guo; Congying Han

arXiv:2601.02899 · math.OC · February 26, 2026

TL;DR

This paper introduces a unified single-loop variance reduction framework that relates checkpoint update probabilities to momentum parameters, enabling a flexible trade-off between acceleration and variance reduction, and achieves near-optimal complexity for large-scale convex optimization.

Contribution

It proposes a novel framework linking checkpoint update probabilities with momentum parameters, allowing adjustable acceleration and variance reduction, and derives new complexity bounds that improve upon existing methods.
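The block below shows what a checkpoint update probability $p$ controls in a generic single-loop method. It uses the standard loopless SVRG (L-SVRG) form, not the paper's coupling of $p$ to the momentum parameter, which is not reproduced here. The notation is assumed for illustration: $f = \frac{1}{n}\sum_{i=1}^{n} f_i$, iterate $x^k$, checkpoint $w^k$, and a mini-batch $S_k$ of size $b$.

```latex
w^{k+1} =
\begin{cases}
  x^{k}, & \text{with probability } p,\\
  w^{k}, & \text{with probability } 1-p,
\end{cases}
\qquad
g^{k} = \frac{1}{b}\sum_{i \in S_k}\bigl(\nabla f_i(x^{k}) - \nabla f_i(w^{k})\bigr) + \nabla f(w^{k}).

% Expected component-gradient evaluations per iteration
% (2b for the estimator, n for the full gradient when the checkpoint moves):
\mathbb{E}[\mathrm{cost}_k] = 2b + p\,n,
\qquad
p = \tfrac{b}{n} \;\Longrightarrow\; \mathbb{E}[\mathrm{cost}_k] = 3b.
```

With $p = b/n$, the checkpoint is refreshed on average once every $n/b$ iterations, or roughly once per pass over the data. The amortized cost per iteration stays at $O(b)$.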

Findings

01

Achieves near-optimal complexity $\tilde{O}(n + \sqrt{nL/\epsilon})$ for convex problems.

02

Unifies known methods within a single framework and recovers their complexity results.

03

Demonstrates through experiments the efficiency and practical benefits of the proposed approach.

Abstract

We propose a single-loop variance-reduced acceleration framework that relates checkpoint update probabilities to momentum parameters. The framework solves composite general convex problems whose smooth part has a finite-sum structure. It further alters the growth rate of the momentum parameter, which creates a novel continuous trade-off between acceleration and variance reduction controlled by the key parameter $α \in [0, 1]$. We obtain a series of novel complexity bounds and recover the complexities of several distinct known methods under the unified framework. When the mini-batch size is restricted by the massive scale of the problem or by limited computational resources, choosing a suitable $α$ still achieves near-optimal complexity for any prefixed target accuracy. Analysis shows that although the considered gradient oracle is exact,…
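The abstract's central mechanism is a single loop in which the checkpoint is refreshed at random instead of on a fixed inner-loop schedule. The sketch below shows that mechanism in isolation. It assumes a proximal loopless SVRG (L-SVRG) update on a small lasso problem as a stand-in for the composite finite-sum setting. It omits the momentum/acceleration step and the $α$-dependent schedule, which are the paper's contribution. The names `prox_lsvrg`, `soft_threshold` and `objective`, the step size $1/(6L_{\max})$, and the synthetic data are hypothetical choices for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (handles the nonsmooth composite term)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(A, y, lam, x):
    """F(x) = (1/n) * sum_i 0.5 * (a_i^T x - y_i)^2 + lam * ||x||_1."""
    r = A @ x - y
    return 0.5 * np.mean(r ** 2) + lam * np.abs(x).sum()

def prox_lsvrg(A, y, lam, step, p, batch, iters, seed=0):
    """Hypothetical sketch: proximal loopless SVRG WITHOUT momentum.

    The checkpoint w is replaced by the current iterate with probability p,
    so there is no inner loop. Returns the final iterate and the number of
    component-gradient evaluations under the L-SVRG cost model.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    w = x.copy()
    full_grad_w = A.T @ (A @ w - y) / n   # full gradient at the checkpoint
    grad_evals = n
    for _ in range(iters):
        idx = rng.integers(0, n, size=batch)   # mini-batch, with replacement
        Ab = A[idx]
        # For least squares, grad f_i(x) - grad f_i(w) = a_i a_i^T (x - w);
        # adding the checkpoint full gradient gives an unbiased estimator.
        g = Ab.T @ (Ab @ (x - w)) / batch + full_grad_w
        grad_evals += 2 * batch   # counted as 2 component gradients per sample
        x_next = soft_threshold(x - step * g, step * lam)
        if rng.random() < p:      # probabilistic checkpoint update: w <- x^k
            w = x.copy()
            full_grad_w = A.T @ (A @ w - y) / n
            grad_evals += n
        x = x_next
    return x, grad_evals

# Small synthetic lasso instance (hypothetical data, for illustration only).
rng = np.random.default_rng(1)
n, d, batch, lam = 200, 20, 4, 0.01
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[:3] = [1.0, -2.0, 0.5]
y = A @ x_true + 0.01 * rng.standard_normal(n)
L_max = np.max(np.sum(A ** 2, axis=1))   # largest component smoothness constant
x_hat, evals = prox_lsvrg(A, y, lam, step=1.0 / (6 * L_max),
                          p=batch / n, batch=batch, iters=5000)
print(objective(A, y, lam, np.zeros(d)), objective(A, y, lam, x_hat), evals)
```

Setting `p=batch/n` refreshes the checkpoint about once per pass over the data. A larger `p` keeps the checkpoint closer to the iterate, which tends to reduce the estimator's variance but raises the expected cost per iteration.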

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Optimization Algorithms Research · Tensor decomposition and applications