A Short and Unified Convergence Analysis of the SAG, SAGA, and IAG Algorithms

Feng Zhu, Robert W. Heath Jr., and Aritra Mitra

arXiv:2602.05304·cs.LG·May 22, 2026

TL;DR

This paper presents a unified convergence analysis of the SAG, SAGA, and IAG algorithms for finite-sum optimization in large-scale machine learning, simplifying their proofs and improving existing bounds.

Contribution

It develops a single, modular convergence analysis that applies to all three algorithms, derives new bounds, and extends to non-convex objectives and Markov sampling.

Findings

01 First high-probability bounds for SAG and SAGA
02 Improved convergence rates for IAG
03 Unified analysis simplifies understanding of these algorithms

Abstract

Stochastic variance-reduced algorithms such as Stochastic Average Gradient (SAG) and SAGA, and their deterministic counterparts like the Incremental Aggregated Gradient (IAG) method, have been extensively studied in large-scale machine learning. Despite their popularity, existing analyses for these algorithms are disparate, relying on different proof techniques tailored to each method. Furthermore, the original proof of SAG is known to be notoriously involved, requiring computer-aided analysis. Focusing on finite-sum optimization with smooth and strongly convex objective functions, our main contribution is to develop a single unified convergence analysis that applies to all three algorithms: SAG, SAGA, and IAG. Our analysis features two key steps: (i) establishing a bound on delays due to stochastic sub-sampling using simple concentration tools, and (ii) carefully designing a novel…
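For readers unfamiliar with the three methods named in the abstract, the following is a minimal sketch of their standard update rules. It is not the authors' code and says nothing about their proof technique. The scalar quadratic components f_i(x) = 0.5 * a_i * (x - b_i)^2, the function name `run_aggregated_gradient`, the step size, and the sample data are all illustrative assumptions. All three methods keep a table of the most recently computed gradient of each component; SAG and IAG step along the table average (with a random or cyclic index, respectively), while SAGA uses an unbiased correction of that average.

```python
import random


def run_aggregated_gradient(a, b, method, step, iters, seed=0):
    """Minimize (1/n) * sum_i 0.5 * a[i] * (x - b[i])**2 (hypothetical test problem).

    method: "sag" (random index), "saga" (random index, unbiased direction),
            or "iag" (cyclic index).
    """
    if method not in ("sag", "saga", "iag"):
        raise ValueError(f"unknown method: {method}")
    rng = random.Random(seed)
    n = len(a)
    x = 0.0

    def grad(i, point):
        # Gradient of the i-th component at `point`.
        return a[i] * (point - b[i])

    # Table of stored component gradients, initialized at the starting point.
    table = [grad(i, x) for i in range(n)]
    avg = sum(table) / n

    for k in range(iters):
        # IAG sweeps indices cyclically; SAG and SAGA sample uniformly.
        i = k % n if method == "iag" else rng.randrange(n)
        g_new, g_old = grad(i, x), table[i]
        if method == "saga":
            # SAGA: fresh gradient minus stale one, plus the old table average.
            direction = g_new - g_old + avg
        else:
            # SAG / IAG: average of the table after refreshing entry i.
            direction = avg + (g_new - g_old) / n
        table[i] = g_new
        avg += (g_new - g_old) / n
        x -= step * direction
    return x


# Illustrative data (assumed, not from the paper).
a = [1.0, 1.5, 2.0, 1.2, 1.8]
b = [0.5, -1.0, 2.0, 3.0, -0.5]
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)  # closed-form minimizer
for name in ("sag", "saga", "iag"):
    x_hat = run_aggregated_gradient(a, b, name, step=0.02, iters=20000)
    print(name, abs(x_hat - x_star))
```

The only differences between the three variants in this sketch are the index rule and the SAGA correction term, which is one informal way to see why a shared analysis is plausible. The step size here is a conservative choice for this toy problem, not a value taken from the paper.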

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques