Faster Convergence of Riemannian Stochastic Gradient Descent with Increasing Batch Size

Kanata Oowada; Hideaki Iiduka

arXiv:2501.18164 · cs.LG · October 14, 2025

TL;DR

This paper shows that increasing the batch size during Riemannian stochastic gradient descent (RSGD) speeds up convergence and lowers stochastic first-order oracle (SFO) complexity relative to a constant batch size, under both constant and decaying learning rates.

Contribution

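The paper gives a theoretical convergence analysis showing that Riemannian stochastic gradient descent converges faster with an increasing batch size than with a constant one. It also examines, both theoretically and numerically, how an increasing batch size affects stochastic first-order oracle complexity, using principal component analysis (PCA) and low-rank matrix completion.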

Findings

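1. An increasing batch size improves the convergence rate from O(T^{-1} + C) to O(T^{-1}), where T is the total number of iterations and C is a constant.

2. An increasing batch size reduces stochastic first-order oracle (SFO) complexity.

3. An increasing batch size combines the benefits of small and large constant batch sizes.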

Abstract

We theoretically analyzed the convergence behavior of Riemannian stochastic gradient descent (RSGD) and found that using an increasing batch size leads to faster convergence than using a constant batch size, not only with a constant learning rate but also with a decaying learning rate, such as cosine annealing decay and polynomial decay. The convergence rate improves from $O(T^{-1} + C)$ with a constant batch size to $O(T^{-1})$ with an increasing batch size, where $T$ denotes the total number of iterations and $C$ is a constant. Using principal component analysis and low-rank matrix completion, we investigated, both theoretically and numerically, how an increasing batch size affects computational time as quantified by stochastic first-order oracle (SFO) complexity. An increasing batch size was found to reduce the SFO complexity of RSGD. Furthermore, an increasing batch size was found to…
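To make the setting concrete, the following is a minimal sketch of RSGD with an increasing batch size, applied to PCA for a single leading component on the unit sphere. It is an illustration only, not the authors' implementation: the function name `rsgd_pca_sphere`, the doubling schedule (`grow_every`, `growth`), the constant learning rate, and the normalization retraction are all assumptions chosen for brevity. The SFO complexity is counted here as the total number of stochastic gradient evaluations, that is, the sum of the batch sizes over all iterations.

```python
import numpy as np

def rsgd_pca_sphere(X, steps=200, lr=0.05, b0=8, grow_every=50, growth=2, seed=0):
    """Sketch of RSGD on the unit sphere for the leading principal direction.

    Hypothetical schedule: the batch size starts at b0 and is multiplied by
    `growth` every `grow_every` iterations (capped at the dataset size).
    Returns the final iterate and the accumulated SFO count.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)          # start on the sphere
    batch, sfo = b0, 0
    for t in range(steps):
        if t > 0 and t % grow_every == 0:
            batch = min(n, batch * growth)   # increasing batch size
        Xb = X[rng.choice(n, size=batch, replace=False)]
        # Euclidean gradient of f(w) = -(1 / (2 * batch)) * ||Xb w||^2
        egrad = -(Xb.T @ (Xb @ w)) / batch
        # Riemannian gradient: project onto the tangent space at w
        rgrad = egrad - (w @ egrad) * w
        # Gradient step followed by a retraction (renormalization)
        w = w - lr * rgrad
        w /= np.linalg.norm(w)
        sfo += batch                # one stochastic gradient per sample
    return w, sfo
```

Calling `rsgd_pca_sphere(X)` on an `(n, d)` data matrix returns a unit vector and the SFO count; comparing that count against a run with `growth=1` (a constant batch size) is one way to explore the trade-off the abstract describes.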

Taxonomy

Topics: Stochastic Gradient Optimization Techniques

Methods: Cosine Annealing