Ordering for Non-Replacement SGD

Yuetong Xu; Baharan Mirzasoleiman

arXiv:2306.15848 · cs.LG · June 29, 2023

Open Access

TL;DR

This paper proposes optimal data ordering strategies for non-replacement stochastic gradient descent to improve convergence rates, validated through theoretical analysis and experiments on various datasets and neural networks.

Contribution

It introduces new orderings for non-replacement SGD based on theoretical bounds, enhancing convergence for convex and strongly convex functions.

Findings

01 Optimal orderings improve convergence rates.

02 Orderings work with mini-batch and neural networks.

03 Experimental results confirm theoretical predictions.

Abstract

One approach to reducing run time and improving the efficiency of machine learning is to improve the convergence rate of the optimization algorithm used. Shuffling is an algorithmic technique that is widely used in machine learning, but it has only started to receive theoretical attention in recent years. With different convergence rates established for random shuffling and incremental gradient descent, we seek an ordering that can improve the convergence rate of the non-replacement form of the algorithm. Based on existing bounds on the distance between the optimum and the current iterate, we derive an upper bound that depends on the gradients at the beginning of the epoch. Through analysis of the bound, we develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions. We further test and verify our results through experiments on…
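To make the setting concrete, the following is a minimal sketch of non-replacement SGD on a toy one-dimensional least-squares problem, with three ways of choosing the per-epoch order. The names `grad_i`, `epoch_order`, and `sgd_without_replacement`, the toy data, and the step size are all assumptions made for illustration. In particular, the `grad_sorted` rule (and its sort direction) is only a placeholder for "an ordering computed from the gradients at the beginning of the epoch"; this page does not state the orderings the paper actually derives.

```python
import random

def grad_i(w, x, y):
    """Gradient of the per-example loss 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x

def epoch_order(w, data, strategy, rng):
    """Return the index order for one pass over the data (each index used once)."""
    n = len(data)
    if strategy == "incremental":
        # Incremental gradient descent: the same fixed order every epoch.
        return list(range(n))
    if strategy == "shuffle":
        # Random shuffling: a fresh random permutation every epoch.
        order = list(range(n))
        rng.shuffle(order)
        return order
    if strategy == "grad_sorted":
        # ASSUMPTION (placeholder rule, not taken from the paper): sort by the
        # gradient magnitude evaluated at the iterate that starts the epoch.
        return sorted(range(n), key=lambda i: abs(grad_i(w, *data[i])), reverse=True)
    raise ValueError(f"unknown strategy: {strategy}")

def sgd_without_replacement(data, strategy, epochs=50, step=0.05, seed=0):
    """Run SGD where every epoch visits each example exactly once."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        for i in epoch_order(w, data, strategy, rng):
            x, y = data[i]
            w -= step * grad_i(w, x, y)
    return w

# Toy data (hypothetical): y is roughly 2 * x plus small Gaussian noise.
_rng = random.Random(1)
data = [(x, 2.0 * x + _rng.gauss(0, 0.1))
        for x in [_rng.uniform(-1, 1) for _ in range(20)]]

for name in ("incremental", "shuffle", "grad_sorted"):
    print(name, sgd_without_replacement(data, name))
```

All three strategies visit every example exactly once per epoch and differ only in the permutation, which is the quantity the abstract proposes to optimize.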

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning