Stochastic Gradient Descent without Full Data Shuffle

Lijie Xu; Shuang Qiu; Binhang Yuan; Jiawei Jiang; Cedric Renggli; Shaoduo Gan; Kaan Kara; Guoliang Li; Ji Liu; Wentao Wu; Jieping Ye; Ce Zhang

arXiv:2206.05830 · cs.LG · June 14, 2022 · 1 cites

TL;DR

This paper introduces CorgiPile, a hierarchical data shuffling strategy for SGD that reduces data shuffling overhead while maintaining convergence rates, significantly improving training speed in ML systems.

Contribution

The paper proposes CorgiPile, a novel data shuffling method that avoids full data shuffles, with theoretical analysis and practical integration into PyTorch and PostgreSQL.

Findings

01. CorgiPile achieves comparable convergence to full shuffling.

02. CorgiPile accelerates deep learning training on ImageNet by 1.5X.

03. CorgiPile improves in-DB ML training speed by up to 12.8X.

Abstract

Stochastic gradient descent (SGD) is the cornerstone of modern machine learning (ML) systems. Despite its computational efficiency, SGD requires random data access that is inherently inefficient when implemented in systems that rely on block-addressable secondary storage such as HDD and SSD, e.g., TensorFlow/PyTorch and in-DB ML systems over large files. To address this impedance mismatch, various data shuffling strategies have been proposed to balance the convergence rate of SGD (which favors randomness) and its I/O performance (which favors sequential access). In this paper, we first conduct a systematic empirical study on existing data shuffling strategies, which reveals that all existing strategies have room for improvement -- they all suffer in terms of I/O performance or convergence rate. With this in mind, we propose a simple but novel hierarchical data shuffling strategy,…
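The abstract is cut off before it describes the strategy, so the following is only a minimal sketch of what a hierarchical (two-level) shuffle can look like, not the authors' implementation. It assumes the data is stored as fixed blocks, that block order is shuffled first, and that tuples are then shuffled inside an in-memory buffer holding a few blocks at a time. The function name `hierarchical_shuffle` and the parameters `buffer_blocks` and `seed` are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def hierarchical_shuffle(
    blocks: Sequence[Sequence[T]],
    buffer_blocks: int,
    seed: int = 0,
) -> Iterator[T]:
    """Yield every tuple exactly once per epoch using a two-level shuffle.

    Assumed scheme (illustrative only):
      level 1: shuffle the order in which blocks are read, so each block
               can still be read sequentially from storage;
      level 2: shuffle the tuples inside a buffer of `buffer_blocks` blocks.
    """
    if buffer_blocks < 1:
        raise ValueError("buffer_blocks must be at least 1")

    rng = random.Random(seed)

    # Level 1: random block order.
    order = list(range(len(blocks)))
    rng.shuffle(order)

    for start in range(0, len(order), buffer_blocks):
        # Fill the buffer with a few whole blocks (sequential reads).
        buffer: List[T] = []
        for block_id in order[start:start + buffer_blocks]:
            buffer.extend(blocks[block_id])

        # Level 2: random tuple order within the buffer.
        rng.shuffle(buffer)
        yield from buffer


if __name__ == "__main__":
    # Four blocks of three tuples each; the buffer holds two blocks.
    toy_blocks = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
    print(list(hierarchical_shuffle(toy_blocks, buffer_blocks=2, seed=42)))
```

The buffer size is the knob in this sketch: with `buffer_blocks=1` only tuples inside a single block are mixed, while a buffer covering all blocks is equivalent to a full shuffle, at the cost of holding the whole dataset in memory.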

Code & Models

Repositories

ds3lab/corgipile-postgresql (Official)

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Brain Tumor Detection and Classification

Methods: Non Maximum Suppression · 1x1 Convolution · Convolution · SSD · Stochastic Gradient Descent