Biased Local SGD for Efficient Deep Learning on Heterogeneous Systems
Jihyun Lim, Junhyuk Jo, Chanhyeok Ko, Young Min Go, Jimin Hwa, Sunwoo Lee

TL;DR
This paper introduces a biased local SGD method for training on heterogeneous systems. It deliberately biases data sampling and model aggregation so that slower CPUs can train alongside faster GPUs, yielding large speedups over synchronous SGD at comparable accuracy.
Contribution
It proposes a novel biased sampling and aggregation approach in local SGD to effectively utilize diverse hardware resources in deep learning training.
Findings
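ResNet20 on CIFAR-10 with 2 CPUs and 8 GPUs trains up to 32x faster than synchronous SGD, with nearly identical accuracy
A carefully controlled bias significantly accelerates local SGD while reaching comparable or even higher accuracy than synchronous SGD under the same epoch budget
Biased sampling and aggregation let slower CPUs contribute alongside faster GPUs, instead of training on the fastest devices only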
Abstract
Most parallel neural network training methods assume homogeneous computing resources. For example, synchronous data-parallel SGD suffers from significant synchronization overhead under heterogeneous workloads, often forcing practitioners to rely only on the fastest devices (e.g., GPUs). In this work, we study local SGD for efficient parallel training on heterogeneous systems. We show that intentionally introducing bias in data sampling and model aggregation can effectively harmonize slower CPUs with faster GPUs. Our extensive empirical results demonstrate that a carefully controlled bias significantly accelerates local SGD while achieving comparable or even higher accuracy than synchronous SGD under the same epoch budget. For instance, our method trains ResNet20 on CIFAR-10 with 2 CPUs and 8 GPUs up to 32x faster than synchronous SGD, with nearly identical accuracy. These results…
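The abstract does not say how the bias is chosen, so the following is only a minimal sketch under an assumed rule, not the authors' algorithm. It runs local SGD on a toy least-squares problem. Each worker receives a data shard and an aggregation weight proportional to its relative speed, and faster workers take more local steps per round so that all workers finish a round at roughly the same time. The function name `biased_local_sgd`, the `speeds` and `base_steps` parameters, and the speed-proportional rule are all hypothetical.

```python
import numpy as np

def biased_local_sgd(X, y, speeds, rounds=50, base_steps=4, lr=0.05, batch=8, seed=0):
    """Toy local SGD for least squares with speed-proportional bias.

    ASSUMPTION: shard sizes, local step counts, and aggregation weights
    all scale with each worker's relative speed. The source abstract
    does not specify this rule.
    """
    rng = np.random.default_rng(seed)
    speeds = np.asarray(speeds, dtype=float)
    shares = speeds / speeds.sum()  # bias toward faster workers

    # Biased sampling: faster workers own proportionally larger shards.
    perm = rng.permutation(len(X))
    bounds = np.floor(np.cumsum(shares) * len(X)).astype(int)[:-1]
    shards = np.split(perm, bounds)

    w = np.zeros(X.shape[1])
    for _ in range(rounds):
        local_models = []
        for shard, speed in zip(shards, speeds):
            w_k = w.copy()  # every worker starts from the global model
            # Faster workers fit more local steps into the same wall-clock time.
            steps = max(1, int(round(base_steps * speed / speeds.min())))
            for _ in range(steps):
                idx = rng.choice(shard, size=min(batch, len(shard)), replace=False)
                grad = X[idx].T @ (X[idx] @ w_k - y[idx]) / len(idx)
                w_k -= lr * grad
            local_models.append(w_k)
        # Biased aggregation: weighted average instead of a uniform mean.
        w = np.average(local_models, axis=0, weights=shares)
    return w
```

With `speeds=[1, 1, 4, 4, 4, 4, 4, 4, 4, 4]` the sketch mimics 2 slow and 8 fast workers. The 4:1 speed ratio is an illustrative value, not a measurement from the paper.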
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices
