GBA: A Tuning-free Approach to Switch between Synchronous and Asynchronous Training for Recommendation Model
Wenbo Su, Yuanxing Zhang, Yufeng Cai, Kaixu Ren, Pengjie Wang, Huimin Yi, Yue Song, Jing Chen, Hongbo Deng, Jian Xu, Lin Qu, Bo Zheng

TL;DR
This paper introduces GBA, a tuning-free method for dynamically switching between synchronous and asynchronous training modes in recommendation models, improving efficiency and robustness without hyper-parameter tuning.
Contribution
GBA enables automatic switching between training modes in distributed training of recommendation models without hyper-parameter tuning, while handling stale gradients and differences in gradient-value distributions.
Findings
GBA achieves up to 0.2% AUC improvement over state-of-the-art asynchronous methods.
GBA speeds up training by at least 2.4x under limited hardware resources.
GBA maintains comparable convergence properties to synchronous training.
Abstract
High-concurrency asynchronous training on a parameter server (PS) architecture and high-performance synchronous training on an all-reduce (AR) architecture are the most commonly deployed distributed training modes for recommendation models. Although synchronous AR training is designed for higher training efficiency, asynchronous PS training can be the better choice for training speed when the shared cluster contains stragglers (slow workers), especially under limited computing resources. An ideal way to take full advantage of both modes is to switch between them according to the cluster status. However, switching training modes often requires re-tuning hyper-parameters, which is extremely time- and resource-consuming. We identify two obstacles to a tuning-free approach: the difference in the distribution of gradient values between the two modes and the stale gradients produced by stragglers. This paper…
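The two obstacles named in the abstract can be made concrete with a toy experiment. The sketch below is our own illustration under stated assumptions, not the GBA algorithm (whose description is truncated above): it assumes a scalar linear model with squared loss, N workers with equal local batches, and independent samples. In synchronous mode one update uses the average of all workers' gradients (one global batch); in asynchronous mode each worker's local-batch gradient is applied on its own, so its spread is roughly sqrt(N) times larger, which is one reason a learning rate tuned for one mode may not transfer. The helper `filter_stale` is a hypothetical bounded-staleness policy, included only to show what "staleness" measures.

```python
import random
import statistics

# Toy setting (assumption, not from the paper): scalar model y = w * x with
# squared loss, so one sample's gradient w.r.t. w is 2 * (w * x - y) * x.
TRUE_W = 3.0


def sample_batch(rng, size):
    """Draw `size` (x, y) pairs with y = TRUE_W * x + Gaussian noise."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(size)]
    return [(x, TRUE_W * x + rng.gauss(0.0, 0.5)) for x in xs]


def batch_gradient(w, batch):
    """Mean squared-loss gradient over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def gradient_std(w, num_workers, local_batch, trials, seed=0):
    """Standard deviation of a single update's gradient in two modes.

    sync:  average of num_workers local-batch gradients (one global batch).
    async: one worker's local-batch gradient, applied on its own.
    """
    rng = random.Random(seed)
    sync_grads, async_grads = [], []
    for _ in range(trials):
        local = [batch_gradient(w, sample_batch(rng, local_batch))
                 for _ in range(num_workers)]
        sync_grads.append(sum(local) / num_workers)
        async_grads.append(local[0])
    return statistics.stdev(sync_grads), statistics.stdev(async_grads)


def filter_stale(updates, current_version, max_staleness):
    """Hypothetical bounded-staleness policy (not GBA's mechanism).

    Each update is (gradient, version the gradient was computed at); its
    staleness is current_version - version. Gradients staler than
    max_staleness are dropped.
    """
    return [g for g, v in updates if current_version - v <= max_staleness]


if __name__ == "__main__":
    sync_std, async_std = gradient_std(w=0.0, num_workers=8,
                                       local_batch=16, trials=1000)
    # With 8 workers the async spread should be about sqrt(8) ~ 2.8x larger.
    print(f"sync std {sync_std:.3f}, async std {async_std:.3f}, "
          f"ratio {async_std / sync_std:.2f}")
    # Staleness 0, 3 and 6 with a bound of 3: the last gradient is dropped.
    print(filter_stale([(1.0, 10), (2.0, 7), (3.0, 4)],
                       current_version=10, max_staleness=3))
```

A hard staleness bound is only the simplest option; weighting each gradient by a decreasing function of its staleness is a common alternative, and the excerpt above does not say which approach GBA takes.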
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM
