GBA: A Tuning-free Approach to Switch between Synchronous and Asynchronous Training for Recommendation Model

Wenbo Su; Yuanxing Zhang; Yufeng Cai; Kaixu Ren; Pengjie Wang; Huimin Yi; Yue Song; Jing Chen; Hongbo Deng; Jian Xu; Lin Qu; Bo Zheng

arXiv:2205.11048 · cs.LG · October 11, 2022

TL;DR

This paper introduces GBA, a tuning-free method for dynamically switching between synchronous and asynchronous training modes in recommendation models, improving efficiency and robustness without hyper-parameter tuning.

Contribution

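GBA enables automatic switching between synchronous and asynchronous modes in distributed training of recommendation models without hyper-parameter tuning. It addresses the two obstacles named in the abstract: the differing distribution of gradient values and the stale gradients from stragglers.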

Findings

1. GBA achieves up to 0.2% AUC improvement over state-of-the-art asynchronous methods.

2. GBA speeds up training by at least 2.4x compared with synchronous training under limited hardware resources.

3. GBA maintains convergence properties comparable to those of synchronous training.

Abstract

High-concurrency asynchronous training upon parameter server (PS) architecture and high-performance synchronous training upon all-reduce (AR) architecture are the most commonly deployed distributed training modes for recommendation models. Although synchronous AR training is designed to have higher training efficiency, asynchronous PS training would be a better choice for training speed when there are stragglers (slow workers) in the shared cluster, especially under limited computing resources. An ideal way to take full advantage of these two training modes is to switch between them upon the cluster status. However, switching training modes often requires tuning hyper-parameters, which is extremely time- and resource-consuming. We find two obstacles to a tuning-free approach: the different distribution of the gradient values and the stale gradients from the stragglers. This paper…
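To make the "stale gradients from the stragglers" obstacle concrete, here is a minimal toy simulation of asynchronous parameter-server updates. It is only an illustration of how staleness arises, not the GBA method itself. The function name `simulate_async_staleness`, its parameters, and the integer step times are all hypothetical and chosen for this sketch. It assumes each worker pulls the latest parameter version immediately after pushing a gradient, and that the server applies gradients in arrival order.

```python
import heapq


def simulate_async_staleness(step_times, num_updates):
    """Toy model of asynchronous PS training (illustrative assumption).

    step_times[i] is the time worker i needs to compute one gradient.
    Returns a list of (worker_id, staleness) in the order the server
    applies the gradients, where staleness is the number of server
    updates applied since the worker pulled its parameters.
    """
    server_version = 0
    # Each event: (finish_time, worker_id, parameter_version_pulled).
    events = [(t, w, 0) for w, t in enumerate(step_times)]
    heapq.heapify(events)
    log = []
    while len(log) < num_updates:
        finish, worker, pulled = heapq.heappop(events)
        log.append((worker, server_version - pulled))
        server_version += 1  # server applies the gradient
        # Worker pulls the new version and starts its next step.
        heapq.heappush(events, (finish + step_times[worker], worker, server_version))
    return log


if __name__ == "__main__":
    # Two fast workers and one straggler that is 4x slower.
    for worker, staleness in simulate_async_staleness([1, 1, 4], 9):
        print(f"worker {worker}: staleness {staleness}")
```

In this run the two fast workers never see staleness above 1, while the straggler's first gradient arrives 8 server updates late. Under synchronous training every gradient would have staleness 0, but each step would wait for the slowest worker.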

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings