Towards Self-Tuning Parameter Servers

Chris Liu; Pengfei Zhang; Bo Tang; Hang Shen; Lei Zhu; Ziliang Lai; Eric Lo

arXiv:1810.02935 · cs.DB · August 5, 2020

TL;DR

This paper presents techniques for self-tuning parameter servers in machine learning systems, enabling online optimization of system settings to significantly reduce training times.

Contribution

It introduces a novel approach for self-tuning parameter servers that adapt system parameters during training to improve efficiency, demonstrated on TensorFlow.

Findings

1. Reduction of training time by up to 18x
2. Effective online adaptation of system settings
3. Generalizable techniques for PS-style ML systems

Abstract

In recent years, advances in many applications have been driven by the use of Machine Learning (ML). Nowadays, it is common to see industrial-strength machine learning jobs that involve millions of model parameters, terabytes of training data, and weeks of training. Good efficiency, i.e., fast completion time when running a specific ML job, is therefore a key feature of a successful ML system. While the completion time of a long-running ML job is determined by the time required to reach model convergence, in practice it is also largely influenced by the values of various system settings. In this paper, we contribute techniques towards building self-tuning parameter servers. Parameter Server (PS) is a popular system architecture for large-scale machine learning systems, and by self-tuning we mean that, while a long-running ML job is iteratively training the expert-suggested model, the system is…
