Towards Self-Tuning Parameter Servers
Chris Liu, Pengfei Zhang, Bo Tang, Hang Shen, Lei Zhu, Ziliang Lai, Eric Lo

TL;DR
This paper presents techniques for self-tuning parameter servers in machine learning systems, enabling online optimization of system settings to significantly reduce training times.
Contribution
It introduces a novel approach for self-tuning parameter servers that adapt system parameters during training to improve efficiency, demonstrated on TensorFlow.
Findings
Reduction of training time by up to 18x
Effective online adaptation of system settings
Generalizable techniques for PS-style ML systems
Abstract
In recent years, advances in many applications have been driven by the use of Machine Learning (ML). Nowadays, it is common to see industrial-strength machine learning jobs that involve millions of model parameters, terabytes of training data, and weeks of training. Good efficiency, i.e., fast completion of a specific ML job, is therefore a key feature of a successful ML system. While the completion time of a long-running ML job is determined by the time required to reach model convergence, in practice it is also largely influenced by the values of various system settings. In this paper, we contribute techniques towards building self-tuning parameter servers. Parameter Server (PS) is a popular system architecture for large-scale machine learning systems; by self-tuning we mean that while a long-running ML job is iteratively training the expert-suggested model, the system is…
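The general idea of adjusting a system setting while a job keeps training can be illustrated with a toy sketch. It is not the paper's technique: the function name `run_with_online_tuning`, the three setting names, and their per-iteration costs are all hypothetical, and the sketch assumes that a setting changes only the cost of an iteration, not how many iterations the model needs to converge.

```python
from typing import Callable, Dict, List, Tuple


def run_with_online_tuning(
    candidates: List[str],
    step: Callable[[str], float],
    total_iters: int,
    probe_iters: int = 3,
) -> Tuple[str, float]:
    """Run total_iters training iterations, probing each candidate setting
    briefly and then keeping the one with the lowest mean iteration cost.

    step(setting) runs one iteration under `setting` and returns its cost
    (for example, seconds). Probe iterations count as training progress,
    so the job never pauses just to tune.
    """
    if not candidates or total_iters < 1:
        raise ValueError("need at least one candidate and one iteration")

    mean_cost: Dict[str, float] = {}
    elapsed = 0.0
    done = 0

    # Probing phase: try each setting for a few real training iterations.
    for setting in candidates:
        costs = []
        for _ in range(probe_iters):
            if done >= total_iters:
                break
            costs.append(step(setting))
            done += 1
        if costs:
            mean_cost[setting] = sum(costs) / len(costs)
            elapsed += sum(costs)

    # Exploitation phase: finish the job under the cheapest observed setting.
    best = min(mean_cost, key=mean_cost.get)
    while done < total_iters:
        elapsed += step(best)
        done += 1
    return best, elapsed


# Hypothetical, made-up per-iteration costs for three made-up settings.
SIMULATED_COST = {"setting_a": 4.0, "setting_b": 1.0, "setting_c": 2.5}


def simulated_step(setting: str) -> float:
    # Stand-in for one real training iteration under the given setting.
    return SIMULATED_COST[setting]


best, elapsed = run_with_online_tuning(
    list(SIMULATED_COST), simulated_step, total_iters=100
)
fixed_elapsed = 100 * SIMULATED_COST["setting_a"]
print(best, elapsed, fixed_elapsed)
```

With these made-up costs, the tuned run settles on `setting_b` and finishes well ahead of a run pinned to `setting_a`. A real system would also have to account for the cost of switching settings mid-job and for settings that alter convergence behaviour, both of which this sketch ignores.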
