HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments
Ji Liu, Zhihua Wu, Dianhai Yu, Yanjun Ma, Danlei Feng, Minxu Zhang, Xinxuan Wu, Xuefeng Yao, Dejing Dou

TL;DR
This paper introduces Paddle-HeterPS, a distributed deep learning framework that uses reinforcement learning to efficiently schedule layers across heterogeneous resources, significantly improving training throughput and reducing costs.
Contribution
The paper presents a novel RL-based scheduling method within a distributed framework for heterogeneous environments, enhancing training efficiency and cost-effectiveness.
Findings
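Reports 14.5 times higher throughput than state-of-the-art approaches.
Reports a monetary cost 312.3% smaller than those approaches (the paper's own phrasing; a saving above 100% cannot be a literal fraction of the baseline cost, so it is best read as the baselines costing that much more).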
Manages data storage and data communication among the distributed computing resources.
Abstract
Deep neural networks (DNNs) exploit many layers and a large number of parameters to achieve excellent performance. The training process of DNN models generally handles large-scale input data with many sparse features, which incurs high Input/Output (IO) cost, while some layers are compute-intensive. The training process generally exploits distributed computing resources to reduce training time. In addition, heterogeneous computing resources, e.g., CPUs, GPUs of multiple types, are available for the distributed training process. Thus, the scheduling of multiple layers to diverse computing resources is critical for the training process. To efficiently train a DNN model using the heterogeneous computing resources, we propose a distributed framework, i.e., Paddle-Heterogeneous Parameter Server (Paddle-HeterPS), composed of a distributed architecture and a Reinforcement Learning (RL)-based…
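The scheduling problem the abstract describes, assigning each layer to a resource type, can be illustrated with a toy example. The sketch below is not the paper's RL scheduler: it swaps in an exhaustive search over a three-layer model, and every name and number in it (`LAYER_TIME`, `PRICE`, `plan_cost`, `best_plan`, the timings, the prices) is a made-up assumption. It also assumes layers run one after another and that the goal is the cheapest plan within a time budget.

```python
from itertools import product

# Hypothetical seconds per batch for each layer on each resource type.
# The IO-heavy sparse embedding is assumed faster on CPU,
# the compute-intensive dense layers faster on GPU.
LAYER_TIME = {
    "embedding": {"cpu": 1.0, "gpu": 4.0},
    "fc1": {"cpu": 6.0, "gpu": 1.0},
    "fc2": {"cpu": 5.0, "gpu": 0.5},
}

# Hypothetical price per second of use for each resource type.
PRICE = {"cpu": 1.0, "gpu": 5.0}


def plan_cost(plan):
    """Return (total_time, monetary_cost) for a layer -> resource mapping.

    Assumes layers execute sequentially, so times simply add up.
    """
    total_time = sum(LAYER_TIME[layer][res] for layer, res in plan.items())
    cost = sum(LAYER_TIME[layer][res] * PRICE[res] for layer, res in plan.items())
    return total_time, cost


def best_plan(max_time):
    """Exhaustively find the cheapest plan whose total time fits max_time.

    Returns (plan, cost, total_time), or None if no plan is fast enough.
    """
    layers = list(LAYER_TIME)
    best = None
    for choice in product(PRICE, repeat=len(layers)):
        plan = dict(zip(layers, choice))
        total_time, cost = plan_cost(plan)
        if total_time <= max_time and (best is None or cost < best[1]):
            best = (plan, cost, total_time)
    return best


if __name__ == "__main__":
    print(best_plan(max_time=3.0))
```

Exhaustive search tries every combination, so its cost grows exponentially with the number of layers; that blow-up is one reason a learned or heuristic scheduler becomes attractive for real models.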
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data
