Learning to Optimize DAG Scheduling in Heterogeneous Environment
Jinhong Luo, Yunfan Zhou, Xijun Li, Mingxuan Yuan, Jianguo Yao, Jia Zeng

TL;DR
This paper introduces Lachesis, an algorithm based on reinforcement learning and graph neural networks for optimizing DAG job scheduling in heterogeneous data centers. It reduces makespan by up to 26.7% and improves the speedup ratio by 35.2%.
Contribution
It presents a novel task-duplication-based learning algorithm that captures job dependencies and assigns tasks while accounting for server heterogeneity, outperforming existing methods.
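The summary does not say how Lachesis decides when to duplicate a task. As a rough illustration of why duplication can help at all, the following sketch compares a child task's earliest start time in two cases: it waits for its parent's output to be transferred, or the parent is re-executed (duplicated) on the child's own server. The function and parameter names are hypothetical, and the model (a single parent, a fixed inter-server transfer delay `comm_cost`, zero cost within one server) is a simplifying assumption, not the paper's formulation.

```python
def earliest_child_start(parent_server: int, parent_finish: float,
                         child_server: int, child_server_free: float,
                         parent_runtime_on_child_server: float,
                         comm_cost: float, duplicate: bool) -> float:
    """Earliest time a child task can start on `child_server`.

    Hypothetical model (assumption): one parent, a fixed transfer delay
    `comm_cost` between distinct servers, and no delay on the same server.
    """
    if parent_server == child_server:
        # The parent's output is already local; no transfer or duplicate needed.
        return max(parent_finish, child_server_free)

    # Option 1: wait for the parent's output to be shipped to this server.
    via_transfer = parent_finish + comm_cost
    if not duplicate:
        return max(via_transfer, child_server_free)

    # Option 2: re-run the parent on the child's server as soon as that server
    # is free, so its output is produced locally. Take whichever is earlier.
    via_duplicate = child_server_free + parent_runtime_on_child_server
    return max(min(via_transfer, via_duplicate), child_server_free)


# Parent runs on server 0 and finishes at t=4; the child goes to server 1,
# idle from t=0. Transfer delay is 5; the parent would take 4 on server 1.
print(earliest_child_start(0, 4.0, 1, 0.0, 4.0, 5.0, duplicate=False))  # 9.0
print(earliest_child_start(0, 4.0, 1, 0.0, 4.0, 5.0, duplicate=True))   # 4.0
```

Duplication trades extra server time (the copy of the parent occupies server 1 from t=0 to t=4) for a shorter wait. A real scheduler must also charge that occupied time against other tasks competing for the server, which this sketch ignores.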
Findings
Achieves up to 26.7% reduction in makespan.
Improves the speedup ratio by 35.2%.
Outperforms seven baseline algorithms.
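To make the two metrics concrete: makespan is the time at which the last task of a job finishes. A common definition of speedup in heterogeneous DAG scheduling divides the best single-server sequential runtime by the makespan. The summary does not state the exact definition used in the paper, so treat that formula as an assumption. The schedule itself comes from a simple greedy earliest-finish-time list scheduler. This is neither HEFT (which orders tasks by upward rank) nor Lachesis's learned policy, and the names `greedy_eft_schedule`, `makespan`, and `speedup` are hypothetical.

```python
from graphlib import TopologicalSorter


def greedy_eft_schedule(runtime, parents, comm_cost):
    """Hypothetical greedy earliest-finish-time list scheduler.

    runtime[task][s]: runtime of `task` on server s (heterogeneous servers).
    parents[task]: set of tasks that must finish before `task` starts.
    comm_cost: assumed fixed delay when a parent ran on a different server.
    Returns {task: (server, start, finish)}.
    """
    n_servers = len(next(iter(runtime.values())))
    server_free = [0.0] * n_servers          # time each server becomes idle
    schedule = {}
    graph = {t: parents.get(t, ()) for t in runtime}
    for task in TopologicalSorter(graph).static_order():
        best = None
        for s in range(n_servers):
            # A parent's output is available at its finish time, plus a
            # transfer delay if it ran on a different server.
            ready = max((schedule[p][2] + (0.0 if schedule[p][0] == s else comm_cost)
                         for p in parents.get(task, ())), default=0.0)
            start = max(ready, server_free[s])
            finish = start + runtime[task][s]
            if best is None or finish < best[2]:
                best = (s, start, finish)
        schedule[task] = best                # keep the earliest-finishing server
        server_free[best[0]] = best[2]
    return schedule


def makespan(schedule):
    # Finish time of the last task (the job is assumed to start at t=0).
    return max(finish for _, _, finish in schedule.values())


def speedup(runtime, schedule):
    # Assumed definition: best single-server sequential runtime / makespan.
    n_servers = len(next(iter(runtime.values())))
    sequential = min(sum(r[s] for r in runtime.values()) for s in range(n_servers))
    return sequential / makespan(schedule)


# Diamond DAG A -> {B, C} -> D on a faster server 0 and a slower server 1.
runtime = {"A": [2, 3], "B": [4, 5], "C": [4, 5], "D": [2, 3]}
parents = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
sched = greedy_eft_schedule(runtime, parents, comm_cost=1.0)
print(makespan(sched))                    # 11.0
print(round(speedup(runtime, sched), 3))  # 1.091
```

Here B and C run in parallel on different servers, so the makespan (11) beats running every task on the faster server alone (12). According to the summary, Lachesis targets the same objective but makes its assignments with a trained policy rather than a fixed greedy rule.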
Abstract
Scheduling job flows efficiently and rapidly on distributed computing clusters is one of the major challenges in the daily operation of data centers. In practice, a single job consists of numerous stages with complex dependency relations, represented as a Directed Acyclic Graph (DAG). Nowadays, a data center is usually equipped with a cluster of heterogeneous computing servers that differ in their hardware/software configurations. From the perspectives of both cost saving and environmental friendliness, data centers could benefit substantially from optimizing job scheduling in such heterogeneous environments. Thus, the problem has attracted increasing attention from both industry and academia. In this paper, we propose a task-duplication-based learning algorithm, namely Lachesis (named after the second of the Three Fates in ancient Greek mythology, who determines destiny), aiming…
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · IoT and Edge/Fog Computing
