Lotaru: Locally Predicting Workflow Task Runtimes for Resource Management on Heterogeneous Infrastructures

Jonathan Bader; Fabian Lehmann; Lauritz Thamsen; Ulf Leser; Odej Kao

arXiv:2309.06918·cs.DC·September 14, 2023

TL;DR

Lotaru is a novel local runtime prediction method for scientific workflow tasks on heterogeneous clusters that does not require historical data, using microbenchmarks and Bayesian regression to improve scheduling and resource management.

Contribution

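Lotaru introduces a local prediction approach that needs no historical execution data. It combines microbenchmarks with Bayesian regression to address two problems in scientific workflows: the lack of runtime measurements at start-up and the heterogeneity of cluster machines.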
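To make the combination of microbenchmarks and Bayesian regression concrete, here is a minimal sketch, not the authors' implementation. It assumes that a task's runtime grows roughly linearly with its input size, that a few small local runs supply the training points, and that a single benchmark score per machine (higher means faster) is enough to rescale a local prediction. The function names, the prior and noise settings, and the scaling rule are all hypothetical choices made for illustration.

```python
from math import sqrt


def fit_bayesian_linear(xs, ys, prior_precision=1e-6, noise_precision=1.0):
    """Posterior over (intercept, slope) for y = w0 + w1 * x + noise.

    Assumes a zero-mean Gaussian prior with precision `prior_precision`
    and Gaussian noise with known precision `noise_precision`.
    Returns (mean, cov): mean is (w0, w1), cov is a 2x2 nested tuple.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    # Posterior precision A = alpha * I + beta * X^T X, rows of X are [1, x].
    a00 = prior_precision + noise_precision * n
    a01 = noise_precision * sx
    a11 = prior_precision + noise_precision * sxx

    # Invert the 2x2 precision matrix to get the posterior covariance.
    det = a00 * a11 - a01 * a01
    cov = ((a11 / det, -a01 / det), (-a01 / det, a00 / det))

    # Posterior mean = beta * cov * X^T y.
    b0 = noise_precision * sy
    b1 = noise_precision * sxy
    mean = (cov[0][0] * b0 + cov[0][1] * b1,
            cov[1][0] * b0 + cov[1][1] * b1)
    return mean, cov


def predict(mean, cov, x, noise_precision=1.0):
    """Predictive mean and standard deviation of the runtime at input size x."""
    mu = mean[0] + mean[1] * x
    var = (1.0 / noise_precision
           + cov[0][0] + 2 * x * cov[0][1] + x * x * cov[1][1])
    return mu, sqrt(var)


def scale_to_machine(runtime, local_score, target_score):
    """Hypothetical rescaling: runtime is inversely proportional to the score."""
    return runtime * local_score / target_score


# Illustrative numbers only: input sizes (GB) and local runtimes (s).
sizes = [1, 2, 3, 4]
runtimes = [12, 22, 32, 42]
mean, cov = fit_bayesian_linear(sizes, runtimes)
local_mu, local_sd = predict(mean, cov, 10)
target_mu = scale_to_machine(local_mu, local_score=200, target_score=400)
```

The predictive standard deviation is the reason to prefer a Bayesian model in this sketch: with only a handful of local measurements, a scheduler can see how uncertain each estimate is instead of trusting a single point value.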

Findings

1. Outperforms state-of-the-art prediction baselines by over 12.5%
2. Enables scheduling and optimization close to perfect knowledge
3. Reduces prediction error effectively in real-world workflows

Abstract

Many resource management techniques for task scheduling, energy and carbon efficiency, and cost optimization in workflows rely on a-priori task runtime knowledge. Building runtime prediction models on historical data is often not feasible in practice as workflows, their input data, and the cluster infrastructure change. Online methods, on the other hand, which estimate task runtimes on specific machines while the workflow is running, have to cope with a lack of measurements during start-up. Frequently, scientific workflows are executed on heterogeneous infrastructures consisting of machines with different CPU, I/O, and memory configurations, further complicating predicting runtimes due to different task runtimes on different machine types. This paper presents Lotaru, a method for locally predicting the runtimes of scientific workflow tasks before they are executed on heterogeneous…
