Lotaru: Locally Predicting Workflow Task Runtimes for Resource Management on Heterogeneous Infrastructures
Jonathan Bader, Fabian Lehmann, Lauritz Thamsen, Ulf Leser, Odej Kao

TL;DR
Lotaru predicts the runtimes of scientific workflow tasks locally, before the tasks run on a heterogeneous cluster, and does not require historical data. It combines microbenchmarks with Bayesian regression so that scheduling and resource management can rely on runtime estimates from the start.
Contribution
Lotaru introduces a local prediction approach that needs no historical execution data. It uses microbenchmarks and Bayesian regression to address two problems in scientific workflows: the lack of measurements at start-up and the heterogeneity of the machines in the cluster.
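To make the combination of microbenchmarks and Bayesian regression concrete, here is a minimal sketch, not the authors' implementation. It assumes that task runtime grows roughly linearly with input size, that a few local measurements on small inputs are available, and that runtime on a target machine is inversely proportional to a single microbenchmark score. The function names, the priors (`alpha`, `beta`), and all numbers are hypothetical.

```python
import math

def fit_bayes_linear(xs, ys, alpha=1e-6, beta=1.0):
    """Conjugate Bayesian linear regression for y = w0 + w1 * x.

    Assumes a zero-mean Gaussian prior with precision alpha on the weights
    and Gaussian noise with known precision beta (both hypothetical choices).
    Returns the posterior mean and the posterior covariance entries.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Posterior precision A = alpha * I + beta * Phi^T Phi (a 2x2 matrix).
    a11 = alpha + beta * n
    a12 = beta * sx
    a22 = alpha + beta * sxx
    det = a11 * a22 - a12 * a12
    # Posterior covariance S = A^-1, inverted by hand.
    s11, s12, s22 = a22 / det, -a12 / det, a11 / det
    # Posterior mean m = beta * S * Phi^T y.
    m0 = beta * (s11 * sy + s12 * sxy)
    m1 = beta * (s12 * sy + s22 * sxy)
    return (m0, m1), (s11, s12, s22)

def predict(model, x, beta=1.0):
    """Predictive mean and standard deviation of the runtime at input size x."""
    (m0, m1), (s11, s12, s22) = model
    mean = m0 + m1 * x
    var = 1.0 / beta + s11 + 2.0 * s12 * x + s22 * x * x
    return mean, math.sqrt(var)

def scale_to_node(local_runtime, local_score, target_score):
    """Transfer a local estimate to another machine.

    Assumption: runtime is inversely proportional to the benchmark score.
    """
    return local_runtime * local_score / target_score

# Hypothetical local measurements: input size in GB, runtime in seconds.
sizes = [1.0, 2.0, 3.0, 4.0]
runtimes = [12.0, 22.0, 32.0, 42.0]
model = fit_bayes_linear(sizes, runtimes)

# Estimate the runtime for a 10 GB input locally, then for a faster node.
local_mean, local_std = predict(model, 10.0)
target_mean = scale_to_node(local_mean, local_score=200.0, target_score=400.0)
```

The predictive standard deviation grows as the requested input size moves away from the measured sizes, which gives a scheduler a rough indication of how much to trust an extrapolated estimate.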
Findings
Outperforms state-of-the-art prediction baselines by over 12.5%
Enables scheduling and optimization close to perfect knowledge
Reduces prediction error effectively in real-world workflows
Abstract
Many resource management techniques for task scheduling, energy and carbon efficiency, and cost optimization in workflows rely on a-priori task runtime knowledge. Building runtime prediction models on historical data is often not feasible in practice as workflows, their input data, and the cluster infrastructure change. Online methods, on the other hand, which estimate task runtimes on specific machines while the workflow is running, have to cope with a lack of measurements during start-up. Frequently, scientific workflows are executed on heterogeneous infrastructures consisting of machines with different CPU, I/O, and memory configurations, further complicating predicting runtimes due to different task runtimes on different machine types. This paper presents Lotaru, a method for locally predicting the runtimes of scientific workflow tasks before they are executed on heterogeneous…
