Sizey: Memory-Efficient Execution of Scientific Workflow Tasks
Jonathan Bader, Fabian Skalski, Fabian Lehmann, Dominik Scheinert, Jonathan Will, Lauritz Thamsen, Odej Kao

TL;DR
Sizey is an online memory prediction system for scientific workflows that dynamically learns and selects the best model during execution, significantly reducing memory waste and improving resource utilization.
Contribution
The paper introduces a novel online machine learning approach for memory prediction in workflows, with a new RAQ score for model evaluation and continuous online retraining.
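The general idea of online model selection can be illustrated with a minimal sketch. This is not Sizey's implementation: the candidate predictors, the `cost` function, and all names below are hypothetical, and because this summary does not define the RAQ score, a simple asymmetric cost (wasted memory versus a penalized shortfall) stands in for it. After each finished task, every candidate is scored on how well it would have predicted the observed peak memory, and the candidate with the lowest average cost so far makes the next prediction.

```python
from statistics import mean, median

# Hypothetical candidate predictors: each maps past peak-memory
# observations (MB) of one task type to a prediction for the next run.
CANDIDATES = {
    "max": lambda h: max(h),
    "median": lambda h: median(h),
    "mean_plus_10pct": lambda h: mean(h) * 1.1,
}


def cost(predicted, actual, failure_penalty=2.0):
    """Stand-in score (NOT the paper's RAQ score): wasted MB when the
    prediction suffices, penalized shortfall when it is too low."""
    if predicted >= actual:
        return predicted - actual
    return (actual - predicted) * failure_penalty


class OnlineSelector:
    """Scores all candidates after every task and picks the best so far."""

    def __init__(self, candidates):
        self.candidates = candidates
        self.history = []                              # observed peaks (MB)
        self.costs = {name: [] for name in candidates}

    def observe(self, actual_peak_mb):
        # Score what each candidate would have predicted before this
        # observation, then add the observation to the history.
        if self.history:
            for name, model in self.candidates.items():
                predicted = model(self.history)
                self.costs[name].append(cost(predicted, actual_peak_mb))
        self.history.append(actual_peak_mb)

    def predict(self, default_mb):
        # Without any history, fall back to the user-provided estimate.
        if not self.history:
            return "default", default_mb

        def avg_cost(name):
            scores = self.costs[name]
            return mean(scores) if scores else float("inf")

        best = min(self.candidates, key=avg_cost)
        return best, self.candidates[best](self.history)


selector = OnlineSelector(CANDIDATES)
for peak in [100, 120, 110, 130]:
    selector.observe(peak)
print(selector.predict(default_mb=4096))
```

The penalty factor makes underprediction more expensive than overprediction, mirroring the observation in the abstract that underestimated memory leads to task failures while overestimated memory only wastes resources.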
Findings
Median memory waste reduction of 24.68% with Sizey.
Effective online model selection improves resource efficiency.
Validated on six real-world scientific workflows.
Abstract
As the amount of available data continues to grow in fields as diverse as bioinformatics, physics, and remote sensing, the importance of scientific workflows in the design and implementation of reproducible data analysis pipelines increases. When developing workflows, resource requirements must be defined for each type of task in the workflow. Typically, task types vary widely in their computational demands because they are simply wrappers for arbitrary black-box analysis tools. Furthermore, the resource consumption for the same task type can vary considerably as well due to different inputs. Since underestimating memory resources leads to bottlenecks and task failures, workflow developers tend to overestimate memory resources. However, overprovisioning of memory wastes resources and limits cluster throughput. Addressing this problem, we propose Sizey, a novel online memory prediction…
