WfBench: Automated Generation of Scientific Workflow Benchmarks
Tainã Coleman, Henri Casanova, Ketan Maheshwari, Loïc Pottier, Sean R. Wilkinson, Justin Wozniak, Frédéric Suter, Mallikarjun Shankar, Rafael Ferreira da Silva

TL;DR
This paper introduces WfBench, a tool for automatically generating realistic scientific workflow benchmarks that mimic real-world workflows for evaluating workflow system performance on diverse computing platforms.
Contribution
WfBench provides a novel method for creating customizable, realistic workflow benchmarks with diverse performance traits and dependency structures, aiding system evaluation.
Findings
Generated benchmarks are representative of real production workflows.
Benchmarks effectively evaluate workflow system performance.
Case study demonstrates practical utility of the benchmarks.
Abstract
The prevalence of scientific workflows with high computational demands calls for their execution on various distributed computing platforms, including large-scale leadership-class high-performance computing (HPC) clusters. To handle the deployment, monitoring, and optimization of workflow executions, many workflow systems have been developed over the past decade. There is a need for workflow benchmarks that can be used to evaluate the performance of workflow systems on current and future software stacks and hardware platforms. We present a generator of realistic workflow benchmark specifications that can be translated into benchmark code to be executed with current workflow systems. Our approach generates workflow tasks with arbitrary performance characteristics (CPU, memory, and I/O usage) and with realistic task dependency structures based on those seen in production workflows. We…
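To make the idea of a benchmark specification concrete, the following is a minimal sketch of how tasks with CPU, memory, and I/O parameters and a dependency structure might be represented. It is not WfBench's actual format or API: the `Task` class, its field names, and the `make_fork_join` and `execution_order` helpers are all hypothetical, and the fork-join shape is an arbitrary stand-in for the production-derived structures the abstract mentions.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Task:
    """Hypothetical task description; field names are illustrative only."""
    name: str
    cpu_work: int    # abstract units of CPU work to perform
    memory_mb: int   # memory footprint to hold during execution
    output_mb: int   # amount of data written for child tasks
    parents: list[str] = field(default_factory=list)


def make_fork_join(width: int) -> dict[str, Task]:
    """Build a simple fork-join workflow: split -> `width` workers -> merge."""
    tasks = {"split": Task("split", cpu_work=10, memory_mb=64, output_mb=width)}
    for i in range(width):
        name = f"work_{i}"
        tasks[name] = Task(name, cpu_work=100, memory_mb=256, output_mb=1,
                           parents=["split"])
    tasks["merge"] = Task("merge", cpu_work=20, memory_mb=128, output_mb=1,
                          parents=[f"work_{i}" for i in range(width)])
    return tasks


def execution_order(tasks: dict[str, Task]) -> list[str]:
    """Return one valid execution order that respects task dependencies."""
    graph = {t.name: set(t.parents) for t in tasks.values()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    spec = make_fork_join(width=3)
    for name in execution_order(spec):
        t = spec[name]
        print(f"{name}: cpu={t.cpu_work} mem={t.memory_mb}MB out={t.output_mb}MB")
```

In a sketch like this, a translator for a given workflow system would walk the tasks in dependency order and emit that system's own job descriptions; how WfBench performs that translation is not described in the text above.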
Taxonomy
Topics: Scientific Computing and Data Management · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
