Parallelization in Scientific Workflow Management Systems
Marc Bux, Ulf Leser

TL;DR
This paper reviews parallelization techniques in scientific workflow management systems, highlighting current limitations and proposing key advancements to improve performance, scalability, and resource utilization in data-intensive scientific computing.
Contribution
The survey gives an overview of parallelization methods in SWfMS and suggests how current systems could better handle large-scale scientific data processing.
Findings
Current SWfMS have significant room for improvement in parallelization.
Parallel execution and adaptive scheduling are essential for handling large data volumes.
Proposed advancements aim to enhance performance and resource efficiency.
Abstract
Over the last two decades, scientific workflow management systems (SWfMS) have emerged as a means to facilitate the design, execution, and monitoring of reusable scientific data processing pipelines. At the same time, the amounts of data generated in various areas of science outpaced enhancements in computational power and storage capabilities. This is especially true for the life sciences, where new technologies increased the sequencing throughput from kilobytes to terabytes per day. This trend requires current SWfMS to adapt: Native support for parallel workflow execution must be provided to increase performance; dynamically scalable "pay-per-use" compute infrastructures have to be integrated to diminish hardware costs; adaptive scheduling of workflows in distributed compute environments is required to optimize resource utilization. In this survey we give an overview of…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Advanced Data Storage Technologies
