How Workflow Engines Should Talk to Resource Managers: A Proposal for a Common Workflow Scheduling Interface
Fabian Lehmann, Jonathan Bader, Friedrich Tschirpke, Lauritz Thamsen, Ulf Leser

TL;DR
This paper proposes a standardized REST API for better communication between scientific workflow management systems and resource managers, improving scheduling efficiency and system interoperability.
Contribution
It introduces a simple REST interface enabling dynamic information exchange, demonstrated with Nextflow and Kubernetes, leading to significant reductions in workflow makespan.
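To make the idea of dynamic information exchange concrete, here is a minimal sketch of what a workflow engine might send to a scheduler and how the scheduler could use it. The payload classes (`DagRegistration`, `TaskSubmission`), their fields, and the prioritization rule are all hypothetical illustrations. They are not the paper's actual API, nor part of Nextflow or Kubernetes. The sketch does no networking; it only builds the JSON body a client could send.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DagRegistration:
    """Hypothetical payload: the workflow's abstract task graph."""
    vertices: list
    edges: list  # (upstream, downstream) pairs


@dataclass
class TaskSubmission:
    """Hypothetical payload: one ready task instance."""
    task_id: str
    abstract_task: str  # which DAG vertex this task instantiates
    cpus: float
    memory_mb: int


def downstream_count(dag, vertex):
    """Count the distinct vertices reachable from `vertex`."""
    children = {}
    for up, down in dag.edges:
        children.setdefault(up, []).append(down)
    seen, stack = set(), [vertex]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return len(seen)


def prioritize(dag, ready_tasks):
    """Order ready tasks so those that unblock the most work come first."""
    return sorted(
        ready_tasks,
        key=lambda t: downstream_count(dag, t.abstract_task),
        reverse=True,
    )


# Illustrative workflow: align -> call -> report, and qc -> report
dag = DagRegistration(
    vertices=["align", "qc", "call", "report"],
    edges=[("align", "call"), ("call", "report"), ("qc", "report")],
)
ready = [
    TaskSubmission("t1", "qc", 1.0, 512),
    TaskSubmission("t2", "align", 4.0, 8192),
]

# JSON body a client might send when submitting task t2 (assumed format)
body = json.dumps(asdict(ready[1]))
order = [t.task_id for t in prioritize(dag, ready)]
print(order)
```

The point of the sketch is that a resource manager can only rank tasks by their position in the workflow if the workflow engine shares the task graph with it, which is the kind of information a common interface would carry.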
Findings
Up to 25.1% reduction in makespan.
Average reduction of 10.8% in workflow completion time.
Simplifies component exchange and implementation of new scheduling algorithms.
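For readers unfamiliar with the metric, the following sketch shows how a relative makespan reduction is commonly computed. The makespan values are made up for illustration and are not results from the paper. Averaging the per-workflow percentages is also an assumption; the summary does not say how the reported average was aggregated.

```python
def makespan_reduction_pct(baseline_s, improved_s):
    """Relative reduction in makespan, as a percentage of the baseline."""
    return 100 * (baseline_s - improved_s) / baseline_s


# Hypothetical (baseline, improved) makespans in seconds for three workflows
runs = [(1000, 800), (500, 450), (200, 200)]

reductions = [makespan_reduction_pct(b, i) for b, i in runs]
best = max(reductions)
average = sum(reductions) / len(reductions)
print(reductions, best, average)
```

With these numbers, the "up to" figure corresponds to `best` and the average figure to `average`, so a single strongly improved workflow can sit well above the mean.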
Abstract
Scientific workflow management systems (SWMSs) and resource managers together ensure that tasks are scheduled on provisioned resources so that all dependencies are obeyed, and some optimization goal, such as makespan minimization, is achieved. In practice, however, there is no clear separation of scheduling responsibilities between an SWMS and a resource manager because there exists no agreed-upon separation of concerns between their different components. This has two consequences. First, the lack of a standardized API to exchange scheduling information between SWMSs and resource managers hinders portability. It incurs costly adaptations when a component should be replaced by a different one (e.g., an SWMS with another SWMS on the same resource manager). Second, due to overlapping functionalities, current installations often actually have two schedulers, both making partial scheduling…
Taxonomy
Topics: Scientific Computing and Data Management · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
