TL;DR
The paper introduces the Ensemble Toolkit, a scalable and flexible system for executing ensembles of tasks in scientific applications, supporting diverse execution patterns and efficient resource management on heterogeneous systems.
Contribution
It presents a novel ensemble-based execution framework that combines abstractions treating ensembles as first-class entities with a pilot-based runtime system, enabling scalable and flexible execution of task ensembles.
Findings
Linear weak and strong scaling up to 1000 ensembles and cores
Supports diverse ensemble execution patterns efficiently
Decouples workload execution from resource management
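The idea of decoupling workload execution from resource management can be illustrated with a minimal sketch. It assumes a single-node "pilot" modelled as a fixed pool of worker slots (a `concurrent.futures.ThreadPoolExecutor`, with threads standing in for cores). It is not the toolkit's runtime or API, and the names `Pilot`, `Task` and `simulate` are hypothetical. The point is that resources are requested once, and ensemble members are then bound to whichever slot frees up first, so the workload description never mentions cores.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    """Hypothetical ensemble member: a callable plus its single argument."""
    name: str
    fn: Callable[[int], int]
    arg: int


class Pilot:
    """Hypothetical pilot: holds a fixed number of slots acquired once,
    then runs many tasks on them without going back to a batch queue."""

    def __init__(self, cores: int) -> None:
        self.cores = cores
        # Threads stand in for cores in this single-process sketch.
        self._pool = ThreadPoolExecutor(max_workers=cores)

    def run(self, tasks: List[Task]) -> Dict[str, int]:
        # Late binding: each task goes to the next free slot; the task
        # list itself carries no placement or scheduling information.
        futures = {t.name: self._pool.submit(t.fn, t.arg) for t in tasks}
        return {name: f.result() for name, f in futures.items()}

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def simulate(seed: int) -> int:
    # Stand-in for a simulation: any deterministic function of its input.
    return seed * seed


# Workload: an ensemble of 8 members, described independently of resources.
ensemble = [Task(f"sim-{i}", simulate, i) for i in range(8)]

pilot = Pilot(cores=4)  # resource request, made once for the whole ensemble
results = pilot.run(ensemble)
pilot.close()
print(results["sim-3"])  # 9
```

Changing `cores=4` to another value changes only how many members run concurrently, not how the ensemble is written. That separation is the property the finding refers to.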
Abstract
There are many science applications that require scalable task-level parallelism and support for flexible execution and coupling of ensembles of simulations. Most high-performance system software and middleware, however, are designed to support the execution and optimization of single tasks. Motivated by the missing capabilities of these computing systems and the increasing importance of task-level parallelism, we introduce the Ensemble toolkit which has the following application development features: (i) abstractions that enable the expression of ensembles as primary entities, and (ii) support for ensemble-based execution patterns that capture the majority of application scenarios. Ensemble toolkit uses a scalable pilot-based runtime system that decouples workload execution and resource management details from the expression of the application, and enables the efficient and dynamic…
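To make "ensemble-based execution patterns" concrete, here is a sketch of one iterative pattern: a simulation-analysis loop in which every iteration runs a whole ensemble of simulations, then a single analysis step produces the next ensemble's inputs. This excerpt of the abstract does not enumerate the toolkit's patterns, so treat this as an assumed, illustrative pattern. The function names `simulation_analysis_loop`, `halve` and `recenter` are hypothetical and are not the toolkit's API.

```python
from typing import Callable, List


def simulation_analysis_loop(
    inputs: List[float],
    simulate: Callable[[float], float],
    analyze: Callable[[List[float]], List[float]],
    iterations: int,
) -> List[float]:
    """Hypothetical pattern: the ensemble (not a single task) is the unit
    of work in each iteration, followed by one coupling analysis step."""
    current = inputs
    for _ in range(iterations):
        outputs = [simulate(x) for x in current]  # ensemble stage: members are independent
        current = analyze(outputs)                # analysis stage: couples all members
    return current


def halve(x: float) -> float:
    # Stand-in simulation.
    return x / 2


def recenter(outputs: List[float]) -> List[float]:
    # Stand-in analysis: move every member halfway toward the ensemble mean.
    mean = sum(outputs) / len(outputs)
    return [(o + mean) / 2 for o in outputs]


final = simulation_analysis_loop([0.0, 4.0, 8.0], halve, recenter, iterations=2)
print(final)  # [0.75, 1.0, 1.25]
```

The ensemble stage is written as a plain list comprehension here. In a pilot-based design, that stage is the part that would be handed to the runtime for parallel execution, while the loop structure stays the same.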
