Characterizing the Performance of Executing Many-tasks on Summit

Matteo Turilli; Andre Merzky; Thomas Naughton; Wael Elwasif; Shantenu Jha

arXiv:1909.03057·cs.DC·September 10, 2019

TL;DR

This paper evaluates the performance of executing many independent tasks on the Summit supercomputer using RADICAL-Pilot with JSM and PRRTE, highlighting scalability, overheads, and resource utilization improvements.

Contribution

It provides the first comprehensive performance characterization of RADICAL-Pilot with JSM and PRRTE on Summit for large-scale task workloads.

Findings

01 PRRTE scales better than JSM for over 1,000 tasks

02 PRRTE overheads are negligible

03 Resource utilization reaches 63% with 16,000 tasks on 404 nodes

Abstract

Many scientific workloads comprise many tasks, where each task is an independent simulation or analysis of data. The execution of millions of tasks on heterogeneous HPC platforms requires scalable dynamic resource management and multi-level scheduling. RADICAL-Pilot (RP), an implementation of the Pilot abstraction, addresses these challenges and serves as an effective runtime system to execute workloads comprised of many tasks. In this paper, we characterize the performance of executing many tasks using RP when interfaced with JSM and PRRTE on Summit: RP is responsible for resource management and task scheduling on the acquired resources; JSM or PRRTE enact the placement and launching of scheduled tasks. Our experiments provide lower bounds on the performance of RP when integrated with JSM and PRRTE. Specifically, for workloads comprised of homogeneous single-core, 15 minutes-long…
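To make the resource utilization figure quoted above more concrete, here is a minimal sketch of one common way to compute such a metric: the core-seconds spent running tasks divided by the core-seconds available over the whole run. This definition, the function name `resource_utilization`, and every number except the 15-minute single-core task length are assumptions for illustration; the paper may define and measure utilization differently.

```python
def resource_utilization(task_runtimes_s, cores_per_task, total_cores, makespan_s):
    """Return the fraction of available core-seconds spent executing tasks.

    Assumed definition (not taken from the paper):
        busy core-seconds / (total cores * total run time)
    """
    if total_cores <= 0 or makespan_s <= 0:
        raise ValueError("total_cores and makespan_s must be positive")
    busy_core_seconds = sum(task_runtimes_s) * cores_per_task
    available_core_seconds = total_cores * makespan_s
    return busy_core_seconds / available_core_seconds


# Hypothetical run: 1,000 single-core tasks of 15 minutes (900 s) each,
# executed on 1,000 cores, with a total run time of 20 minutes (1,200 s).
# The extra 300 s stands in for scheduling and launching overheads.
runtimes = [900.0] * 1000
util = resource_utilization(runtimes, cores_per_task=1,
                            total_cores=1000, makespan_s=1200.0)
print(f"utilization = {util:.2f}")  # utilization = 0.75
```

Under this assumed definition, any time cores sit idle while tasks are being scheduled, placed, or launched lowers the ratio, which is why launcher overheads matter for many-task workloads.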
