Job Management and Task Bundling
Evan Berkowitz, Gustav R. Jansen, Kenneth McElvain, André Walker-Loud

TL;DR
This paper introduces METAQ and mpi_jm, software tools that dynamically bundle computational tasks to improve resource utilization in high-performance computing environments, especially for large-scale scientific workloads.
Contribution
It presents novel software solutions for task bundling and dynamic backfilling to improve HPC resource efficiency without disrupting existing workflows.
Findings
Bundling tasks together increases resource utilization.
The software builds large jobs suitable for very large machine partitions.
Tasks are grouped without substantial changes to users' existing workflows or executables.
Abstract
High Performance Computing is often performed on scarce and shared computing resources. To ensure computers are used to their full capacity, administrators often incentivize large workloads that are not possible on smaller systems. Measurements in Lattice QCD frequently do not scale to machine-size workloads. By bundling tasks together we can create large jobs suitable for gigantic partitions. We discuss METAQ and mpi_jm, software developed to dynamically group computational tasks together, that can intelligently backfill to consume idle time without substantial changes to users' current workflows or executables.
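The backfilling idea in the abstract can be illustrated with a minimal Python sketch. It is not the METAQ or mpi_jm implementation: the `Task` class, the `backfill` function, the task names, and the greedy largest-first policy are all hypothetical and chosen only for illustration. The sketch assumes each queued task declares how many nodes and how many minutes it needs, and that a task is started only if it fits both the idle nodes and the wall time remaining in the bundled job.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str     # hypothetical task label
    nodes: int    # nodes the task needs
    minutes: int  # estimated run time


def backfill(tasks, free_nodes, minutes_left):
    """Greedily start queued tasks that fit the idle nodes and remaining wall time.

    Illustrative policy only: largest tasks are considered first.
    """
    started = []
    for task in sorted(tasks, key=lambda t: t.nodes, reverse=True):
        if task.nodes <= free_nodes and task.minutes <= minutes_left:
            started.append(task)
            free_nodes -= task.nodes
    return started, free_nodes


# Hypothetical queue of measurement tasks inside one large allocation.
queue = [
    Task("propagator_a", nodes=4, minutes=30),
    Task("propagator_b", nodes=4, minutes=90),
    Task("contraction", nodes=2, minutes=20),
    Task("spectrum", nodes=1, minutes=10),
]

started, idle = backfill(queue, free_nodes=8, minutes_left=60)
print([t.name for t in started], idle)
```

With 8 idle nodes and 60 minutes left, the sketch skips `propagator_b` because it would outlive the allocation, and fills the remaining nodes with smaller tasks instead of leaving them idle.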
