Toward Smart Scheduling in Tapis
Joe Stubbs, Smruti Padhy, Richard Cardone

TL;DR
This paper presents work toward an intelligent job scheduling system in Tapis that automatically determines job configurations and dynamically provisions resources, with a focus on using machine learning to predict queue times.
Contribution
The paper introduces a novel architecture for smart scheduling in Tapis, emphasizing machine-learning-based queue-time prediction to support resource selection and dynamic provisioning.
Findings
Regression models can predict queue times, which can then be used to select the optimal system for a job.
Classification models compare running on existing systems with dynamically provisioning resources.
Results demonstrate the effectiveness of machine learning for scheduling decisions.
Abstract
The Tapis framework provides APIs for automating job execution on remote resources, including HPC clusters and servers running in the cloud. Tapis can simplify the interaction with remote cyberinfrastructure (CI), but the current services require users to specify the exact configuration of a job to run, including the system, queue, node count, and maximum run time, among other attributes. Moreover, the remote resources must be defined and configured in Tapis before a job can be submitted. In this paper, we present our efforts to develop an intelligent job scheduling capability in Tapis, where various attributes about a job configuration can be automatically determined for the user, and computational resources can be dynamically provisioned by Tapis for specific jobs. We develop an overall architecture for such a feature, which suggests a set of core challenges to be solved. Then, we…
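The idea of letting some job attributes be "automatically determined for the user" can be sketched as a request in which unset fields mean "let the scheduler decide," plus a resolver that fills only the gaps. The `JobRequest` fields, the `SYSTEM_DEFAULTS` table, and the `pick_system` callback are all hypothetical and are not the Tapis Jobs API schema; `pick_system` stands in for whatever selection logic (for example, a queue-time predictor) a smart scheduler might use.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical job description; None means 'let the scheduler decide'."""
    app_id: str
    system: Optional[str] = None
    queue: Optional[str] = None
    node_count: Optional[int] = None
    max_minutes: Optional[int] = None


# Hypothetical per-system fallbacks a smart scheduler might use.
SYSTEM_DEFAULTS = {
    "cluster-a": {"queue": "normal", "node_count": 1, "max_minutes": 60},
    "cluster-b": {"queue": "small", "node_count": 1, "max_minutes": 30},
}


def _pick(value, default):
    """Keep a user-supplied value; fall back only when it is unset (None)."""
    return value if value is not None else default


def resolve(request: JobRequest, pick_system: Callable[[JobRequest], str]) -> JobRequest:
    """Return a copy of the request with every unset attribute filled in."""
    system = _pick(request.system, None) or pick_system(request)
    defaults = SYSTEM_DEFAULTS[system]
    return replace(
        request,
        system=system,
        queue=_pick(request.queue, defaults["queue"]),
        node_count=_pick(request.node_count, defaults["node_count"]),
        max_minutes=_pick(request.max_minutes, defaults["max_minutes"]),
    )
```

For example, `resolve(JobRequest("my-app", node_count=4), pick_system=lambda r: "cluster-b")` keeps the user's 4 nodes but fills in `system="cluster-b"`, `queue="small"`, and `max_minutes=30`. Keeping user-specified values authoritative means automation only ever narrows what the user left open.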
