Best of Both Worlds: High Performance Interactive and Batch Launching
Chansup Byun, Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Andrew Kirby, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Siddharth Samsi, Charles Yee

TL;DR
This paper introduces a preemptive scheduling method for MIT SuperCloud that enables high-performance launching of both interactive and batch jobs. It schedules jobs with preemption up to 100 times faster than the scheduler's built-in automatic preemption and keeps resources fully utilized without degrading the interactive user experience.
Contribution
It presents a novel preemptive approach that separates preemption and scheduling, achieving 100x faster job scheduling and enabling efficient resource utilization for interactive and batch jobs.
Findings
Schedules a job with preemption 100 times faster than the standard scheduler-provided automatic preemption.
Allows preemptive scheduling without disrupting interactive user experience.
Lets long-running batch jobs fully utilize resources that would otherwise sit idle waiting for interactive jobs.
Abstract
Rapid launch of thousands of jobs is essential for effective interactive supercomputing, big data analysis, and AI algorithm development. Achieving thousands of launches per second has required hardware to be available to receive these jobs. This paper presents a novel preemptive approach to implement spot jobs on MIT SuperCloud systems, allowing the resources to be fully utilized by long-running batch jobs while still providing fast launch for interactive jobs. The new approach separates the job preemption and scheduling operations and can achieve 100 times faster performance in the scheduling of a job with preemption when compared to using the standard scheduler-provided automatic preemption-based capability. The results demonstrate that the new approach can schedule interactive jobs preemptively at a performance comparable to when the required computing resources are idle and…
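To make the idea of separating preemption from scheduling concrete, here is a minimal, hypothetical Python sketch. It is not the MIT SuperCloud implementation or any real scheduler's API: the names `ToyCluster`, `preempt`, `schedule`, and `launch_interactive` are illustrative assumptions. The sketch assumes that spot jobs are preempted as whole jobs in an explicit first phase, so that the ordinary scheduling step that follows only ever places work on idle nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ToyCluster:
    """Hypothetical cluster model: node name -> id of the job on it (None = idle)."""
    nodes: Dict[str, Optional[str]]
    spot_jobs: Set[str] = field(default_factory=set)  # preemptible batch ("spot") jobs

    def idle_nodes(self) -> List[str]:
        return [n for n, job in self.nodes.items() if job is None]

    def spot_nodes(self) -> List[str]:
        return [n for n, job in self.nodes.items() if job in self.spot_jobs]

    def preempt(self, needed: int) -> List[str]:
        """Phase 1 (preemption): end whole spot jobs until >= `needed` nodes are freed."""
        freed: List[str] = []
        for job in sorted(self.spot_jobs):  # sorted() copies, so discarding below is safe
            if len(freed) >= needed:
                break
            job_nodes = [n for n, j in self.nodes.items() if j == job]
            for n in job_nodes:
                self.nodes[n] = None  # a real system would kill or requeue the job here
            self.spot_jobs.discard(job)
            freed.extend(job_nodes)
        return freed

    def schedule(self, job_id: str, count: int) -> Optional[List[str]]:
        """Phase 2 (scheduling): place a job on idle nodes only; no preemption logic."""
        idle = self.idle_nodes()
        if len(idle) < count:
            return None
        chosen = idle[:count]
        for n in chosen:
            self.nodes[n] = job_id
        return chosen


def launch_interactive(cluster: ToyCluster, job_id: str, count: int) -> Optional[List[str]]:
    """Preempt first (only if needed), then run the plain scheduling step."""
    idle = len(cluster.idle_nodes())
    if count > idle + len(cluster.spot_nodes()):
        return None  # unsatisfiable even after preempting every spot job; preempt nothing
    if count > idle:
        cluster.preempt(count - idle)
    return cluster.schedule(job_id, count)


if __name__ == "__main__":
    c = ToyCluster(
        nodes={"n0": "spotA", "n1": "spotA", "n2": "spotB", "n3": None},
        spot_jobs={"spotA", "spotB"},
    )
    print(launch_interactive(c, "interactive1", 2))  # ['n0', 'n1']
    print(c.nodes)
```

In this toy model the `schedule` step is identical whether or not preemption happened, which loosely mirrors the abstract's claim that preemptive launches perform comparably to launches onto idle resources. The sketch says nothing about how the actual system achieves its measured speedup.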
