Discovering Job Preemptions in the Open Science Grid
Zhe Zhang, Brian Bockelman, Derek Weitzel, David Swanson

TL;DR
This paper analyzes job preemptions in the Open Science Grid, characterizing patterns and classifying jobs to understand their runtime behaviors and improve scheduling strategies.
Contribution
It introduces a detailed analysis of preemption patterns in OSG, classifies jobs into five categories, and models runtime distributions for better resource management.
Findings
Preemptions occur frequently in OSG and sometimes significantly delay job completion.
Jobs can be categorized into five distinct types based on their characteristics.
Different statistical distributions effectively model job runtime for each category.
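To make the last finding concrete, here is a minimal sketch of how one might pick a runtime distribution for a single job category. It is not the authors' method: the two candidate distributions (exponential and log-normal), the use of AIC to compare them, and the function names are all assumptions chosen for illustration, and the demo data is synthetic.

```python
import math
import random


def fit_exponential(samples):
    """Maximum-likelihood exponential fit. Returns (params, log_likelihood)."""
    n = len(samples)
    total = sum(samples)
    rate = n / total  # MLE of the rate is 1 / sample mean
    loglik = n * math.log(rate) - rate * total
    return {"rate": rate}, loglik


def fit_lognormal(samples):
    """Maximum-likelihood log-normal fit. Returns (params, log_likelihood)."""
    logs = [math.log(x) for x in samples]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)  # MLE uses 1/n
    # At the MLE the squared-deviation term of the log-likelihood equals n/2.
    loglik = (-sum(logs) - n * math.log(sigma)
              - 0.5 * n * math.log(2 * math.pi) - 0.5 * n)
    return {"mu": mu, "sigma": sigma}, loglik


def best_fit(samples):
    """Return (name, params) of the candidate with the lowest AIC.

    The candidate set is hypothetical; the paper may use other distributions.
    """
    candidates = {
        "exponential": (fit_exponential, 1),  # (fit function, parameter count)
        "lognormal": (fit_lognormal, 2),
    }
    scored = []
    for name, (fit, k) in candidates.items():
        params, loglik = fit(samples)
        aic = 2 * k - 2 * loglik
        scored.append((aic, name, params))
    _, name, params = min(scored, key=lambda item: item[0])
    return name, params


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic runtimes in seconds for one hypothetical job category.
    runtimes = [rng.lognormvariate(7.0, 0.3) for _ in range(2000)]
    print(best_fit(runtimes))
```

Running `best_fit` once per category would give a per-category model, which matches the idea of using different distributions for different job types. AIC is used here only so that the two-parameter candidate is not favored merely for having more parameters.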
Abstract
The Open Science Grid (OSG) is a worldwide computing system that facilitates distributed computing for scientific research. It can distribute a computationally intensive job to geo-distributed clusters and process the job's tasks in parallel. For compute clusters on the OSG, physical resources may be shared between OSG jobs and the cluster's local user-submitted jobs, with local jobs preempting OSG-based ones. As a result, job preemptions occur frequently in OSG, sometimes significantly delaying job completion time. We have collected job data from OSG over a period of more than 80 days. We present an analysis of the data, characterizing the preemption patterns and different types of jobs. Based on our observations, we have grouped OSG jobs into 5 categories and analyzed the runtime statistics for each category. We further choose different statistical distributions to estimate probability density…
