Mining the Workload of Real Grid Computing Systems
Marco Guazzone

TL;DR
This paper analyzes real grid computing workloads using data mining and Bayesian methods to develop and validate models for job interarrival times and runtimes, aiding in resource management.
Contribution
It introduces a novel combination of data mining and Bayesian techniques to model and understand grid workload characteristics.
Findings
Workload models accurately reflect real system patterns
Bayesian approach captures user correlations effectively
Models assist in designing better resource management strategies
Abstract
Since the mid-1990s, grid computing systems have emerged as an analogy for making computing power as pervasive and easily accessible as an electric power grid. Since then, grid computing systems have been shown to provide very large amounts of storage and computing power, mainly to support scientific and engineering research on a wide geographic scale. Understanding the characteristics of the workload arriving at such systems is a milestone for the design and tuning of effective resource management strategies. This is accomplished through workload characterization, where workload characteristics are analyzed and a model of them, as realistic as possible, is obtained. In this paper, we study the workload of some real grid systems by using a data mining approach to build a workload model for job interarrival time and runtime, and a Bayesian approach to capture user correlations…
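To make "a workload model for job interarrival time" concrete, here is a minimal sketch of the general idea. It derives interarrival times from job submission timestamps and fits a single exponential distribution by maximum likelihood. The exponential form, the function names, and the sample timestamps are all assumptions made for illustration. The abstract does not say which distributions or fitting procedure the paper uses.

```python
from statistics import mean


def interarrival_times(submit_times):
    """Return the gaps between consecutive job submission timestamps."""
    ordered = sorted(submit_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def fit_exponential_rate(gaps):
    """Maximum-likelihood rate of an exponential distribution: 1 / sample mean.

    The exponential model is an illustrative assumption, not the paper's model.
    """
    return 1.0 / mean(gaps)


# Hypothetical submission timestamps, in seconds since the start of a trace.
submit_times = [0, 30, 45, 105, 120]

gaps = interarrival_times(submit_times)
rate = fit_exponential_rate(gaps)
print(f"interarrival gaps: {gaps}")
print(f"estimated arrival rate: {rate:.4f} jobs/second")
```

The same two steps (derive the quantity of interest from the trace, then fit a candidate distribution) would apply to job runtimes. In practice a candidate fit would be checked against the trace before being used to drive a resource management simulation.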
Taxonomy
Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · Age of Information Optimization
