Towards General Distributed Resource Selection
Ming Tai Ha, Matteo Turilli, Andre Merzky, Shantenu Jha

TL;DR
This paper presents a general resource selection model for distributed computing that estimates task costs across heterogeneous resources; in the reported experiments, model-based selection reduces workload time-to-completion by up to ~85%.
Contribution
It introduces a task-cost model that is agnostic of specific task and resource properties and integrates it with the Condor matchmaking algorithm to select resources in heterogeneous environments.
Findings
Cost estimation error: 157-171% on XSEDE, 18-31% on OSG
Resource selection reduces workload time-to-completion by up to ~85%
Model enables efficient scheduling of large-scale GROMACS simulations
Abstract
The advantages of distributing workloads and utilizing multiple distributed resources are now well established. The type and degree of heterogeneity of distributed resources are increasing, and thus determining how to distribute workloads becomes increasingly difficult, in particular with respect to the selection of suitable resources. We formulate and investigate the resource selection problem in a way that is agnostic of specific task and resource properties and generalizable to a range of metrics. Specifically, we developed a model to describe the requirements of tasks and to estimate the cost of running a task on an arbitrary resource using baseline measurements from a reference machine. We integrated our cost model with the Condor matchmaking algorithm to enable resource selection. Experimental validation of our model shows that it provides execution time estimates…
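The idea described in the abstract (estimate a task's cost on an arbitrary resource from a baseline measured on a reference machine, then select a resource through requirement matching and ranking) can be illustrated with a minimal sketch. This is not the authors' model and it does not use the Condor API. The `Task` and `Resource` classes, the `relative_speed` scaling factor, and the memory requirement are all hypothetical names and simplifications chosen for illustration, and the linear scaling of runtime is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    baseline_seconds: float  # runtime measured on the reference machine
    min_memory_gb: float     # hypothetical requirement used for matching


@dataclass
class Resource:
    name: str
    relative_speed: float    # assumed speed relative to the reference machine (1.0 = equal)
    memory_gb: float


def estimate_seconds(task: Task, resource: Resource) -> float:
    """Scale the baseline runtime by the resource's assumed relative speed."""
    return task.baseline_seconds / resource.relative_speed


def select_resource(task: Task, resources: List[Resource]) -> Optional[Resource]:
    """Keep resources that meet the task's requirement, then pick the lowest estimate."""
    eligible = [r for r in resources if r.memory_gb >= task.min_memory_gb]
    if not eligible:
        return None
    return min(eligible, key=lambda r: estimate_seconds(task, r))


# Illustrative values only; none of these numbers come from the paper.
task = Task("md_run", baseline_seconds=600.0, min_memory_gb=8.0)
resources = [
    Resource("cluster_a", relative_speed=2.0, memory_gb=4.0),   # fastest, but too little memory
    Resource("cluster_b", relative_speed=1.5, memory_gb=16.0),
    Resource("cluster_c", relative_speed=0.5, memory_gb=32.0),
]
best = select_resource(task, resources)
```

In this sketch the requirement filter plays the role of a matchmaking constraint and the estimated runtime plays the role of a rank. A fuller model would presumably account for more than one task dimension, such as I/O and memory alongside compute.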
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques
