Bulk Scheduling with DIANA Scheduler
Ashiq Anjum, Richard McClatchey, Arshad Ali & Ian Willers

TL;DR
This paper presents the DIANA Scheduler, a data- and network-aware system for bulk scheduling of data-intensive scientific applications. It weighs data location, network performance, and compute resources when placing jobs, which the authors report leads to significant performance gains.
Contribution
The paper introduces an adaptive, performance-aware, and economy-guided meta scheduler that effectively manages computation and data across multiple locations for bulk data-intensive tasks.
Findings
Significant performance improvements with DIANA scheduling
Effective management of data location and network performance
Suitable for both single and bulk job scheduling
Abstract
Results from and progress on the development of a Data Intensive and Network Aware (DIANA) Scheduling engine, primarily for data intensive sciences such as physics analysis, are described. Scientific analysis tasks can involve thousands of computing, data handling, and network resources, and the size of the input and output files and the amount of overall storage space allocated to a user can have a significant bearing on the scheduling of data intensive applications. If the input or output files must be retrieved from a remote location, then the time required to transfer the files must also be taken into consideration when scheduling compute resources for the given application. The central problem in this study is the coordinated management of computation and data at multiple locations and not simply data movement. However, this can be a very costly operation and efficient…
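To make the abstract's point about transfer time concrete, here is a minimal sketch of a site-selection rule that adds file transfer time to queue wait and compute time. It is not the DIANA algorithm itself, which this summary does not spell out; the names `Site`, `Job`, `estimated_cost`, and `pick_site`, the additive cost model, and the assumption that bandwidth is symmetric between two locations are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    queue_wait_s: float   # assumed estimate of wait time before a job starts
    cpu_speed: float      # relative speed, 1.0 = reference machine
    # Assumed measured bandwidth (MB/s) between this site and other locations.
    bandwidth_mb_s: dict = field(default_factory=dict)


@dataclass
class Job:
    input_mb: float
    input_location: str   # where the input files currently live
    output_mb: float
    output_location: str  # where the output files must end up
    compute_s: float      # runtime on the reference machine


def transfer_time(size_mb: float, remote: str, site: Site) -> float:
    """Seconds to move size_mb between `remote` and `site` (0 if local)."""
    if remote == site.name or size_mb == 0:
        return 0.0
    return size_mb / site.bandwidth_mb_s[remote]


def estimated_cost(job: Job, site: Site) -> float:
    """Hypothetical additive cost: wait + stage-in + compute + stage-out."""
    return (
        site.queue_wait_s
        + transfer_time(job.input_mb, job.input_location, site)
        + job.compute_s / site.cpu_speed
        + transfer_time(job.output_mb, job.output_location, site)
    )


def pick_site(job: Job, sites: list) -> Site:
    """Choose the site with the lowest estimated completion time."""
    return min(sites, key=lambda s: estimated_cost(job, s))
```

Under this model a job with a large input file is sent to a faster remote site only when the link is quick enough to offset the transfer; otherwise it stays where its data already lives.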
Taxonomy
Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques
