Scheduling Data-Intensive Workloads in Large-Scale Distributed Systems: Trends and Challenges

Georgios L. Stavrinides; Helen D. Karatza

arXiv:2510.25362 · cs.DC · October 30, 2025 · Modeling and Simulation in HPC and Cloud Systems

TL;DR

This paper reviews trends and challenges in scheduling data-intensive workloads on large-scale distributed systems, highlighting workload complexity, Quality of Service (QoS) requirements, and the need for effective scheduling strategies.

Contribution

The chapter proposes a classification of data-intensive workloads, surveys existing scheduling approaches, and discusses novel strategies and open challenges in the field.

Findings

1. Workloads vary in their degree of parallelism and in their data-locality requirements.
2. Scheduling must address QoS, energy efficiency, and fault tolerance.
3. Open challenges include the scalability and adaptability of scheduling algorithms.

Abstract

With the explosive growth of big data, workloads tend to get more complex and computationally demanding. Such applications are processed on distributed interconnected resources that are becoming larger in scale and computational capacity. Data-intensive applications may have different degrees of parallelism and must effectively exploit data locality. Furthermore, they may impose several Quality of Service requirements, such as time constraints and resilience against failures, as well as other objectives, like energy efficiency. These features of the workloads, as well as the inherent characteristics of the computing resources required to process them, present major challenges that require the employment of effective scheduling techniques. In this chapter, a classification of data-intensive workloads is proposed and an overview of the most commonly used approaches for their scheduling in…
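The abstract notes that data-intensive applications must exploit data locality while also meeting QoS time constraints. As a generic illustration of that trade-off, and not the method proposed in the chapter, the following minimal sketch orders tasks earliest-deadline-first and places each one greedily on the node with the earliest estimated finish time, charging a transfer penalty when the task's input is not stored on that node. The `Task` and `Node` types, the `schedule` function, and the single uniform network `bandwidth` are hypothetical simplifications introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    work: float        # compute demand (abstract work units)
    input_size: float  # size of the task's input data
    data_node: str     # node that currently stores the input data
    deadline: float    # QoS time constraint (absolute time)


@dataclass
class Node:
    name: str
    speed: float           # work units processed per time unit
    ready_at: float = 0.0  # time at which the node becomes free


def schedule(tasks, nodes, bandwidth):
    """Hypothetical EDF ordering + greedy earliest-finish-time placement.

    Running a task away from its data costs input_size / bandwidth extra
    time, so locality is exploited only when it actually pays off.
    Returns (task, node, finish_time, deadline_met) tuples.
    """
    plan = []
    # Earliest-deadline-first: the most urgent tasks are placed first.
    for task in sorted(tasks, key=lambda x: x.deadline):
        best_node, best_finish = None, None
        for node in nodes:
            # Remote execution pays for moving the input over the network.
            transfer = 0.0 if node.name == task.data_node else task.input_size / bandwidth
            finish = node.ready_at + transfer + task.work / node.speed
            if best_finish is None or finish < best_finish:
                best_node, best_finish = node, finish
        best_node.ready_at = best_finish  # node is busy until the task ends
        plan.append((task.name, best_node.name, best_finish, best_finish <= task.deadline))
    return plan
```

For example, with `nodes = [Node("A", 1.0), Node("B", 2.0)]`, `bandwidth = 10.0`, and tasks `Task("t1", 4, 40, "A", 5)`, `Task("t2", 6, 50, "B", 10)`, `Task("t3", 8, 10, "A", 12)`, the sketch keeps `t1` and `t2` on the nodes holding their data, but moves `t3` to the faster, already-free node `B` (finish time 8.0 instead of 12.0) because its small input makes the transfer cheap. Real schedulers surveyed in this area also weigh energy, failures, and task dependencies, which this sketch ignores.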