Trustworthy Scheduling for Big Data Applications

Dimitrios Tomaras; Vana Kalogeraki; Dimitrios Gunopulos

arXiv:2601.18983 · cs.DC · January 28, 2026

Open Access

TL;DR

This paper introduces X-Sched, a middleware for scheduling containerized big data applications that gives developers explainable, actionable guidance on the resource configurations needed to meet performance goals and Service Level Objectives (SLOs).

Contribution

X-Sched combines counterfactual explanations with machine learning models, such as Random Forests, to make scheduling decisions in containerized environments more transparent and actionable.

Findings

1. X-Sched effectively guides resource configuration decisions.

2. It ensures task execution aligns with performance goals.

3. Experimental validation shows practical benefits.

Abstract

Recent advances in containerized execution environments have brought substantial gains in elasticity and in the efficient utilization of computing resources. Although existing schedulers strive to optimize performance metrics such as task execution times and resource utilization, they provide limited transparency into their decision-making processes or into the specific actions developers must take to meet Service Level Objectives (SLOs). In this work, we propose X-Sched, a middleware that uses explainability techniques to generate actionable guidance on resource configurations that make task execution in containerized environments feasible under resource and time constraints. X-Sched addresses this gap by integrating counterfactual explanations with advanced machine learning models, such as Random Forests, to efficiently identify optimal configurations. This approach not…
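To illustrate the idea of a counterfactual explanation over resource configurations, here is a minimal sketch. It is not X-Sched's implementation. The function names, the candidate CPU and memory values, the distance measure, and the SLO are all assumptions made for illustration. In particular, `predict_runtime` is a hypothetical analytic stand-in for a trained performance model (the abstract mentions Random Forests), and the search is a plain exhaustive scan rather than whatever search strategy the paper uses.

```python
from itertools import product


def predict_runtime(cpu_cores, memory_gb):
    # Hypothetical stand-in for a trained performance model.
    # A real system would call a learned predictor here instead.
    return 120.0 / cpu_cores + 40.0 / memory_gb


def find_counterfactual(current, slo_seconds, cpu_options, mem_options):
    """Return the candidate (cpu, mem) closest to `current` whose
    predicted runtime meets the SLO, or None if no candidate does."""
    best, best_cost = None, None
    for cpu, mem in product(cpu_options, mem_options):
        if predict_runtime(cpu, mem) > slo_seconds:
            continue  # this candidate would still violate the SLO
        # Assumed distance: normalized absolute change per resource.
        cost = (abs(cpu - current[0]) / max(cpu_options)
                + abs(mem - current[1]) / max(mem_options))
        if best_cost is None or cost < best_cost:
            best, best_cost = (cpu, mem), cost
    return best


CPU_OPTIONS = [1, 2, 4, 8]      # assumed candidate core counts
MEM_OPTIONS = [2, 4, 8, 16]     # assumed candidate memory sizes in GB

current_config = (2, 4)         # 2 cores, 4 GB
slo = 45.0                      # assumed deadline in seconds

suggestion = find_counterfactual(current_config, slo, CPU_OPTIONS, MEM_OPTIONS)
if suggestion is None:
    print("No candidate configuration meets the SLO.")
else:
    print(f"Predicted runtime with {current_config}: "
          f"{predict_runtime(*current_config):.1f}s (SLO {slo:.1f}s)")
    print(f"Smallest change that meets the SLO: {suggestion}, "
          f"predicted {predict_runtime(*suggestion):.1f}s")
```

With these toy numbers the current configuration misses the deadline, and the nearest feasible alternative raises the core count to 4 while leaving memory unchanged. That is the "actionable guidance" form of a counterfactual: the smallest change to the input that flips the predicted outcome.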

Taxonomy

Topics: Cloud Computing and Resource Management · Software System Performance and Reliability · Big Data and Digital Economy