"Two-Stagification": Job Dispatching in Large-Scale Clusters via a Two-Stage Architecture

Mert Yildiz; Alexey Rolich; Andrea Baiocchi

arXiv:2505.03032 · cs.DC · April 27, 2026

TL;DR

This paper proposes a two-stage cluster architecture that improves job dispatching by combining classical dispatching schemes with an optimized service-time threshold that separates large jobs from short ones, significantly reducing mean response time compared with single-stage policies.

Contribution

Introduces a two-stage cluster architecture that combines classical dispatching schemes (Round Robin, Join Idle Queue, Least Work Left) with an optimized service-time threshold, improving performance in large-scale clusters.

Findings

1. The two-stage approach outperforms the corresponding single-stage policies in simulations.
2. Its performance comes close to that of size- and state-aware methods.
3. Mean response times are significantly reduced.

Abstract

A continuing effort is devoted to devising effective dispatching policies for clusters of First-Come-First-Served (FCFS) servers. Although the optimal solution for dispatchers aware of both job size and server state remains elusive, lower bounds and strong heuristics are known. In this paper, we introduce a two-stage cluster architecture that applies the classical Round Robin, Join Idle Queue, and Least Work Left dispatching schemes, coupled with an optimized service-time threshold to separate large jobs from shorter ones. Using both synthetic (Weibull) workloads and real Google data center traces, we demonstrate that our two-stage approach greatly improves upon the corresponding single-stage policies and closely approaches the performance of advanced size- and state-aware methods. Our results highlight that careful architectural design, rather than increased complexity at the dispatcher, can yield…
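The abstract describes the architecture only at a high level. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes job sizes are known at dispatch time, so a job whose size is at most the threshold goes to a "short" pool of servers and any larger job goes to a "long" pool, with each pool running its own Round Robin, Join Idle Queue, or Least Work Left dispatcher over FCFS servers. How the paper actually detects large jobs, splits the servers, and optimizes the threshold is not stated here. The brute-force grid search, the Poisson/Weibull workload parameters, and all function names (`pick`, `mean_response`, `weibull_jobs`, `best_split`) are hypothetical.

```python
# Hypothetical sketch: assumes job sizes are known at dispatch time.
# This is NOT the paper's method; names and parameters are illustrative.
import math
import random
from statistics import mean


def pick(policy, free_at, t, rr, pool, rng):
    """Choose a server in one pool; free_at[i] is when server i drains its FCFS queue."""
    if policy == "RR":  # Round Robin: cycle through servers, ignoring their state
        i = rr[pool] % len(free_at)
        rr[pool] += 1
        return i
    if policy == "JIQ":  # Join Idle Queue (simplified): any idle server, else a random one
        idle = [i for i, f in enumerate(free_at) if f <= t]
        return rng.choice(idle) if idle else rng.randrange(len(free_at))
    if policy == "LWL":  # Least Work Left: server whose backlog drains earliest
        return min(range(len(free_at)), key=free_at.__getitem__)
    raise ValueError(f"unknown policy {policy!r}")


def mean_response(jobs, pools, threshold, policy, seed=0):
    """Mean response time of FCFS servers split into one or two pools.

    jobs: (arrival_time, size) pairs sorted by arrival time.
    pools: server counts, e.g. [N] (single stage) or [n_short, n_long].
    threshold: size <= threshold -> pool 0, else pool 1 (ignored with one pool).
    """
    rng = random.Random(seed)
    free_at = [[0.0] * n for n in pools]
    rr = [0] * len(pools)
    responses = []
    for t, size in jobs:
        p = 0 if len(pools) == 1 or size <= threshold else 1
        i = pick(policy, free_at[p], t, rr, p, rng)
        finish = max(t, free_at[p][i]) + size  # FCFS: start once the backlog drains
        free_at[p][i] = finish
        responses.append(finish - t)
    return mean(responses)


def weibull_jobs(n_jobs, n_servers, load, shape=0.5, scale=1.0, seed=1):
    """Poisson arrivals with Weibull sizes, scaled so utilization is about `load`."""
    rng = random.Random(seed)
    mean_size = scale * math.gamma(1 + 1 / shape)
    rate = load * n_servers / mean_size
    t, jobs = 0.0, []
    for _ in range(n_jobs):
        t += rng.expovariate(rate)
        jobs.append((t, rng.weibullvariate(scale, shape)))  # (scale, shape)
    return jobs


def best_split(jobs, n_servers, policy, thresholds):
    """Grid search over (threshold, short-pool size) minimizing mean response time."""
    return min(
        (mean_response(jobs, [k, n_servers - k], th, policy), th, k)
        for th in thresholds
        for k in range(1, n_servers)
    )


if __name__ == "__main__":
    N = 10
    jobs = weibull_jobs(5_000, N, load=0.7)
    sizes = sorted(s for _, s in jobs)
    # Candidate thresholds taken from empirical size quantiles (an assumption).
    cands = [sizes[int(q * len(sizes))] for q in (0.5, 0.8, 0.9, 0.95, 0.99)]
    for policy in ("RR", "JIQ", "LWL"):
        single = mean_response(jobs, [N], None, policy)
        two, th, k = best_split(jobs, N, policy, cands)
        print(f"{policy}: single={single:.2f} two-stage={two:.2f} "
              f"(threshold={th:.2f}, short servers={k})")
```

Here JIQ falls back to a random server when none is idle, and LWL uses each server's drain time as its remaining work; both are simplifications chosen for illustration. Keeping short jobs in their own pool means they never queue behind a very large job, which is the intuition the two-stage split exploits.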