On system-wide safety staffing of large-scale parallel server networks
Hassan Hmedi, Ari Arapostathis, Guodong Pang

TL;DR
This paper introduces the system-wide safety staffing (SWSS) parameter for large-scale parallel server networks, providing explicit formulas and conditions for stability and ergodicity in the Halfin-Whitt regime.
Contribution
It defines the SWSS parameter for complex networks, characterizes its properties, and establishes necessary and sufficient conditions for stability and ergodicity using graph theory and diffusion analysis.
Findings
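The SWSS parameter is derived explicitly, as a function of the system parameters, for networks of any tree topology.
A negative SWSS parameter implies transience under any Markov control, and no stationary distribution exists when the parameter is zero.
When the SWSS parameter is positive, stabilizing policies exist, with exponential ergodicity.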
Abstract
We introduce a "system-wide safety staffing" (SWSS) parameter for multiclass multi-pool networks of any tree topology, Markovian or non-Markovian, in the Halfin-Whitt regime. This parameter can be regarded as the optimal reallocation of the capacity fluctuations (positive or negative) of order $\sqrt{n}$ when each server pool employs a square-root staffing rule. We provide an explicit form of the SWSS as a function of the system parameters, which is derived using a graph-theoretic approach based on Gaussian elimination. For Markovian networks, we give an equivalent characterization of the SWSS parameter via the drift parameters of the limiting diffusion. We show that if the SWSS parameter is negative, the limiting diffusion and the diffusion-scaled queueing processes are transient under any Markov control, and cannot have a stationary distribution when this parameter is zero. If it is…
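To illustrate the square-root staffing rule that the abstract refers to, here is a minimal sketch for a single pool, assuming the textbook form: servers = offered load + beta * sqrt(offered load). The function name, the pool names, and the numeric values are hypothetical, and the sketch does not compute the SWSS parameter itself, whose explicit form is given in the paper.

```python
import math


def square_root_staffing(offered_load: float, beta: float) -> int:
    """Staff one pool at the offered load plus a safety term of order sqrt(load).

    beta is the per-pool safety staffing coefficient (an illustrative name);
    it may be positive (surplus capacity) or negative (deficit).
    """
    return math.ceil(offered_load + beta * math.sqrt(offered_load))


# Hypothetical pools: (offered load, safety coefficient beta).
pools = {"pool_1": (100.0, 0.5), "pool_2": (400.0, -0.25)}

for name, (load, beta) in pools.items():
    print(name, square_root_staffing(load, beta))
```

In this toy example one pool carries a surplus and the other a deficit, each of square-root order. Whether such per-pool fluctuations can be reallocated so that the network as a whole is stabilizable is the question the SWSS parameter addresses.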
Taxonomy
Topics: Advanced Queuing Theory Analysis · Network Traffic and Congestion Control · Cloud Computing and Resource Management
