Job Allocation in Large-Scale Service Systems with Affinity Relations

Ellen Cardinaels; Sem C. Borst; Johan S.H. van Leeuwaarden

arXiv:1812.10703·math.PR·December 31, 2018

TL;DR

This paper studies load balancing in large-scale service systems with affinity relations, proposing schemes that allocate jobs to primary or secondary servers, and analyzing stability and performance through coupling and fluid limit techniques.

Contribution

It introduces load balancing schemes considering affinity relations and develops novel coupling methods for stability analysis and performance bounds.

Findings

1. Stability conditions depend on affinity and load parameters.
2. Fluid limit analysis reveals the impact of model parameters on performance.
3. Coupling construction provides bounds for system stability.

Abstract

We consider load balancing in service systems with affinity relations between jobs and servers. Specifically, an arriving job can be allocated to a fast, primary server from a particular selection associated with this job or to a secondary server to be processed at a slower rate. Such job-server affinity relations can model network topologies based on geographical proximity, or data locality in cloud scenarios. We introduce load balancing schemes that allocate jobs to primary servers if available, and otherwise to secondary servers. A novel coupling construction is developed to obtain stability conditions and performance bounds. We also conduct a fluid limit analysis for symmetric model instances, which reveals a delicate interplay between the model parameters and load balancing performance.

Tables

Table 1: The smallest possible value of $d$ that satisfies condition (9) is listed for a system with $N=50$ servers and a given value of $k$.

$k$	2	3	4	5	10	15	25
$d$	31	34	36	38	42	44	46
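The row of $d$ values can be recomputed with a short script. This is a minimal sketch under the assumption that condition (9) is the inequality $\sum_{i=1}^{N-d-1}\binom{N-i}{k-1}<\frac{d+1}{N}\binom{N}{k}$ that appears in the equation list; the function name `smallest_d` is hypothetical.

```python
from math import comb


def smallest_d(n_servers: int, k: int) -> int:
    """Smallest d with sum_{i=1}^{N-d-1} C(N-i, k-1) < (d+1)/N * C(N, k).

    Assumption: this inequality is condition (9) of the paper.
    """
    total = comb(n_servers, k)
    for d in range(n_servers):
        lhs = sum(comb(n_servers - i, k - 1) for i in range(1, n_servers - d))
        # Multiply both sides by N so the comparison stays in exact integers.
        if n_servers * lhs < (d + 1) * total:
            return d
    raise ValueError("no d in {0, ..., N-1} satisfies the inequality")


table_1 = {k: smallest_d(50, k) for k in (2, 3, 4, 5, 10, 15, 25)}
print(table_1)
```

For $N=50$ the loop returns the values listed in Table 1 for each $k$. Integer arithmetic is used on purpose, since the binomial coefficients become large for $k=25$.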

Table 2: For given $\lambda$, $\mu_{1}=1$ and $\mu_{2}$, the minimum value $d_{1}^{*}$ that satisfies condition (17) is listed.

$\lambda$	0.4	0.5	0.6	0.7	0.8	0.9
$\mu_{2}=1/2$	/	/	5	9	18	46
$\mu_{2}=1/3$	3	5	7	12	22	54
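A similar scan reproduces entries of Table 2. The sketch below assumes that condition (17) is the pair of inequalities $d_{1}^{*}\lambda\left(\frac{1}{\mu_{2}}-\frac{1}{\mu_{1}}\right)>1$ and $\left(1-\frac{1}{d_{1}^{*}}\right)\frac{\mu_{1}}{\lambda}>\left(d_{1}^{*}\lambda\left(\frac{1}{\mu_{2}}-\frac{1}{\mu_{1}}\right)\right)^{\frac{1}{d_{1}^{*}-1}}$ from the equation list. The name `min_d1` and the search bound `d_max` are hypothetical. The entries marked "/" (both have $\lambda\leq\mu_{2}$) are not reproduced, since the reason for excluding them is not stated in this excerpt.

```python
def min_d1(lam: float, mu1: float, mu2: float, d_max: int = 1000):
    """Smallest integer d >= 2 satisfying both inequalities, or None.

    Assumption: the two inequalities below are condition (17) of the paper.
    """
    a = 1.0 / mu2 - 1.0 / mu1
    for d in range(2, d_max + 1):  # d = 1 would divide by zero in the exponent
        first = d * lam * a > 1.0
        second = (1.0 - 1.0 / d) * mu1 / lam > (d * lam * a) ** (1.0 / (d - 1))
        if first and second:
            return d
    return None


print(min_d1(0.6, 1.0, 1 / 2))
print(min_d1(0.4, 1.0, 1 / 3))
```

The scan uses floating-point arithmetic, so values of $\lambda$ for which the two sides are nearly equal may need a tolerance or exact rational arithmetic.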

Equations

\overline{Q}_{ij}^{N}(t) \doteq \sum_{k\geq i} Q_{kj}^{N}(t)

\overline{Q}_{m+}^{\mathrm{aff}}(t) \doteq \sum_{i=m}^{\infty} \overline{Q}_{i}^{\mathrm{aff}}(t) \quad\text{and}\quad \overline{Q}_{m+}^{\mathrm{ref}}(t) \doteq \sum_{i=m}^{\infty} \overline{Q}_{i}^{\mathrm{ref}}(t),

\left\{\left(\overline{Q}_{m+}^{\mathrm{aff}}(t)\right)_{m\geq 1}\right\}_{t\geq 0} \leq_{st} \left\{\left(\overline{Q}_{m+}^{\mathrm{ref}}(t)\right)_{m\geq 1}\right\}_{t\geq 0}.

\sum_{i=m}^{\infty} \overline{Q}_{i}^{\mathrm{aff}}(t) \leq \sum_{i=m}^{\infty} \overline{Q}_{i}^{\mathrm{ref}}(t)

\lambda_{0} \doteq \min_{p_{Sn}} \left\{ \max_{n} \left\{ \lambda_{n}^{*} = \sum_{S\in\mathcal{S}\colon n\in S} \lambda_{S}p_{Sn} \;\middle|\; \sum_{n\in S} p_{Sn} = 1 \text{ with } p_{Sn}\geq 0,\ \forall n\in S \right\} \right\}.

\lambda_{0} N \sum_{n\in S} \frac{1}{N}\,\frac{\lambda_{n}^{*}}{\lambda_{0}}\,\frac{\lambda_{S}p_{Sn}^{*}}{\lambda_{n}^{*}}

\left\{\left(\overline{Q}_{m+}^{\mathrm{aff}}(t)\right)_{m\geq 1}\right\}_{t\geq 0} \leq_{st} \left\{\left(\overline{Q}_{m+}^{\mathrm{RA}}(t)\right)_{m\geq 1}\right\}_{t\geq 0}

\lambda N < \mu_{1}(N-k),

\left\{\left(\overline{Q}_{m+}^{\mathrm{aff}}(t)\right)_{m\geq 1}\right\}_{t\geq 0} \leq_{st} \left\{\left(\overline{Q}_{m+}^{\mathrm{MJSQ}(k)}(t)\right)_{m\geq 1}\right\}_{t\geq 0}.

\sum_{i=1}^{N-d-1} \binom{N-i}{k-1} < \frac{d+1}{N}\binom{N}{k}.

f_{\mathrm{ref}}\colon\{1,\dots,N\}\to[0,1]\colon x\mapsto\left\{\begin{array}[]{lcr}\frac{1}{\binom{N}{k}}\sum_{i=1}^{x}\binom{N-i}{k-1},&{}\hfil&1\leq x\leq N-k+1,\\ 1,&{}\hfil&N-k+1<x\leq N.\end{array}\right.

\begin{array}[]{rl}&f_{\mathrm{aff}}\colon\{1,\dots,N\}\to[0,1]\\ &x\mapsto\left\{\begin{array}[]{lcr}\frac{d+1}{N},&{}\hfil&1\leq x\leq N-d-\lceil\frac{N}{d+1}\rceil+2\\ \frac{d+1}{N}+\frac{1}{N}\left(N-\lfloor\frac{N}{d+1}\rfloor(d+1)\right),&{}\hfil&\frac{N}{d+1}\in\mathbb{N}\text{~{}and~{}}x=N-d-\lfloor\frac{N}{d+1}\rfloor+2\\ 1-\frac{d+1}{N}(N-d-x),&{}\hfil&N-d-\lfloor\frac{N}{d+1}\rfloor+2\leq x<N-d\\ 1,&{}\hfil&N-d\leq x\leq N.\end{array}\right.\end{array}

x = N - d - \left(\left\lceil\frac{N}{d+1}\right\rceil - 1\right).

\left\{\left(\overline{Q}_{m+}^{\mathrm{aff}}(t)\right)_{m\geq 1}\right\}_{t\geq 0} \leq_{st} \left\{\left(\overline{Q}_{m+}^{\mathrm{JSQ}(k)}(t)\right)_{m\geq 1}\right\}_{t\geq 0}.

\left(\frac{\overline{Q}_{i,j}^{N}(t)}{N}\right)_{i,j},

\tilde{\lambda} = \left(\lambda - \mu_{1}q_{10} - \mu_{2}q_{01}\right)^{+}.

\left\{\begin{array}{l}
\frac{d}{dt}\overline{q}_{00} = \mu_{2}(\overline{q}_{01}-\overline{q}_{11}) - \lambda(1-q_{00})^{d_{1}} + \tilde{\lambda}\mathds{1}\{q_{00}=0\}\\
\frac{d}{dt}\overline{q}_{01} = \mu_{2}(\overline{q}_{11}-\overline{q}_{01}) + \mathds{1}\{q_{00}>0\}\left[\lambda(1-q_{00})^{d_{1}}\right] + \mathds{1}\{q_{00}=0\}\left[\lambda-\tilde{\lambda}\right]\\
\frac{d}{dt}\overline{q}_{10} = \mu_{1}(\overline{q}_{20}-\overline{q}_{10}) + \mathds{1}\{q_{00}>0\}\left[\lambda\left(1-(1-q_{00})^{d_{1}}\right)\right]\\
\frac{d}{dt}\overline{q}_{11} = \mu_{1}(\overline{q}_{21}-\overline{q}_{11}) + \tilde{\lambda}\mathds{1}\{q_{00}=0\}\left[(\overline{q}_{10}+\overline{q}_{01})^{d_{1}} - (\overline{q}_{10}+\overline{q}_{11})^{d_{1}}\right]\\
\text{for } i\geq 2,\\
\frac{d}{dt}\overline{q}_{i0} = \mu_{1}(\overline{q}_{i+1,0}-\overline{q}_{i0}) + \tilde{\lambda}\mathds{1}\{q_{00}=0\}\left[(\overline{q}_{i-1,0}+\overline{q}_{i-1,1})^{d_{1}} - (\overline{q}_{i0}+\overline{q}_{i-1,1})^{d_{1}}\right]\\
\frac{d}{dt}\overline{q}_{i1} = \mu_{1}(\overline{q}_{i+1,1}-\overline{q}_{i1}) + \tilde{\lambda}\mathds{1}\{q_{00}=0\}\left[(\overline{q}_{i0}+\overline{q}_{i-1,1})^{d_{1}} - (\overline{q}_{i0}+\overline{q}_{i1})^{d_{1}}\right]
\end{array}\right.

\left\{\begin{array}{l}\overline{q}_{i0}^{*} = 0,\\ \overline{q}_{i1}^{*} = \left(\frac{\lambda-\mu_{2}}{\mu_{1}-\mu_{2}}\right)^{\frac{d_{1}^{i}-1}{d_{1}-1}},\end{array}\right. \quad i=0,1,2,\dots.

\begin{array}[]{rcl}d_{1}^{*}\lambda\left(\frac{1}{\mu_{2}}-\frac{1}{\mu_{1}}\right)&>&1,\\ \left(1-\frac{1}{d_{1}^{*}}\right)\frac{\mu_{1}}{\lambda}&>&\left(d_{1}^{*}\lambda\left(\frac{1}{\mu_{2}}-\frac{1}{\mu_{1}}\right)\right)^{\frac{1}{d_{1}^{*}-1}}.\end{array}

\left(1-\frac{1}{d_{1}}\right)\left(\frac{1}{d_{1}a}\right)^{\frac{1}{d_{1}-1}} > \frac{\lambda}{\mu_{1}}

\overline{q}_{i}^{*} = \left(\frac{\tilde{\lambda}}{\mu_{1}}\right)^{\frac{d_{1}^{i}-1}{d_{1}-1}} = \left(\frac{\lambda-\mu_{2}}{\mu_{1}-\mu_{2}}\right)^{\frac{d_{1}^{i}-1}{d_{1}-1}},

\left\{\begin{array}{l}\overline{q}_{i0}^{*} = 0,\\ \overline{q}_{i1}^{*} = \left(\frac{\lambda-\mu_{2}}{\mu_{1}-\mu_{2}}\right)^{i},\end{array}\right. \quad i=0,1,2,\dots.

q_{00}^{*} = \frac{(\mu_{2}-\lambda)\mu_{1}}{(\mu_{2}-\lambda)\mu_{1}+\lambda\mu_{2}}.

E[Q_{CM(d_{1})}] = \sum_{i\geq 1} i\,q_{i,1} = \sum_{i\geq 1} \left(\frac{\lambda-\mu_{2}}{\mu_{1}-\mu_{2}}\right)^{\frac{d_{1}^{i}-1}{d_{1}-1}}.

E[Q_{JSQ(d_{1})}] = \sum_{i\geq 1} i\,q_{i+1} = \sum_{i\geq 1} \left(\frac{\lambda}{\mu_{1}}\right)^{\frac{d_{1}^{i+1}-1}{d_{1}-1}},

E[Q_{RA}] = \left(1-\frac{\lambda}{\mu_{1}}\right)\sum_{i\geq 1} i\left(\frac{\lambda}{\mu_{1}}\right)^{i+1} = \frac{(\lambda/\mu_{1})^{2}}{1-\lambda/\mu_{1}}.

\frac{\lambda-\tilde{\lambda}}{\lambda} = \frac{\mu_{2}}{\lambda}\,\frac{\mu_{1}-\lambda}{\mu_{1}-\mu_{2}}

E[Q_{I}] = \sum_{i\geq 1} i\,q_{i+1,1} = \sum_{i\geq 1} \left(\frac{\lambda-\mu_{2}}{\mu_{1}-\mu_{2}}\right)^{\frac{d_{1}^{i+1}-1}{d_{1}-1}}.

E[W] = \frac{\tilde{\lambda}}{\lambda}E[W_{I}] + \frac{\lambda-\tilde{\lambda}}{\lambda}E[W_{II}].

E[W_{II}] = \frac{1}{\lambda-\tilde{\lambda}}\left(E[Q_{CM(d_{1})}] - E[Q_{I}]\right) = \frac{1}{\lambda-\tilde{\lambda}}\,\frac{\lambda-\mu_{2}}{\mu_{1}-\mu_{2}}.

Taxonomy

Topics: Advanced Queuing Theory Analysis · Simulation Techniques and Applications · Cloud Computing and Resource Management

Full text

Ellen Cardinaels (corresponding author: [email protected])

Eindhoven University of Technology, The Netherlands

Sem C. Borst

Eindhoven University of Technology, The Netherlands

Nokia Bell Labs, Murray Hill, USA

Johan S.H. van Leeuwaarden

Eindhoven University of Technology, The Netherlands

Keywords: load balancing, stochastic coupling, fluid limit, job scheduling, network topology

MSC2010: 60K25, 68M20, 90B15, 90B22, 90B35

1 Introduction

In this paper we analyze a load balancing scheme in a service system where particular servers are better equipped to process certain jobs because of affinity or compatibility relations. Load balancing algorithms play a crucial role in distributing jobs among multiple servers and have attracted strong renewed interest due to the proliferation of large data centers and cloud computing. Well-known load balancing algorithms include, for instance, the Join-the-Shortest-Queue (JSQ), Join-the-Shortest-Queue-$d$ (JSQ($d$)) and Join-the-Idle-Queue (JIQ) policies. These policies have been extensively analyzed in an overarching framework called the supermarket model, consisting of a single dispatcher where jobs arrive that must be distributed among $N$ identical parallel servers. The JSQ policy assigns each arriving job to the server with the smallest queue length and has strong stochastic optimality properties among the class of policies without advance knowledge about the service requirements [5, 23]. However, the JSQ policy involves a significant communication burden, which may be prohibitive in large systems.

This scalability issue has spurred an interest in the JSQ($d$) policy, which assigns a job to the server with the smallest queue length among $d\geq 2$ randomly selected servers. Mitzenmacher [14] and Vvedenskaya et al. [22] analyzed the JSQ($d$) policy in an asymptotic regime where the total arrival rate and the number of servers grow large in proportion. Substantial performance gains were established compared to purely random assignment, even for $d=2$. Mukherjee et al. [15] show that the waiting time in fact vanishes when $d$ tends to infinity as the number of servers grows large. A vanishing waiting time is also achieved by the JIQ policy, which directs arriving jobs to an idle server, or to a randomly selected server if all servers are occupied [12]. The JIQ policy only has a constant communication overhead per job, but requires memory at the dispatcher. We refer to Van der Boor et al. [21] and Gamarnik et al. [6] for further details.

A key feature of the supermarket framework is the exchangeability of the servers in the sense that any job can be handled equally well by any server, which is often not the case in practice. In the present paper we will focus on a scenario where jobs or servers are not intrinsically different, but where particular servers might be better equipped to process certain jobs because of affinity or compatibility relations. Such affinity relations may for example arise due to geographical proximity in spatial settings, or data locality in content distribution or transaction processing applications. The scenario will be modeled as follows. Let $\mathcal{P}(\{1,\dots,N\})$ denote the power set of the set of servers. Then for a selection of servers $S\in\mathcal{S}\subseteq\mathcal{P}(\{1,\dots,N\})$, jobs arrive at rate $\lambda_{S}\geq 0$. These jobs can be processed at rate $\mu_{1}>0$ at any of the servers in $S$ or at rate $\mu_{2}$ at any of the servers in $S^{c}$, with $\mu_{1}>\mu_{2}>0$. The arriving job is then labeled as a type I or type II job, depending on whether it can be served at rate $\mu_{1}$ or $\mu_{2}$, respectively. Our affinity-scheduling policy allocates the new job to a server in $S$ with the shortest queue length unless it might be beneficial to redirect the job to a server outside $S$. The precise allocation and scheduling strategies will be described in Section 2.

When $\mathcal{S}$ contains all neighborhood sets of a graph $G_{N}$ on $N$ vertices, we refer to our model as the graph model. The graph model extends the models constructed by Gast [7], Turner [20] and Mukherjee et al. [15]. In these settings it is assumed that all nodes have equal arrival rates and jobs can only be forwarded to direct neighbors; it is not possible to redirect an arriving job to any other nodes. The model constructed by Yekkehkhany et al. [24] does allow for jobs to be redirected to higher-degree neighbors to be served at lower rates. When $\mathcal{S}$ consists of all subsets of servers of fixed size $d$, we refer to our model as the combinatorial model. In addition, the arrival rates will be equal among the server selections, which strengthens the symmetric nature of the combinatorial model.

The lack of exchangeability among the servers makes the affinity-scheduling model complicated to analyze in general. The analytical techniques that are most commonly used in the context of the supermarket model, such as mean-field limits and even standard coupling arguments, fundamentally rely on this feature. These techniques can only be applied for the combinatorial model. For the general model, and in particular the graph model, the investigation of load balancing issues is challenging, and enters largely uncharted methodological territory.

We will establish a stochastic dominance result for the occupancy process of the general affinity-scheduling model, which will yield a sufficient stability constraint as an immediate by-product. Exploiting the coupling of this dominance result, we can derive two stronger dominance results for the graph model, which will in particular hold if the underlying graph structure is rather dense. To the best of our knowledge, these are the first results that explicitly capture the impact of network structure on load balancing performance.

For the combinatorial model we will conduct a fluid-limit analysis. A trajectory of the fluid limit will converge to one of the possibly multiple fixed points, depending on the mutual relationships of the model parameters and the initial configuration of the system. When the fixed point is unique, we demonstrate that this provides a good approximation for the intractable stationary distribution in a finite server setting. When multiple fixed points occur, we observe the phenomenon of ‘tunneling’ described in [8]: the stochastic process switches between multiple modes corresponding to the locally stable fixed points of the fluid limit.

The remainder of this paper is organized as follows. A detailed model description will be provided in Section 2. Next, the main stochastic dominance result is presented in Section 3 together with the coupling that establishes this result. This section is completed with two stronger stochastic dominance properties for graph models. In Section 4 we present a fluid-limit analysis of the affinity-scheduling policy for combinatorial models. The proofs of all results are deferred to Section 5. Finally, Section 6 provides concluding remarks and some directions for further research.

2 Model description

We now describe the affinity-scheduling model with $N$ servers. For a selection $S\in\mathcal{P}(\{1,\dots,N\})$, jobs arrive as a Poisson process of rate $\lambda_{S}\geq 0$. For these jobs, the servers in $S$ and $S^{c}$ are called the primary and secondary servers, respectively. An arriving job can be allocated as a type I job to a primary server or as a type II job to a secondary server. Type I jobs have independent and exponentially distributed service times with parameter $\mu_{1}$. Type II jobs have on average longer service times, which are independent and exponentially distributed with parameter $\mu_{2}<\mu_{1}$. It is important to note that the job type is not predetermined on arrival, but established by the allocation strategy. The main idea behind our allocation strategy is: ‘Allocate a job to a server in the primary selection unless it might be beneficial to allocate the job to a secondary server even though the service time might be longer’. The rationale for this is to reduce the waiting time of a job. More precisely, the allocation strategy goes through the following three steps:

1. Is there at least one completely idle server in the primary selection $S$? If so, allocate the arriving job as a type I job to one of these servers.

2. Is there at least one completely idle server in the secondary selection $S^{c}$? If so, allocate the arriving job as a type II job to one of these servers.

3. If there are no idle servers present, then allocate the job as a type I job to the primary server with the smallest number of type I jobs. Ties are broken according to the number of type II jobs, in favor of a lower number.

When the second step is omitted, our policy resembles a JSQ($|S|$) policy with $|S|$ the cardinality of the primary selection. However, the cardinality of the server selection is allowed to differ among arriving jobs in our model, and the server selection $S$ itself is not sampled uniformly at random as is the case in a JSQ($|S|$) policy. Moreover, the second step can be related to a JIQ policy on the set of secondary servers. Our affinity-scheduling policy thus shares similarities with both policies. We assume that the time it takes the dispatcher to make a decision for an arriving job, as well as the time it takes a job to reach its selected server, is negligible. Notice that due to this strategy, an arriving job will never be assigned as a type II job to a server that already has a job in its queue. Denote the configuration of a server, i.e. the number of type I and type II jobs in its queue, by $(i,j)$, $i,j\geq 0$. As an illustrative example of the allocation strategy, suppose the primary and secondary servers have $\{(1,0),(1,1),(1,1),(4,0)\}$ and $\{(1,0),(1,1),(1,1),(3,1)\}$ as their configurations, respectively. Since no server is idle, the third step will be applied and the primary server with configuration $(1,0)$ will receive an additional type I job. In general, type I jobs are the preferred type of jobs, which also manifests itself in the scheduling strategy. Each server operates under a preemptive priority scheduling discipline in favor of the type I jobs. Moreover, type I jobs are served in order of arrival.
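The three allocation steps can be sketched in a few lines of Python. This is an illustrative sketch only: the function `allocate`, the dictionary representation of server configurations, and the choice of the lowest-indexed server among idle ones are assumptions (the text only says "one of these servers"), and the primary selection is assumed to be non-empty.

```python
from typing import Dict, Set, Tuple

Config = Tuple[int, int]  # (number of type I jobs, number of type II jobs)


def allocate(configs: Dict[int, Config], primary: Set[int]) -> Tuple[int, str]:
    """Return (server index, job type) for a job with primary selection `primary`."""
    secondary = set(configs) - primary

    # Step 1: a completely idle primary server takes the job as type I.
    idle_primary = sorted(n for n in primary if configs[n] == (0, 0))
    if idle_primary:
        return idle_primary[0], "I"  # assumption: lowest index among idle servers

    # Step 2: a completely idle secondary server takes the job as type II.
    idle_secondary = sorted(n for n in secondary if configs[n] == (0, 0))
    if idle_secondary:
        return idle_secondary[0], "II"

    # Step 3: primary server with the fewest type I jobs; ties are broken by
    # fewer type II jobs (tuples compare lexicographically).
    best = min(sorted(primary), key=lambda n: configs[n])
    return best, "I"


# Illustrative example from the text: servers 0-3 are primary, 4-7 secondary.
example = {0: (1, 0), 1: (1, 1), 2: (1, 1), 3: (4, 0),
           4: (1, 0), 5: (1, 1), 6: (1, 1), 7: (3, 1)}
print(allocate(example, {0, 1, 2, 3}))
```

In the example no server is idle, so step 3 selects the primary server with configuration $(1,0)$, in line with the description above.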

Let $N_{n,j}(t)$ denote the number of type $j$ jobs at server $n\in\{1,\dots,N\}$ at time $t$. The configuration of server $n$ is then given by $\left(N_{n,\text{I}}(t),N_{n,\text{II}}(t)\right)\in\mathbb{N}^{2}$. The vector $\left(N_{n,\text{I}}(t),N_{n,\text{II}}(t)\right)_{n}$ evolves as an irreducible, time-homogeneous Markov process with state space $\mathbb{N}^{2N}$. We also introduce different variables that are more server-centric and will be more convenient in proving stochastic dominance and analyzing the fluid limit. Define $Q^{N}_{ij}(t)$ as the number of servers with $i$ type I jobs and $j$ type II jobs at time $t$, with $i,j\geq 0$. Then

$$\overline{Q}^{N}_{ij}(t)\doteq\sum_{k\geq i}Q^{N}_{kj}(t)$$

denotes the number of servers with at least $i$ type I jobs and exactly $j$ type II jobs. We note that $\sum_{j\geq 0}\overline{Q}^{N}_{0j}(t)=N$ by definition. It is important to note that these variables will no longer lead to a Markov process representation in the general settings mentioned in the introduction. This immediately limits the number of available techniques to analyze the performance.
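A minimal sketch of these server-centric variables, assuming the system state is given as a list of per-server configurations $(i,j)$; the helper names `occupancy` and `q_bar` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def occupancy(configs: List[Tuple[int, int]]) -> Counter:
    """Q[(i, j)]: number of servers with exactly i type I and j type II jobs."""
    return Counter(configs)


def q_bar(configs: List[Tuple[int, int]], i: int, j: int) -> int:
    """Number of servers with at least i type I jobs and exactly j type II jobs."""
    q = occupancy(configs)
    return sum(count for (k, jj), count in q.items() if k >= i and jj == j)


state = [(1, 0), (1, 1), (1, 1), (4, 0), (1, 0), (1, 1), (1, 1), (3, 1)]
type_two_levels = {j for (_, j) in state}
print(q_bar(state, 1, 0))                                # servers with >= 1 type I job and no type II job
print(sum(q_bar(state, 0, j) for j in type_two_levels))  # equals the number of servers
```

Summing `q_bar(state, 0, j)` over all $j$ recovers the number of servers, which is the identity $\sum_{j\geq 0}\overline{Q}^{N}_{0j}(t)=N$ noted above.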

Let $\mathcal{S}$ denote the subset of $\mathcal{P}(\{1,\dots,N\})$ with strictly positive arrival rates. Besides the general setting where $\mathcal{S}$ can be any subset of $\mathcal{P}(\{1,\dots,N\})$, we will also investigate some more restricted cases. In the graph model on graph topology $G_{N}$, each node represents a server and the edges represent underlying relations between them. Then each set in $\mathcal{S}$ consists of a server and its neighbors determined by $G_{N}$. In total $\mathcal{S}$ contains $N$ sets and jobs arrive to each of these sets independently at a uniform rate $\lambda$. This setting mimics a situation where a job’s physical arrival location plays a role in its service time at the various servers.

Now let $\mathcal{S}$ consist of all possible server selections of size $d$. The cardinality of $\mathcal{S}$ is $\binom{N}{d}$, and henceforth we refer to this model as the combinatorial model. We assume a uniform arrival rate $\nu=\lambda N/\binom{N}{d}$ per selection, such that the total arrival rate is given by $\lambda N$. Observe that the combinatorial model captures the situation where a selection of $d$ servers is drawn uniformly at random as the primary selection for each job and arrivals occur at rate $\lambda N$ in total.

Remark 2.1

There are also instances of the affinity-scheduling model that are not captured by either the graph model or the combinatorial model. As an example, suppose a job arrives to a primary selection that consists of the servers $1,\dots,5$ or an arbitrary selection of size two of the remaining servers. Then $\mathcal{S}$ consists of $\{1,\dots,5\}$ and all pairs of servers of $6,\dots,N$. For some $\nu>0$, the arrival rates per selection are given by $\nu/2$ and $\nu/\left((N-5)(N-6)\right)$, respectively.
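A short check of the rates in Remark 2.1, assuming $N\geq 7$ so that at least one pair exists among the servers $6,\dots,N$: there are $\binom{N-5}{2}$ such pairs, so the total arrival rate is

```latex
\frac{\nu}{2} + \binom{N-5}{2}\cdot\frac{\nu}{(N-5)(N-6)}
  = \frac{\nu}{2} + \frac{(N-5)(N-6)}{2}\cdot\frac{\nu}{(N-5)(N-6)}
  = \nu .
```

Under this reading, half of the total rate $\nu$ goes to the selection $\{1,\dots,5\}$ and the other half is spread evenly over the pairs.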

3 Stochastic dominance and coupling

In this section we establish several stochastic dominance results for our affinity-scheduling strategy. We will construct a stochastic coupling that allows a comparison with various reference systems in terms of the ordered server states, and refer to this coupling as the affinity coupling. In contrast to the original system with affinity relations, the various reference systems all involve $N$ exchangeable servers, and are therefore far more amenable to (asymptotic) analysis, yielding tractable performance bounds. We will not explicitly consider the type II jobs since our allocation strategy will never add such a job at any server where there are already type II jobs present and we assume that the initial configuration of the system will only have a finite number of type II jobs. Thus we focus on the type I jobs that enjoy higher affinity at each of the servers, hence the name of the coupling.

While each of the $N$ servers in the reference system processes jobs in a FCFS manner at rate $\mu_{1}$, the various specific incarnations differ in the value of the normalized arrival rate per server $\lambda_{0}$ and the policy for assigning jobs. The choice of the specific reference system is aligned with the properties of the original system in terms of the server selections $\mathcal{S}$ and the associated arrival rates $\lambda_{S}$, $S\in\mathcal{S}$. Loosely speaking, we obtain increasingly strong dominance results under increasingly restrictive symmetry and structural conditions on the server selections $\mathcal{S}$ and the associated arrival rates. The three specific variants for the reference system that we consider operate under either (i) a purely random assignment (RA) policy, (ii) a MJSQ($k$) policy (as specified later), or (iii) a JSQ($k$) policy (as described in the introduction). While the RA system provides exact upper bounds in terms of independent M/M/1 queues, the MJSQ($k$) and JSQ($k$) systems yield asymptotic upper bounds based on fluid limits.

The dominance results revolve around stochastic majorization properties in terms of the ordered server states. Specifically, define

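$$\overline{Q}_{m+}^{\mathrm{aff}}(t)\doteq\sum_{i=m}^{\infty}\overline{Q}_{i}^{\mathrm{aff}}(t)\quad\text{and}\quad\overline{Q}_{m+}^{\mathrm{ref}}(t)\doteq\sum_{i=m}^{\infty}\overline{Q}_{i}^{\mathrm{ref}}(t),$$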

with $\overline{Q}_{i}^{\mathrm{aff}}(t)$ and $\overline{Q}_{i}^{\mathrm{ref}}(t)$ denoting the number of servers with at least $i$ type I jobs in their queue at time $t$ in the original system and the reference system, respectively. We will establish results of the form

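$$\left\{\left(\overline{Q}_{m+}^{\mathrm{aff}}(t)\right)_{m\geq 1}\right\}_{t\geq 0}\leq_{st}\left\{\left(\overline{Q}_{m+}^{\mathrm{ref}}(t)\right)_{m\geq 1}\right\}_{t\geq 0}.$$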

This majorization result indicates that the number of type I jobs residing in the $m$-th or higher queue position in the original system is stochastically bounded from above by the number of jobs residing in the $m$-th or higher queue position in the reference system. In particular, taking $m=1$, this implies that the total number of type I jobs in the original system is stochastically bounded from above by the total number of jobs in the reference system. As noted earlier, we know the exact distribution of the latter quantity in the RA system and have an asymptotic result for the MJSQ($k$) and JSQ($k$) systems.
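To make the ordering concrete, the sketch below compares the tail sums of two fixed queue-length vectors at a single time instant, which is the deterministic inequality that a coupling has to maintain. The helper names `tail_sums` and `is_majorized` are hypothetical, and the queue-length vectors are made-up illustrations.

```python
from typing import List


def tail_sums(queue_lengths: List[int], m_max: int) -> List[int]:
    """Entry m-1 is the number of jobs in queue position m or higher.

    A server holding q jobs contributes max(q - m + 1, 0) jobs, which equals
    the sum over i >= m of the number of servers with at least i jobs.
    """
    return [sum(max(q - m + 1, 0) for q in queue_lengths)
            for m in range(1, m_max + 1)]


def is_majorized(aff: List[int], ref: List[int]) -> bool:
    """True if every tail sum of `aff` is at most the matching tail sum of `ref`."""
    m_max = max(max(aff, default=0), max(ref, default=0), 1)
    return all(a <= r for a, r in zip(tail_sums(aff, m_max), tail_sums(ref, m_max)))


print(is_majorized([2, 1, 0, 0], [3, 1, 1, 0]))
print(is_majorized([3, 0, 0], [1, 1, 1, 1]))
```

The second pair shows why the ordering is stronger than comparing totals: the first vector holds fewer jobs in total, but it has two jobs in position 2 or higher while the second has none.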

In order to prove the stochastic majorization properties, we introduce the affinity coupling to construct sample paths for the original and reference systems on a joint probability space for which the stated inequalities hold in a deterministic way [11, 18, 19]. For all three specific reference systems, the common proof concept is to ensure that under the coupling two key properties always hold with respect to the ordered server states as illustrated in Figure 1: (a) addition of a type I job at an arrival epoch in the original system must be accompanied by insertion of a job at a higher-ordered server in the reference system; (b) removal of a job at a service completion epoch in the reference system must force disposal of a type I job at the same ordered server in the original system (unless there is no type I job at that server). We can prove the following general lemma.

Lemma 3.1

(Affinity coupling). If a stochastic coupling between the original system and the reference system can be constructed such that $(a)$ and $(b)$ are satisfied, then $(\overline{Q}_{i}^{\mathrm{aff}}(t))_{i\geq 1}$ is majorized by $(\overline{Q}_{i}^{\mathrm{ref}}(t))_{i\geq 1}$ for $t\geq 0$, in the sense that

$$\sum_{i=m}^{\infty}\overline{Q}_{i}^{\mathrm{aff}}(t)\leq\sum_{i=m}^{\infty}\overline{Q}_{i}^{\mathrm{ref}}(t)$$

for all $m\geq 1$, provided that the initial configurations of both systems satisfy this inequality.

The proof of Lemma 3.1 can be found in Subsection 5.1. In the remainder of this section, we will precisely describe the affinity coupling for each of the reference systems under consideration and verify that the properties (a) and (b) are satisfied. The coupling at service completion epochs to ensure property (b) as further detailed below is fairly standard and common across all three reference systems.

Coupling at arrival epochs. In contrast to the service completions, the coupling at arrival epochs to guarantee property (a) is novel and highly specific to the reference system under consideration. Due to the lack of exchangeability among servers, the coupling at arrival epochs involves a further subtle complication that does not arise in constructing sample path comparisons in the context of the ordinary supermarket model. Even though we compare the evolution of the two systems in terms of the $\overline{Q}_{i}$ variables as usual, these generally do not provide a Markovian state description for the original system, as noted earlier in Section 2. In particular, the transitions at arrival epochs intricately depend on the server selections $\mathcal{S}$, and cannot be suitably represented in terms of the $\overline{Q}_{i}$ variables.

Coupling at service completion epochs. The coupling generates potential service completions at rate $\mu_{1}N$ , but the aggregate service rate in either the original or the reference system might be lower than $\mu_{1}N$ because of servers being idle or only working at rate $\mu_{2}$ on type II jobs. Let $W_{\mathrm{aff}}$ and $W_{\mathrm{ref}}$ be the sets of ordered positions of servers in the original and reference system, respectively, that are working on (type I) jobs just before some time $t$ at which a potential service completion occurs. Define $W$ as the intersection $W_{\mathrm{aff}}\cap W_{\mathrm{ref}}$ which equals $W_{\mathrm{aff}}$ or $W_{\mathrm{ref}}$ due to the ordering and the preemptive strategy of the affinity-scheduling policy. A random variable $X_{t}$ , drawn from a uniform distribution on $[0,1]$ , decides which of the following actions is selected.

(i)

$0\leq X_{t}\leq\frac{|W|}{N}$ : Sample uniformly at random a position $n$ from $W$ ; a departure will take place at time $t$ in both the original and the reference system at the server located at position $n$ .

(ii)

$\frac{|W|}{N}<X_{t}\leq\frac{|W_{P}|}{N}$ where $P$ is ‘ $\mathrm{aff}$ ’ or ‘ $\mathrm{ref}$ ’: Sample uniformly at random a server position from $W_{P}\setminus W$ ; one job will be removed from the corresponding server in system $P$ at time $t$ .

(iii)

$X_{t}>\frac{\max\{|W_{\mathrm{aff}}|,|W_{\mathrm{ref}}|\}}{N}$ : No real departure will occur among the type I jobs in the original system or the jobs in the reference system.

We note that the total departure rate of type I jobs from the original system is indeed given by $\mu_{1}|W_{\mathrm{aff}}|$ , likewise for the reference system with a total departure rate of $\mu_{1}|W_{\mathrm{ref}}|$ . The idea to work with intersections of the active server sets comes from [16, Section 4].
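The three cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `couple_departure`, the representation of $W_{\mathrm{aff}}$ and $W_{\mathrm{ref}}$ as Python sets of ordered positions, and passing the uniform draw $X_{t}$ in as the argument `x` are choices made here and do not come from the paper.

```python
import random

def couple_departure(w_aff, w_ref, n_servers, x, rng=random):
    """One potential service completion (hypothetical helper, not from the paper).

    w_aff, w_ref: ordered positions of servers working on (type I) jobs.
    x: the uniform draw X_t on [0, 1].
    Returns (pos_aff, pos_ref); None means no departure in that system.
    """
    w_aff, w_ref = set(w_aff), set(w_ref)
    w = w_aff & w_ref
    # The text states that W equals W_aff or W_ref; this sketch relies on it.
    assert w == w_aff or w == w_ref, "active sets are assumed to be nested"
    larger = w_aff if len(w_aff) >= len(w_ref) else w_ref
    extra = larger - w
    if w and x <= len(w) / n_servers:
        # Case (i): joint departure at the same ordered position.
        pos = rng.choice(sorted(w))
        return pos, pos
    if extra and x <= len(larger) / n_servers:
        # Case (ii): departure only in the system with the larger active set.
        pos = rng.choice(sorted(extra))
        return (pos, None) if larger is w_aff else (None, pos)
    # Case (iii): the potential completion is not a real departure.
    return None, None
```

Because the position is drawn uniformly from $W$ in case (i) and from $W_{P}\setminus W$ in case (ii), each active server in system $P$ is hit with probability $1/N$ per potential completion, which matches the departure rates $\mu_{1}|W_{\mathrm{aff}}|$ and $\mu_{1}|W_{\mathrm{ref}}|$ noted above.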

3.1 Affinity coupling with the general model

We now consider a general structure for the server selections $\mathcal{S}$ and the corresponding arrival rates $\{\lambda_{S}\mid S\in\mathcal{S}\}$ per server selection. The reference system will operate under the RA policy with arrival rate $\lambda_{0}$ per server. Thus $\lambda_{0}<\mu_{1}$ is a sufficient stability condition for the reference system. So the purpose of this subsection is twofold: the affinity coupling is illustrated in a general setting of our affinity-scheduling policy in order to obtain a stochastic dominance result and a stability condition is obtained as an immediate by-product.

The choice of $\lambda_{0}$ is determined by the arrival rates per server selection in the original system, namely

[TABLE]

The variable $p_{Sn}$ may be interpreted as the fraction of jobs with server selection $S$ that are assigned to server $n\in S$ . With this interpretation in mind, it is easily seen that at least one server must handle an arrival rate of $\lambda_{0}$ or larger in case jobs are only allowed to be executed as type I jobs. Thus $\lambda_{0}<\mu_{1}$ is clearly a necessary stability condition for any strategy in this case. The condition is sufficient as well, for instance for a simple static strategy that assigns a job with server selection $S$ to server $n$ with probability $p_{Sn}$ . However, the implementation of this strategy would require full knowledge of the arrival rates $\lambda_{S}$ . We will establish that the condition is also sufficient for the stability of our affinity-scheduling strategy, which does not rely on any knowledge of the arrival rates $\lambda_{S}$ at all.

We now specify the affinity coupling for the reference system with the RA policy.

Coupling at arrival epochs. The coupling generates potential arrival events at rate $\lambda_{0}N$ . If a potential arrival occurs at time $t$ , a position $n^{*}$ from the set $\{1,\dots,N\}$ is selected uniformly at random. For brevity we simply refer to the server at position $n^{*}$ as server $n^{*}$ . An addition of a new job in the reference system will take place at this server $n^{*}$ . Since this position was randomly selected, the coupling strategy will give rise to an addition according to the RA policy in a system with arrival rate $\lambda_{0}$ per server.

In order to determine whether an arrival event of a type I job takes place in the original system and at which server this will happen, we use the following strategy. Two random variables, $Y_{t,1}$ and $Y_{t,2}$ , are sampled from a uniform distribution on $[0,1]$ to take into account that the total arrival rate in the original system might be smaller than $\lambda_{0}N$ and to select a server selection $S$ for an arriving job. To make the decisions, we rely on the variables $(p_{Sn}^{*})_{S,n}$ that attain the minimum in (4). First, $Y_{t,1}$ establishes if an arrival occurs to a primary selection containing server $n^{*}$ , which happens with probability $\lambda_{n^{*}}^{*}/\lambda_{0}$ . If an arrival takes place, then $Y_{t,2}$ is used to select a server selection $S$ containing $n^{*}$ as the primary selection with probability $\lambda_{S}p_{Sn^{*}}^{*}/\lambda^{*}_{n^{*}}$ . All remaining servers form the secondary selection. Note that the total arrival rate to a server selection $S$

[TABLE]

will indeed be equal to $\lambda_{S}$ in the original system as $\sum_{n\in S}p_{Sn}^{*}=1$ by definition such that this method to handle arriving jobs will coincide with the arrival process of the general model described in Section 2. Once these selections are set for an arriving job, we apply the allocation policy as defined in Section 2. Due to the general structure of $\mathcal{S}$ it is not possible to determine the exact server at which a job is allocated in terms of the variables $(\overline{Q}^{N}_{ij})_{i,j}$ . However, if the new job is allocated as a type I job in the original system to one of the servers in $S$ , it is known that the position of this server will be at most $n^{*}$ . Since the newly arrived job in the reference system is assigned to server $n^{*}$ , property (a) of the coupling is maintained.
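The claim that the coupled arrival mechanism reproduces the rate $\lambda_{S}$ for every server selection can be checked numerically on a toy instance. The sketch below uses exact fractions and a hypothetical three-server example. The instance, the helper names, and the reading of $\lambda_{0}$ as the largest per-server rate $\lambda^{*}_{n}=\sum_{S\ni n}\lambda_{S}p^{*}_{Sn}$ under the split $p^{*}$ are assumptions made for illustration.

```python
from fractions import Fraction as F

# Hypothetical toy instance (not from the paper): servers 1..3, two selections.
S1, S2 = frozenset({1, 2}), frozenset({2, 3})
lam = {S1: F(3, 2), S2: F(1, 2)}
# Split p[S][n], each row summing to one; assumed to play the role of p*.
p = {S1: {1: F(1, 2), 2: F(1, 2)}, S2: {2: F(0), 3: F(1)}}
servers = [1, 2, 3]

def per_server_rates(lam, p, servers):
    """lambda*_n = sum over selections S containing n of lambda_S * p_{Sn}."""
    return {n: sum(lam[S] * p[S].get(n, 0) for S in lam) for n in servers}

def induced_selection_rates(lam, p, servers):
    """Rate at which each selection S is generated by the coupled mechanism."""
    lam_star = per_server_rates(lam, p, servers)
    lam0 = max(lam_star.values())  # assumed reading of lambda_0
    rates = {}
    for S in lam:
        total = F(0)
        for n in S:
            if lam_star[n] == 0:
                continue
            position_rate = lam0                   # rate lam0*N, position uniform on N
            accept = lam_star[n] / lam0            # decision made with Y_{t,1}
            pick = lam[S] * p[S][n] / lam_star[n]  # decision made with Y_{t,2}
            total += position_rate * accept * pick
        rates[S] = total
    return lam0, rates

lam0, rates = induced_selection_rates(lam, p, servers)
```

In this instance the factors $\lambda_{0}$ and $\lambda^{*}_{n}$ cancel term by term, so `rates` coincides with `lam`, in line with the identity $\sum_{n\in S}p_{Sn}^{*}=1$ used in the text.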

With the notation as introduced above, we can prove the following theorem.

Theorem 3.1

(General affinity-scheduling model).* Let $\lambda_{0}$ , as defined in (4), be the arrival rate per server in the reference system operating under the RA policy. Then, for suitable initial conditions,*

[TABLE]

holds for the general affinity-scheduling model with $N$ servers.

The above-described coupling between the original and the reference system operating under the RA policy satisfies the general framework of the affinity coupling, i.e. properties (a) and (b) hold. Then the result stated in Lemma 3.1 is applicable so that Theorem 3.1 follows from the majorization result established there. Theorem 3.1 provides a stochastic upper bound for the total number of type I jobs in the original system in terms of the number of jobs in a reference system with the RA policy by taking $m=1$ . Although this upper bound is sufficient to guarantee stochastic stability for $\lambda_{0}<\mu_{1}$ , we will develop stronger majorization results for particular settings of the graph model in the next two subsections. The method to prove the two stronger results is captured in the general framework stated at the beginning of this section, but requires a different and more advanced coupling method between arrival events.

3.2 Graph model

We will further investigate our model on a graph topology $G_{N}$ as described in Section 2. It is challenging to get a grip on the performance of an allocation policy that is applied in a network structure, and establishing stochastic dominance relations can give an initial insight into the theoretical behavior of load balancing algorithms in structured environments. It was mentioned in Section 2 that the arrival rate over all server selections established by the graph structure $G_{N}$ is given by $\lambda$ , and thus Theorem 3.1 is still valid if we set $\lambda_{0}=\lambda$ . However, we will make two different assumptions on the structure of the graph topology and for each of them a much stronger dominance result than Theorem 3.1 is obtained. The first scenario assumes that the minimum degree of $G_{N}$ is sufficiently high and the second scenario entails regular graph topologies.

3.2.1 Minimum degree

The reference system with $N$ exchangeable servers operates under an allocation policy related to JSQ, namely MJSQ( $k$ ) [15]. In this setting new jobs arrive at a total rate of $\lambda N$ and are processed at a server according to a FCFS policy at rate $\mu_{1}>\lambda$ . An arriving job is allocated to the server with the $(k+1)$ -th smallest queue length. A clear analogy can be seen if the system is initially completely empty; then $k$ servers will constantly remain idle. The system operates as if only $N{-}k$ servers are present and applies a JSQ policy restricted to these servers. If $N$ is sufficiently large compared to $k$ , i.e. if

[TABLE]

then the MJSQ( $k$ ) policy is stochastically stable. It is intuitively clear that this policy can achieve much better performance than the RA policy.
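A minimal sketch of the MJSQ( $k$ ) allocation rule may help to see why $k$ servers remain idle when the system starts empty. The function name `mjsq_allocate` and the tie-breaking by server index are assumptions made here; departures are left out to keep the example short.

```python
def mjsq_allocate(queues, k):
    """Add one job to the server with the (k+1)-th smallest queue length.

    Ties are broken by server index (an assumption of this sketch).
    Returns the index of the server that received the job.
    """
    order = sorted(range(len(queues)), key=lambda n: (queues[n], n))
    target = order[k]  # 0-based index k is the (k+1)-th smallest queue
    queues[target] += 1
    return target

# Arrivals only, starting from an empty system with N = 6 and k = 2.
queues = [0] * 6
for _ in range(20):
    mjsq_allocate(queues, 2)
idle_servers = sum(1 for q in queues if q == 0)
```

In this arrivals-only run the job stream is balanced over $N-k=4$ servers while the two lowest-indexed servers never receive a job, which is the analogy with JSQ on $N{-}k$ servers described above.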

Suppose that the minimum degree of the graph $G_{N}$ is at least $N{-}k{-}1$ , without any other structural assumptions. Let $N$ and $k$ satisfy the relation in (7), then we can describe a coupling between our graph model with underlying topology $G_{N}$ and the reference system with the MJSQ( $k$ ) policy. The coupling between both systems will fit the general framework of the affinity coupling but the coupling method for the arriving jobs will differ from the general setting in the previous subsection.

Coupling at arrival epochs. For each of the neighborhood sets in $\mathcal{S}$ there is a uniform arrival rate $\lambda$ such that the total arrival rate in the original system is also given by $\lambda N$ . Assuming that an event in the coupled sample path is an arrival, it is always directed to the server at position $k+1$ under the MJSQ( $k$ ) policy. For the graph model, the primary selection $S$ consists of a randomly selected server and its neighbors under the topology $G_{N}$ and the secondary selection $S^{c}$ contains all other servers. We do not know the exact ordered positions of the servers in the primary selection that is of size at least $N-k$ in terms of the $\overline{Q}_{i}$ variables. The worst-case scenario that could arise is a primary selection of size exactly $N{-}k$ where the servers are the $N{-}k$ highest ordered servers. Then a type I job is allocated to the server at position $k+1$ . All other scenarios where an arriving job is labeled as a type I job in the original system will lead to an allocation that is at most at the $(k+1)$ -th position. Hence property (a) of the affinity coupling is satisfied.

Together with the coupling between service completions as explained in the first part of this section, the coupling between our affinity-scheduling policy on a graph structure with minimum degree $N{-}k{-}1$ and the reference system with the MJSQ( $k$ ) policy satisfies the general framework of the affinity coupling stated in Lemma 3.1. Thus Theorem 3.2 follows from the majorization result in Lemma 3.1.

Theorem 3.2

(Graph model with minimum degree $N{-}k{-}1$ ).* Consider the graph model with an underlying graph topology with minimum degree $N{-}k{-}1$ and a reference system that operates under the MJSQ( $k$ ) policy. Then, for suitable initial conditions,*

[TABLE]

Once the reference system is stochastically stable, i.e. if condition (7) is fulfilled, we can give a meaningful upper bound on the total number of type I jobs in the graph model in terms of the total number of jobs under the MJSQ( $k$ ) policy. This upper bound will be stronger compared to the result in Theorem 3.1 since the MJSQ( $k$ ) policy outperforms the RA policy with arrival rate $\lambda$ per server and service rate $\mu_{1}$ .

Remark 3.1

Theorem 3.2 can be generalized for scenarios where each server selection $S$ has a size of at least $N{-}k$ and a non-uniform arrival rate $\lambda_{S}$ .

3.2.2 Regular graph

As mentioned in the introduction, JSQ( $k$ ) already gives substantial performance improvements for small values of $k$ compared to the RA policy. With this in mind, we show that the number of type I jobs under our affinity-scheduling policy on a $d$ -regular graph is stochastically dominated by the total number of jobs under a JSQ( $k$ ) policy, when $d$ and $k$ satisfy the following relation:

[TABLE]

The proof requires a coupling between the arrival events in both systems such that feature (a) of the affinity coupling is maintained. We will introduce a novel approach to represent or visualize all possible server selections in $\mathcal{S}$ that an arriving job can choose from.

An arriving job in the reference system with JSQ( $k$ ) is allocated to the lowest positioned server among $k$ randomly selected servers. In total there are $\binom{N}{k}$ server selections and each server belongs to $\binom{N-1}{k-1}$ different server selections. Thus, the lowest positioned server of the system belongs to $\binom{N-1}{k-1}$ different server selections, and the second lowest server is part of precisely $\binom{N-2}{k-1}$ different server selections that do not contain the lowest ordered server. One can continue this reasoning up to the $(N-k+1)$ -th lowest ordered server: this server belongs to only one more server selection that is not yet observed at any of the lower ordered servers. All higher ordered servers cannot be part of an unobserved server set. We will construct a step function from the positions $\{1,\dots,N\}$ to the interval $[0,1]$ in order to represent the server selections. Assume that the servers are ordered from 1 to $N$ and the lowest ordered position of $k$ selected servers is denoted by $n$ . We represent this selection as a block from position $n$ to $N$ with height $1/\binom{N}{k}$ . This procedure can be repeated for each of the $\binom{N}{k}$ possible selections. Stacking all these blocks according to their length will give rise to the following step function:

[TABLE]

An example of this visualization can be found in Figure 2.

We aim to construct a similar step function based on the possible primary server selections for the original system when the underlying graph topology is a $d$ -regular graph. Jobs arrive at a total rate $\lambda N$ and an arriving job selects uniformly at random a server selection $S$ from $\mathcal{S}$ . By construction, $\mathcal{S}$ contains $N$ different primary server selections, each of size $d+1$ . Then, the lowest ordered server in the system belongs to $d+1$ different server selections. However, it is not possible to count the number of additional server selections containing the second lowest ordered server without knowing the position of each of the servers, since we operate on a fixed graph structure. We construct a step function based on the worst-case scenario where the lowest positioned server of each of the server selections is still at the highest possible position. The first jump occurs at the lowest positioned server, while all remaining jumps will occur at the highest possible positioned servers. This induces stronger correlations between the servers that have more type I jobs. Notice that the worst-case ordering of the servers might not be a valid $d$ -regular structure, so that this approach might be too conservative. The step function of this worst-case scenario is given by

[TABLE]

An example of this step function can be found in Figure 2.

Coupling at arrival epochs. Let the total arrival rate be $\lambda N$ in both systems. For an arriving job at time $t$ we determine the servers of interest using the inverse transform sampling method [4, Chapter 2]. First, we note that the functions $f_{\mathrm{aff}}$ and $f_{\mathrm{ref}}$ are cumulative distribution functions by construction. Second, the only server of interest of the server selection $S$ in the original system or the server selection in the reference system is the lowest positioned server. So we sample a random variable $X_{t}$ from a uniform distribution on $[0,1]$ and determine the two server positions of interest, $n_{\mathrm{aff}}$ and $n_{\mathrm{ref}}$ , one for each system. In the original system a job can be allocated as a type I job to the selected server or to any other server as a type II job. This procedure is visualized in Figure 2.

So in order to guarantee feature (a) of the affinity coupling, it needs to be ensured that $n_{\mathrm{aff}}\leq n_{\mathrm{ref}}$ . Feature (a) is guaranteed if the step function of the original system is above the step function of the reference system, i.e. $f_{\mathrm{aff}}(n)\geq f_{\mathrm{ref}}(n)$ for all positions $n$ . First we observe that the $d$ -regular graph must be rather dense in order to obtain a stronger upper bound than provided by the RA policy. This can be seen if we investigate the step function at position

[TABLE]

Once the degree $d$ is at least $N/2$ , it is straightforward to show that $f_{\mathrm{aff}}(x)\geq f_{\mathrm{ref}}(x)$ . This immediately implies that the step function $f_{\mathrm{aff}}$ only makes two jumps, at positions 1 and $N-d$ of sizes $(d+1)/N$ and $(N-d-1)/N$ , respectively. Since the step function $f_{\mathrm{ref}}$ is concave in its discrete points, we only need to ensure that $f_{\mathrm{aff}}(N-d-1)>f_{\mathrm{ref}}(N-d-1)$ holds so that the step function of the original system is above the step function of the reference system. This results in condition (9) on the values of $d$ and $k$ . Due to the coupling construction, we can prove the following dominance result.
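The comparison of the two step functions and the coupled inverse transform sampling can be sketched as follows. The sketch rebuilds $f_{\mathrm{ref}}$ from the block counts $\binom{N-j}{k-1}$ given above and uses the two-jump form of $f_{\mathrm{aff}}$ described for $d\geq N/2$ . All function names are hypothetical, the search is restricted to $d\geq N/2$ , and dominance is checked pointwise with a non-strict inequality, so the resulting degrees need not coincide with the entries of Table 1.

```python
from fractions import Fraction
from math import ceil, comb

def f_ref(n, N, k):
    """Mass of the JSQ(k) selections whose lowest position is at most n."""
    return Fraction(sum(comb(N - j, k - 1) for j in range(1, n + 1)), comb(N, k))

def f_aff(n, N, d):
    """Worst-case step function with jumps at 1 and N - d (assumes d >= N/2)."""
    if n < 1:
        return Fraction(0)
    if n < N - d:
        return Fraction(d + 1, N)
    return Fraction(1)

def dominates(N, d, k):
    """Pointwise check of f_aff(n) >= f_ref(n) for all positions n."""
    return all(f_aff(n, N, d) >= f_ref(n, N, k) for n in range(1, N + 1))

def min_degree(N, k):
    """Smallest d >= N/2 for which the pointwise dominance holds, else None."""
    for d in range(ceil(N / 2), N):
        if dominates(N, d, k):
            return d
    return None

def inverse(f, x, N):
    """Inverse transform sampling: smallest position n with f(n) >= x."""
    return next(n for n in range(1, N + 1) if f(n) >= x)

def coupled_positions(x, N, d, k):
    """Positions (n_aff, n_ref) obtained from one common uniform draw x."""
    n_aff = inverse(lambda n: f_aff(n, N, d), x, N)
    n_ref = inverse(lambda n: f_ref(n, N, k), x, N)
    return n_aff, n_ref
```

Whenever `dominates(N, d, k)` holds, `coupled_positions` returns $n_{\mathrm{aff}}\leq n_{\mathrm{ref}}$ for every draw `x` in $[0,1]$ , which is exactly what feature (a) of the affinity coupling requires.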

Theorem 3.3

(Graph model with $d$ -regular graph).* Consider the graph model with an underlying $d$ -regular graph topology and a reference system operating under a JSQ( $k$ ) policy. If the model parameters $d$ and $k$ satisfy condition (9), then, for suitable initial conditions,*

[TABLE]

Due to the coupling construction using the block interpretation of the server selections, the above-described coupling fits the general framework of the affinity coupling as stated in Lemma 3.1. Therefore, the result of Theorem 3.3 follows from the result in Lemma 3.1. If $\lambda<\mu_{1}$ , the reference system is stochastically stable and provides a meaningful upper bound on the performance of the graph model on a $d$ -regular topology.

We list in Table 1 the minimum value of $d$ as a function of $k$ that guarantees the required dominance of the step functions for a system with $N=50$ servers. We observe that the graph structure must be rather dense in order to stochastically dominate our process with a JSQ( $k$ ) policy even for small values of $k$ . One can argue that the primary selection of our affinity scheduling strategy must be much larger compared to the server selection under JSQ( $k$ ) in order to guarantee better performance. But we should keep in mind that the underlying graph structure is fixed and all possible server selections are predetermined while the JSQ( $k$ ) strategy can be seen as a strategy on a complete graph where an arbitrary set of size $k$ of the servers can be selected. This will affect the performance compared to a system with $N$ exchangeable servers, which is intuitively clear.

Moreover, it is important to note that the obtained value of $d$ might be too conservative. Our coupling method using the step functions requires a degree that is at least equal to $N/2$ in order to upper bound by the strategy JSQ(1), i.e. the RA policy. On the other hand, we showed in the general result that the number of type I jobs under any structural interpretation $\mathcal{S}$ is stochastically dominated by the number of jobs under a random assignment strategy.

Remark 3.2

(Combinatorial model).* Applying our affinity-scheduling strategy to the combinatorial model with $N$ servers shows a lot of similarities with the JSQ( $d$ ) policy in a setting of $N$ exchangeable servers. Namely, an arriving job is allocated to the server with the shortest queue length among $d$ arbitrarily selected servers and sometimes the job can be directed to an idle server outside this selection. The coupling can be adjusted such that the number of type I jobs at the $n$ -th ordered position under our affinity-scheduling policy is less than or equal to the number of jobs at the $n$ -th ordered position under a JSQ( $d$ ) policy. The relation between the two policies will become more apparent in Section 4 when fluid-limit results are investigated.*

Moreover, it can be shown that the combinatorial model is stochastically stable under a preemptive and a non-preemptive scheduling strategy using a Foster-Lyapunov argument. This result is shown under the assumption that the number of type II jobs at each server never exceeds one. When the initial condition already satisfies this feature, the affinity-scheduling policy will never add a second type II job to a server. The fact that stability is preserved under a preemptive strategy in favor of the type I jobs is no surprise due to the structure of the server selections $\mathcal{S}$ and the resemblance of the first step in the allocation strategy with the JSQ( $d$ ) policy. Under a non-preemptive strategy it is no longer intuitively clear, as any finite value of $\mu_{2}$ is allowed and one could imagine a situation where all servers are processing a type II job and type I jobs start to accumulate behind these type II jobs.

4 Fluid limit and fixed point analysis

As mentioned in the introduction, the affinity model in general lacks the exchangeability among the servers that underpins the use of mean field limits as the main analytical techniques in the supermarket model. Due to its inherent symmetry, the combinatorial model with uniform arrival rates for each of the server selections in $\mathcal{S}$ as described in Section 2 is one of the exceptions. The variables $(Q_{ij}^{N}(t))_{i,j}$ will give rise to a Markov process representation in this case. The primary and secondary server selections for an arriving job are of sizes $d_{1}$ and $d_{2}=N-d_{1}$ , respectively; we refer to these selections as the $d_{1}$ -selection and the $d_{2}$ -selection. In order to gain insight into the system performance, we introduce the fluid scaled variables, i.e.

[TABLE]

and analyze a sequence of systems where the number of servers $N$ tends to infinity. The (weak) limit that arises is referred to as the fluid limit and is denoted by $(\overline{q}_{ij}(t))_{i,j}$ . When it is helpful to stress the proportion of servers with exactly $i$ type I jobs, instead of at least $i$ type I jobs, we consider the variables $(q_{ij}(t))_{i,j}$ . In this section we consider initial configurations of the system that give rise to a process with at most one type II job at each server throughout the complete process, as mentioned in Remark 3.2. Furthermore, we assume that $\lambda<\mu_{1}$ to guarantee stochastic stability. Throughout this section we will consider a system with $\lambda=0.8$ , $\mu_{1}=1$ , and $\mu_{2}=0.5$ in the numerical and simulation experiments, unless specified otherwise.

4.1 Fluid limit

We now provide a characterization of the (deterministic) fluid limit in terms of a set of discontinuous differential equations. The $t$ reference in the notation will be omitted, if the context allows this.

We introduce a reduced arrival rate $\tilde{\lambda}$ . A job will always be directed to an idle server if available, either as a type I job or a type II job, and idle servers are generated at rate $\mu_{1}q_{10}+\mu_{2}q_{01}$ . This implies that if $\lambda$ is sufficiently high, i.e. $\lambda>\mu_{1}q_{10}+\mu_{2}q_{01}$ , only a fraction of the arriving jobs will start to queue in front of a server as type I jobs at the fluid level. This fraction is given by $\tilde{\lambda}/\lambda$ with

[TABLE]

Then,

[TABLE]

with $\overline{q}_{00}+\overline{q}_{01}=1$ .

Since the system operates under a preemptive priority discipline, the structure of the departure rate in each of the equations in (15) is clear. For instance, in order to change the proportion $\overline{q}_{11}$ due to a job completion, this job completion must take place at a server with configuration $(1,1)$ . Exactly a fraction $q_{11}=\overline{q}_{11}-\overline{q}_{21}$ of the servers has this configuration and since these servers each work at rate $\mu_{1}$ the total rate of change is given by $\mu_{1}(\overline{q}_{21}-\overline{q}_{11})$ .

Let us illustrate the representation of the arrival term for the derivative of $\overline{q}_{11}$ . Only an arrival of a type I job at a server with configuration $(0,1)$ can contribute to the arrival term and the probability that this configuration is the smallest among the $d_{1}$ servers in the $d_{1}$ -selection is given by $\left(\overline{q}_{10}+\overline{q}_{01}\right)^{d_{1}}-\left(\overline{q}_{10}+\overline{q}_{11}\right)^{d_{1}}$ . Ties are broken according to the presence of a type II job, in favor of having no type II jobs. Moreover, there should be no idle servers because otherwise an arriving job would be allocated here as a type II job. Since type I jobs arrive at a reduced rate $\tilde{\lambda}$ , the total rate of change is given by $\tilde{\lambda}\mathds{1}\{q_{00}=0\}[\left(\overline{q}_{10}+\overline{q}_{01}\right)^{d_{1}}-\left(\overline{q}_{10}+\overline{q}_{11}\right)^{d_{1}}]$ .

The expressions for the arrival terms in the derivatives of (15) and the reduced arrival rate $\tilde{\lambda}$ should be considered more carefully due to the discontinuity at $q_{00}=0$ . We will give a sketch of the derivation of this fluid limit in Subsection 5.2. This derivation relies on the martingale method for point processes and Markovian queueing settings outlined by Pang et al. [17] and Brémaud [3].

The fluid-limit expression can be validated with simulations of the fluid-scaled stochastic process. Consider for instance Figure 3 where the solution of the fluid limit (15) is presented together with a simulated trajectory of a reasonably large system. It can be observed that the simulated trajectory fluctuates closely around the numerical solution of the fluid limit, which supports the connection between the fluid limit and the behavior of the stochastic system in a many-server setting.

4.2 Fixed points

To investigate the long-run behavior of the fluid limit (15), we are interested in its fixed points. It turns out that the mutual relationships between the model parameters $d_{1}$ , $\lambda$ , $\mu_{1}$ and $\mu_{2}$ play a crucial role. In the remainder of this section we investigate the setting where $\lambda>\mu_{2}$ , in order to compare one of the fixed points with the fixed point of a JSQ( $d_{1}$ ) policy with reduced load.

Theorem 4.1

(Fixed points).* When $\lambda>\mu_{2}$ and $d_{1}\geq 2$ , the system of differential equations (15) always has the following fixed point:*

[TABLE]

Furthermore, when $d_{1}\geq d_{1}^{*}(\lambda,\mu_{1},\mu_{2})$ precisely two more fixed points exist. These fixed points are such that $q_{00}+q_{01}+q_{10}=1$ and $q_{00}>0$ . Here $d_{1}^{*}\doteq d_{1}^{*}(\lambda,\mu_{1},\mu_{2})$ denotes the minimum selection size that satisfies

[TABLE]

The proof of this theorem can be found in Subsection 5.2. It can be observed that there always exists a sufficiently large $d_{1}$ value that satisfies both inequalities of condition (17) for given values of $\lambda$ , $\mu_{1}$ and $\mu_{2}$ . This is trivial to see for the first inequality. The second inequality can be rewritten as

[TABLE]

with $a\doteq\lambda(\frac{1}{\mu_{2}}-\frac{1}{\mu_{1}})$ . The left hand side is increasing as function of $d_{1}$ with limit $1>\lambda/\mu_{1}$ . Table 2 gives the value of $d_{1}^{*}$ , so that the condition (17) is satisfied for given model parameters and all $d_{1}\geq d_{1}^{*}$ . It can be seen that the higher the load, the larger the size of the primary selections must be for multiple fixed points to persist. The additional fixed points have a strictly positive fraction of idle servers and it is intuitively clear that the number of servers where a job can be processed at rate $\mu_{1}$ must grow with the load in order for these fixed points with a strictly positive fraction of idle servers to persist.

Let $\tilde{\lambda}$ be as defined in (14) for the fixed point (16). Then the long-term fraction of servers with at least $i$ jobs under a JSQ( $d_{1}$ ) policy where each server works at rate $\mu_{1}$ is given by

[TABLE]

for $i\geq 0$ [14]. This shows a strong similarity with the fixed point (16) where two types of jobs are taken into account. Next we consider the case $d_{1}=1$ . When $\lambda>\mu_{2}$ there still is a unique fixed point with $q_{00}=0$ , given by:

[TABLE]

This shows strong resemblance with the RA policy with load $\rho=\tilde{\lambda}/\mu_{1}$ . Allowing a primary selection of at least two servers leads to a super-exponential improvement compared to a primary selection of size one. On the other hand, there is no fixed point with $\lambda\geq\mu_{2}$ and $q_{00}>0$ . Only if $\lambda<\mu_{2}$ can we show that there is a unique fixed point with $q_{00}>0$ , namely

[TABLE]

4.3 Further analysis

We will conduct a further analysis of the fluid limit (15) where we distinguish between $d_{1}$ sufficiently small and large compared to $d_{1}^{*}$ in the sense of conditions (17) in Theorem 4.1.

4.3.1 Sufficiently small primary selections

When $d_{1}$ is sufficiently small in terms of the model parameters $\lambda$ , $\mu_{1}$ and $\mu_{2}$ , i.e. $d_{1}<d_{1}^{*}$ , the fixed point (16) of the fluid limit (15) is unique. Numerical experiments suggest that this fixed point is a global attractor, i.e. the trajectories of the fluid limit will converge to this fixed point for every initial state of the system. As an example, we present Figure 4 where the numerical solution of the fluid limit is visualized for ten randomly sampled initial configurations. We consider a system with the above-mentioned model parameters and a primary selection of size $d_{1}=3$ . As can be seen from the figure, all cumulative fractions $\left(\overline{q}_{i0}\right)_{i\geq 0}$ tend to zero. Although some variability can be seen in the limiting behaviour of $(\overline{q}_{i1})_{i\geq 0}$ , the values are of the same order of magnitude as those of the theoretical fixed point. These deviations may be caused by numerical issues. There are two main reasons: (i) The fluid limit (15) is an infinite system of differential equations, so in order to solve the system numerically we need to truncate the system at some point. We choose to work with the variables up to $i=9$ . (ii) The fluid limit (15) contains the indicator function $\mathds{1}\{q_{00}=0\}$ , while we use $\mathds{1}\{q_{00}<10^{-15}\}$ . This plays a role for initial conditions where a large fraction of the servers has an empty type II queue, since the system still needs to make the transition to a state where every server has a type II job.

In the previous section we used the affinity coupling to show stochastic stability and the existence of an (unknown) stationary distribution for $\lambda<\mu_{1}$ . Assuming global stability of the unique fixed point, Theorem 1 by Benaïm and Le Boudec [2] ensures that the large- $N$ limit of the stationary distribution will converge to the fixed point. Moreover, from simulations it can be observed that the trajectories converge to the unique fixed point of the fluid limit (15). As an example consider a system with the above-mentioned model parameters. Figure 5 shows a simulated trajectory of the fluid-scaled variables for a system with $N=2000$ servers that is initially completely empty. It can be seen that the trajectory converges to the fixed point $(q_{01},q_{11},q_{21},\dots)=(0.40,0.4704,0.1283,\dots)$ , rounded to four decimals.
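The numerical values quoted for the fixed point can be reproduced with a short computation. The sketch assumes that the fixed point has $q_{00}=0$ , so that every server holds a type II job, and that its tail has the JSQ( $d_{1}$ ) form $\overline{q}_{i1}=(\tilde{\lambda}/\mu_{1})^{(d_{1}^{i}-1)/(d_{1}-1)}$ suggested by the comparison with JSQ( $d_{1}$ ) in Subsection 4.2. The closed form for $\tilde{\lambda}$ used below is our own rearrangement of $\tilde{\lambda}=\lambda-\mu_{2}q_{01}$ with $q_{01}=1-\tilde{\lambda}/\mu_{1}$ , and the function name is hypothetical.

```python
def fixed_point(lam, mu1, mu2, d1, i_max=3):
    """Fixed point with q00 = 0 under the assumptions stated in the lead-in.

    Requires mu2 < lam < mu1 and d1 >= 2.
    Returns (lam_tilde, [q_01, q_11, ..., q_{i_max,1}]).
    """
    # Solve lam_tilde = lam - mu2 * (1 - lam_tilde / mu1) for lam_tilde.
    lam_tilde = mu1 * (lam - mu2) / (mu1 - mu2)
    rho = lam_tilde / mu1
    # Cumulative fractions: at least i type I jobs and one type II job.
    q_bar = [rho ** ((d1 ** i - 1) // (d1 - 1)) for i in range(i_max + 2)]
    # Fractions with exactly i type I jobs.
    q = [q_bar[i] - q_bar[i + 1] for i in range(i_max + 1)]
    return lam_tilde, q

lam_tilde, q = fixed_point(lam=0.8, mu1=1.0, mu2=0.5, d1=3)
print([round(v, 4) for v in q[:3]])
```

For $\lambda=0.8$ , $\mu_{1}=1$ , $\mu_{2}=0.5$ and $d_{1}=3$ this gives $\tilde{\lambda}=0.6$ and, after rounding to four decimals, the same values of $q_{01}$ , $q_{11}$ and $q_{21}$ as reported above.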

The asymptotic approximation for the mean stationary queue length, excluding the job in service, suggested by the fixed point is given by

[TABLE]

Here CM( $d_{1}$ ) refers to the combinatorial model with a primary server selection of size $d_{1}$ . It is interesting to compare this with the asymptotic approximation for the mean queue length under a JSQ( $d_{1}$ ) policy in the ordinary supermarket model with arrival rate $\lambda$ and service rate $\mu_{1}$ [14, 22],

[TABLE]

and the exact mean queue length under the RA policy,

[TABLE]

Figure 6 presents a comparison of the number of waiting jobs as a function of $\lambda$ , with $d_{1}=3$ , $\mu_{1}=1$ and $\mu_{2}=0.5$ . It is known that the mean queue length for the RA policy tends to infinity when the offered traffic grows to one. We see that the mean queue length in the combinatorial model is slightly larger than for the JSQ( $d_{1}$ ) policy. On the other hand the variance of the queue length in the JSQ( $d_{1}$ ) model is almost twice as large compared to the combinatorial model. We conclude that the combinatorial model still performs well from a queue length perspective, even though each server has a type II job and possibly multiple type I jobs in its queue.
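For orientation, the two benchmark curves can be evaluated directly. The sketch below uses the standard expressions for the ordinary supermarket model and the M/M/1 queue, namely $\sum_{i\geq 2}\rho^{(d^{i}-1)/(d-1)}$ waiting jobs per server under JSQ( $d$ ) and $\rho^{2}/(1-\rho)$ under RA, with $\rho=\lambda/\mu_{1}$ . The function names are hypothetical, and the combinatorial-model curve is deliberately left out because its formula is not reproduced here.

```python
def jsq_d_mean_waiting(lam, mu, d, tol=1e-15):
    """Asymptotic mean number of waiting jobs per server under JSQ(d), d >= 2."""
    rho = lam / mu
    if not 0 <= rho < 1:
        raise ValueError("requires 0 <= lam / mu < 1")
    total, i = 0.0, 2
    while True:
        # Fraction of servers with at least i jobs in the supermarket model.
        term = rho ** ((d ** i - 1) // (d - 1))
        if term < tol:
            return total
        total += term
        i += 1


def ra_mean_waiting(lam, mu):
    """Exact mean number of waiting jobs in an M/M/1 queue (random assignment)."""
    rho = lam / mu
    if not 0 <= rho < 1:
        raise ValueError("requires 0 <= lam / mu < 1")
    return rho ** 2 / (1 - rho)


for lam in (0.5, 0.8, 0.95):
    print(lam, jsq_d_mean_waiting(lam, 1.0, 3), ra_mean_waiting(lam, 1.0))
```

The doubly exponential decay of the JSQ( $d$ ) terms is what keeps that curve bounded as the load approaches one, while the RA value diverges.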

From the fixed point expression, it is not immediately visible that type II jobs finish their service since the fraction $q_{00}$ is zero. However, an idle server will be filled instantly with an arriving job. A total fraction

[TABLE]

of the arriving jobs undergo this ‘immediate switch’: they are allocated as a type II job to a server that just emptied its queue. This fraction decreases in $\lambda$ ; for example, for the above-mentioned model parameters it equals $1/4$ . Furthermore, the type II jobs will leave the system at the same rate $\lambda-\tilde{\lambda}$ as they enter the system, since we study the system in equilibrium. Moreover, due to Little’s law we know that the expected waiting time of an arbitrary job is finite. Let $W$ denote the waiting time, then $\mathbb{E}\left[Q_{\mathrm{CM}(d_{1})}\right]=\lambda\mathbb{E}[W]$ . Since the expected queue length under our affinity-scheduling policy is finite, this results in a finite expected waiting time for an arbitrary job, so also for the type II jobs.
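The value $1/4$ can be reproduced from the fixed point. The sketch below assumes that at the queueing fixed point ( $q_{00}=q_{10}=0$ ) the reduced arrival rate is $\tilde{\lambda}=\lambda-\mu_{2}q_{01}$ , so that the immediate-switch fraction is $(\lambda-\tilde{\lambda})/\lambda$ . The closed form in the second function follows from the additional assumption $q_{01}=1-\tilde{\lambda}/\mu_{1}$ and is only meaningful for $\mu_{2}<\lambda<\mu_{1}$ . Both function names are hypothetical.

```python
def immediate_switch_fraction(lam, mu2, q01):
    """Fraction of arrivals placed as type II jobs on a server that just emptied."""
    lam_tilde = lam - mu2 * q01  # assumed type I arrival rate when q_00 = q_10 = 0
    return (lam - lam_tilde) / lam


def immediate_switch_closed_form(lam, mu1, mu2):
    """Same fraction after eliminating q_01 (assumes q_01 = 1 - lam_tilde / mu1)."""
    return mu2 * (mu1 - lam) / (lam * (mu1 - mu2))


print(immediate_switch_fraction(lam=0.8, mu2=0.5, q01=0.4))

# The fraction shrinks as the load grows (mu1 = 1, mu2 = 0.5).
grid = [0.55, 0.65, 0.75, 0.85, 0.95]
print([round(immediate_switch_closed_form(lam, 1.0, 0.5), 4) for lam in grid])
```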

Since each server operates under a preemptive scheduling policy, we can calculate the average waiting time of a type I job using Little’s law. Let $Q_{\mathrm{I}}$ denote the number of type I jobs at a server. Then

[TABLE]

Furthermore, the reduced arrival rate $\tilde{\lambda}$ gives the arrival rate of type I jobs on fluid level. If $W_{\mathrm{I}}$ represents the waiting time of a type I job, then due to Little’s law $\mathbb{E}[Q_{\mathrm{I}}]=\tilde{\lambda}\mathbb{E}[W_{\mathrm{I}}].$ Let $Q_{\mathrm{II}}$ and $W_{\mathrm{II}}$ have the same interpretation as above but for the type II jobs. We condition on the type of job to obtain

[TABLE]

Because of Little’s law this results in

[TABLE]

We can also immediately apply Little’s law to the type II jobs. We know that they arrive at rate $\lambda-\tilde{\lambda}$ and the mean waiting queue length is by definition given by

[TABLE]

In Figure 7 we compare the mean waiting time of a type I or type II job with the mean waiting time under the RA or the JSQ( $d_{1}$ ) policy. The mean waiting time of type II jobs is fairly high, but still lower than the waiting time under the RA policy. We also observe that the mean waiting time of type I jobs is significantly smaller than under a JSQ( $d_{1}$ ) policy. We conclude that our allocation strategy leads to a reduction in the mean waiting time for a large group of arriving jobs at the expense of some other jobs that encounter longer waiting times. The uniqueness of the fixed point allows us to analyze the asymptotic stationary distribution of the model. On the other hand, we observe that the size $d_{1}$ of the server selection is too small to achieve a zero waiting time for an arriving job.

4.3.2 Sufficiently large primary selections

Assume that the primary selection has a sufficiently large size $d_{1}$ for given model parameters in terms of the conditions (17), i.e. $d_{1}\geq d_{1}^{*}$ . From Theorem 4.1 we know that, next to the closed-form fixed point (16), there are two additional fixed points with $q_{00}+q_{01}+q_{10}=1$ . We prove the following theorem using the indirect Lyapunov method.

Theorem 4.2

(Local (in)stability). Of the two additional fixed points mentioned in Theorem 4.1 with $q_{00}+q_{01}+q_{10}=1$ when $d_{1}\geq d_{1}^{*}$ , one is locally stable and the other one is unstable.

The proof of Theorem 4.2 is given in Subsection 5.2. In the remainder of this subsection, we will provide a numerical illustration, where we consider a system with $\lambda=0.8$ , $\mu_{1}=1$ , $\mu_{2}=0.5$ and $d_{1}=25$ throughout. We observed similar qualitative behavior across many different scenarios, but only present results for those parameter values because of space constraints. To get a better notion of the local stability we present Figure 8. For several initial values such that $q_{00}+q_{01}+q_{10}=1$ , the system of differential equations (15) is solved numerically. All trajectories with initial states indicated in blue will converge to the locally stable fixed point from the previous theorem and a few of these trajectories are also visualized. All other initial states, indicated in red, will not converge to this locally stable fixed point. We see that these states have a large fraction of servers with a type II job present and a small fraction of idle servers, since there is a smaller probability to select an idle server in the $d_{1}$ -selection. So jobs will have a longer mean service time as a type II job and jobs will start to accumulate.

In total this gives rise to two locally stable fixed points: the closed-form fixed point (16) where each server has a type II job and possibly multiple type I jobs and the fixed point from Theorem 4.2 where at most one job is present at each server. In the remainder of this section we will refer to these fixed points as the queueing fixed point and no-queueing fixed point, respectively. We do not formally prove this statement but we will illustrate it with a representative example. For a system with the above-mentioned parameters, the two fixed points under consideration (non-cumulative fractions) are given by:

[TABLE]

Both fixed points are indicated with dashed lines in Figure 9, in dark blue and dark green, respectively. Furthermore, the graphs contain 20 trajectories starting from randomly sampled initial configurations with $q_{00}+q_{01}+q_{10}+q_{11}=1$ ; all these trajectories converge to one of the two fixed points. This implies that the region of convergence to the no-queueing fixed point is in fact larger than the one presented in Figure 8. As can be seen, most of the trajectories converge to the queueing fixed point. This phenomenon will be even more apparent if we allow initial states with more than two jobs.

The literature often describes systems with a unique global attractor as a fixed point of the fluid limit so that there is a direct connection between the stationary distribution in a many-server setting and this fixed point. However, the non-uniqueness of the fixed points does not imply that these two concepts are completely unrelated. For instance, Figure 3 presents a comparison between the numerical solution of the fluid limit and a simulation with $N=2000$ servers with the above-mentioned model parameters. The system is initially empty and the simulated trajectory seems to converge to the no-queueing fixed point. We can present a similar figure, where in the initial configuration each server has one type II job, in which case both the numerical solution and the simulation seem to tend to the queueing fixed point.

However, the stochastic process with a finite number of servers is an irreducible Markov process, which implies that any state can be reached as long as the process is observed long enough and that a unique equilibrium distribution must exist. Nevertheless, it can be observed that the residence time near each of the locally stable fixed points, which increases with $N$ , is long before the process makes the transition to the other locally stable fixed point. Gibbens et al. [8] describe this concept of switching between multiple modes by ‘tunneling’. We can call these locally stable points the ‘quasi-stationary’ distributions of the stochastic process as in [25].

Examples of models with multiple local fixed points in loss and communication networks can be found in [1, 8, 25]. More recent work by Martirosyan and Robert [13] considers an allocation strategy closely related to the affinity-scheduling policy in a loss network setting, i.e. jobs can be redirected to distant servers with a penalty or can be omitted if none of the servers has enough spare capacity. Also in this setting, a fluid-limit analysis reveals multiple locally stable fixed points.

5 Proofs

5.1 Proof of Lemma 3.1: Affinity coupling

Since the system configurations between two consecutive events remain unchanged, we will condition on the discrete event times and use forward induction.

Assume that (3) holds up to the time of the $(k-1)$ -th event. We will argue that the majorization property still holds at time $t_{k}$ of the $k$ -th event by making a distinction between arrival and departure epochs. But first we need a formal way to express the effect of these events in terms of $(\overline{Q}_{i}^{\mathrm{aff}}(t))_{i\geq 1}$ and $(\overline{Q}_{i}^{\mathrm{ref}}(t))_{i\geq 1}$ . For instance, let $n$ be the server position selected for a departure. Due to the ordering we know that there are at least $N-n+1$ servers with the same number of jobs or more in their queues as the server at position $n$ . It might also be possible that the server at position $n-1$ has the same number of jobs as the server at position $n$ ; in that case there is no difference in terms of the variables $\overline{Q}_{i}$ whether a removal takes place at position $n-1$ or at position $n$ . Instead of removing from the server at position $n$ and reordering the servers before computing $(\overline{Q}_{i}^{\mathrm{aff}}(t))_{i\geq 1}$ and $(\overline{Q}_{i}^{\mathrm{ref}}(t))_{i\geq 1}$ , we can also immediately compute these quantities. The difference is subtle and valid because the proof does not rely on the present type II jobs or on the actual servers but only on their relative positions. Therefore we define two intermediate quantities:
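The remark about positions $n-1$ and $n$ can be illustrated on a toy configuration. The sketch below assumes that $\overline{Q}_{i}$ counts the servers with at least $i$ jobs and that servers are ordered by nondecreasing queue length. The helper names and the example vector are hypothetical and are not taken from Figure 1.

```python
def cumulative_counts(queue_lengths):
    """Return [Q_1, Q_2, ...], where Q_i is the number of servers with >= i jobs."""
    longest = max(queue_lengths, default=0)
    return [sum(1 for q in queue_lengths if q >= i) for i in range(1, longest + 1)]


def remove_at(queue_lengths, position):
    """Remove one job at a 1-based position, without reordering the servers."""
    updated = list(queue_lengths)
    updated[position - 1] -= 1
    return updated


# Servers ordered by nondecreasing queue length; positions 3 and 4 are tied.
ordered = [0, 1, 2, 2, 3]

after_removal_at_3 = cumulative_counts(remove_at(ordered, 3))
after_removal_at_4 = cumulative_counts(remove_at(ordered, 4))
print(after_removal_at_3, after_removal_at_4)
```

Both removals give the same cumulative counts, which is why the argument can work with relative positions without reordering the servers first.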

[TABLE]

For instance in the original system in Figure 1, $I_{\mathrm{aff}}(n)$ is given by 3. Furthermore, only one job will be added or removed at a discrete time event. A new event at time $t_{k}$ could only violate (3) if at time $t_{k-1}$ (3) holds with equality, i.e.

[TABLE]

with $m\geq 1$ . Therefore we only focus on this setting in the induction step.

Arrival. At time $t_{k}$ an arrival occurs and position $n$ is selected first. The updated reference system looks as follows:

[TABLE]

If the newly arrived job is allocated as a type II job in the original system or no arrival takes place due to the coupling, (3) is trivially satisfied. We consider the setting where the job is allocated as a type I job to a server at position $n_{\mathrm{aff}}$ which is at most $n$ , such that

[TABLE]

Moreover, the left hand side of (3) remains unchanged if $m>I_{\mathrm{aff}}(n_{\mathrm{aff}})+1$ so that the order in (3) is preserved. Now, fix $m\leq I_{\mathrm{aff}}(n_{\mathrm{aff}})+1$ , if we now show that also $I_{\mathrm{ref}}(n)\geq m-1$ , then (3) remains valid since both sides are raised by one. We use (32) and the induction hypothesis for $m-1$ at time $t_{k}^{-}$ to obtain

[TABLE]

Then it follows that $I_{\mathrm{aff}}(n_{\mathrm{aff}})\geq m-1$ implies $I_{\mathrm{ref}}(n)\geq m-1$ which concludes the derivation if the event at time $t_{k}$ is an arrival.

Departure. If at time $t_{k}$ a departure will take place, one of the following four scenarios will occur.

1. There is a job completion of a type I job in the original system and of a job in the reference system.

2. There is only a departure among the jobs of the reference system.

3. There is only a departure of a type I job in the original system.

4. There is no departure among the type I jobs of the original system or the jobs of the reference system.

It is clear that we only need to investigate the first two scenarios.

Scenario 1. Let $n\in W$ be the position of the servers in both the original and the reference system from which a job will be removed. The updated systems will look as follows,

[TABLE]

We will focus on $m\leq I_{\mathrm{ref}}(n)$ , since for $m>I_{\mathrm{ref}}(n)$ (3) remains trivially valid. A similar argument as above will be used to show that $I_{\mathrm{aff}}(n)\geq m$ , so that both sides will be lowered by one compared to the event time $t_{k-1}$ . We use (32) and the induction hypothesis for $m+1$ at time $t_{k}^{-}$ to obtain $\overline{Q}_{m}^{\mathrm{aff}}(t_{k}^{-})\geq\overline{Q}_{m}^{\mathrm{ref}}(t_{k}^{-})$ . Then it follows that $I_{\mathrm{ref}}(n)\geq m$ implies $I_{\mathrm{aff}}(n)\geq m$ which concludes the proof of scenario 1.

Scenario 2. Let $n\in W_{\mathrm{ref}}\setminus W$ be the position where a job leaves the reference system, then for all $j$

[TABLE]

Again we focus on $m\leq I_{\mathrm{ref}}(n)$ . Fix $m$ , we will show by contradiction that (32) cannot occur so that (3) is preserved at time $t_{k}$ since the right hand side can be lowered by at most one. Assuming that (32) does hold and using the induction hypothesis on $m+1$ , we conclude that $\overline{Q}_{m}^{\mathrm{aff}}(t_{k}^{-})\geq\overline{Q}_{m}^{\mathrm{ref}}(t^{-}_{k})$ . Now,

[TABLE]

since $N-|W_{\mathrm{ref}}|<n\leq N-|W|$ . This implies that $\overline{Q}_{m}^{\mathrm{aff}}(t_{k}^{-})>|W|$ , however there are only $|W|=|W_{\mathrm{aff}}|$ servers working on a type I job in the original system. This leads to a contradiction and concludes the proof of Lemma 3.1.

5.2 Proofs: Fluid limit and fixed point analysis

5.2.1 Derivation fluid limit (15)

First, consider the stochastic process with $N$ servers and its corresponding flow conservation equations. Next, the martingale methods as outlined by Pang et al. [17] and Brémaud [3] are applied and the limit of $N$ to infinity of the fluid scaled process is studied. Then (15) is obtained from the resulting system of integral equations.

Step 1: flow conservation equations. Let $p_{ij}^{N}(q,t)$ be the probability that an arriving job at time $t$ is allocated to a server with $i$ type I jobs and $j$ type II jobs as a type $q$ job, with $q\in\{\mathrm{I},\mathrm{II}\}$ . As before, we will omit the time dependence $t$ to ease the notation.

Only an idle server can receive a job as either a type I or a type II job; allocations to servers with a higher configuration will always take place as a type I job. The corresponding transition probabilities are given by

[TABLE]

the probability that an idle server is present in the $d_{1}$ -selection, and

[TABLE]

the probability that the $d_{1}$ -selection does not contain an idle server while they are present. As mentioned in the model description, the $d_{2}$ -selection contains all servers that are not in the $d_{1}$ -selection. Hence the indicator function $\mathds{1}\{Q_{00}^{N}>0\}$ emerges in the probabilities.
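To get a feel for the size of these probabilities, one can compare a finite- $N$ expression with its large- $N$ counterpart. The sketch below rests on an assumption that is not confirmed by the text reproduced here: the $d_{1}$ -selection is taken to be a uniformly random subset of $d_{1}$ distinct servers, so that the probability of finding no idle server is hypergeometric. The function name is hypothetical.

```python
from math import comb


def prob_no_idle_in_selection(n_servers, n_idle, d1):
    """P(no idle server among d1 distinct servers drawn uniformly at random)."""
    return comb(n_servers - n_idle, d1) / comb(n_servers, d1)


d1, idle_fraction = 3, 0.2
for n_servers in (20, 200, 2000):
    n_idle = int(idle_fraction * n_servers)
    finite_n = prob_no_idle_in_selection(n_servers, n_idle, d1)
    fluid = (1 - idle_fraction) ** d1  # candidate large-N expression
    print(n_servers, round(finite_n, 5), round(fluid, 5))
```

Under this sampling assumption the finite- $N$ value approaches $(1-q_{00})^{d_{1}}$ as $N$ grows, which is the form one would expect to see on fluid level.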

An arriving job will be allocated as a type I job to a server with configuration $(i,0)$ , with $i\geq 1$ , if the minimum configuration in the $d_{1}$ -selection is given by $(i,0)$ and when there are no completely idle servers that can be included in the $d_{2}$ -selection. The corresponding probability is given by the probability that the $d_{1}$ -selection contains only servers with at least $i$ type $I$ jobs minus the probability that all $d_{1}$ servers have a configuration strictly higher than $(i,0)$ . Thus, for $i\geq 1$ ,

[TABLE]

In a similar way, we obtain $p_{i1}^{N}(I)$ , for $i\geq 0$ :

[TABLE]

Once these probabilities are set, the flow conservation equations can be constructed. The randomness in the stochastic model is caused by Poisson arrivals and exponentially distributed service times, so that the number of arrivals and service completions can be counted using Poisson processes with appropriately chosen rates. Define a set of independent Poisson processes with rate 1. Let $P_{A_{00,q}}$ denote the Poisson counting process for the number of arriving type $q$ jobs at servers with configuration $(0,0)$ , and let $P_{A_{ij}}$ , $i+j\geq 1$ , reflect the arriving jobs at servers with configuration $(i,j)$ . Similarly, define the counting process of the service completions $P_{S_{ij}}$ , $i+j\geq 1$ . Furthermore if $i\geq 1$ , the number of servers at time $t$ with at least $i$ type I jobs and exactly $j$ type II jobs depends on its initial state $(\overline{Q}^{N}_{ij}(0))$ , the number of service completions of jobs at servers with configuration $(i,j)$ and the number of arrivals at servers in configuration $(i-1,j)$ within the time interval $[0,t)$ . We obtain the following flow conservation equations for the stochastic model $(\overline{Q}^{N}_{ij})_{i,j}$ with $N$ servers and total arrival rate $\lambda N$ . Let $i\geq 2$ :

[TABLE]

Due to the Poisson split property we define $P_{A_{00}}$ as the sum of the two processes $P_{A_{00,\mathrm{I}}}$ and $P_{A_{00,\mathrm{II}}}$ .

Step 2: Fluid scaled process. Dividing both sides of the equations by $N$ results in a fluid scaled process. Further, because of the martingale results in [3] and [17] we can define noise terms $e_{ij}(N)$ that tend to 0 as $N\rightarrow\infty$ with $i\geq 0$ and $j\in\{0,1\}$ . The fluid scaled system can be rewritten as follows, with $i\geq 2$ ,

[TABLE]

Step 3: Towards fluid limits. While making the transition from integral equations to differential equations with $N$ tending to infinity, the representation of the departure terms in (15) is straightforward. The arrival terms in the differential equations, on the other hand, are not immediately obvious.

To illustrate the difficulty, assume there are among the $N$ servers only a small number of idle servers. As the allocation strategy describes, one of these servers will be selected by an arriving job. If the number of idle servers is small and the arrival rate is sufficiently high, rapid switches will occur in the indicator function $\mathds{1}\{Q^{N}_{00}=0\}$ . A server that becomes idle due to a service completion will immediately be selected again by the arriving job. However, the fraction of empty servers ( $Q^{N}_{00}/N$ ) will be more robust against these changes due to the fluid scaling.

In general, this phenomenon is called ‘separation of time scales’ as described by Hunt and Kurtz [10]. One observes the interaction of two processes. One process evolves very fast, namely the number of empty servers, while the second process evolves much slower, the occupancy fractions in this setting. In order to obtain the arrival terms of the fluid limit, we should be able to combine these processes. Focusing on the first arrival integral in (44), the question arises how to handle the expression

[TABLE]

A similar problem is analyzed in [10] where one needs to take the limit of an integral of an indicator function. The existence of a measure $\alpha$ is deduced such that

[TABLE]

The existence of this function $\alpha$ , which does not need to be continuous, can be justified by the following reasoning. In a small time interval, say $[0,\delta t]$ , the number of idle servers is a heavily fluctuating process, though the process describing the occupancy proportions is approximately constant. During this small interval, the number of idle servers can be considered as a birth-and-death process with ‘death’ rate $\lambda$ , since an arriving job causes a reduction in the number of idle servers. The ‘birth’ rate is determined by the occupancy proportions, i.e. the proportion of servers that are working on type I or type II jobs. Then it is argued in [10] that

[TABLE]

after application of the ergodic theorem, converges to an invariant measure if $N$ tends to infinity. This invariant measure will give rise to the function $\alpha$ . One already senses that the presence or absence of idle servers should be handled as two different cases. Therefore we make a distinction between $q_{00}$ strictly positive or equal to zero in the intuitive explanation of the structure of the fluid limit.

The case $q_{00}>0$ . When the number of idle servers is sufficiently large, each arriving job will be allocated to an idle server for sure. A fraction

[TABLE]

of the arriving jobs will be allocated as type II jobs which causes the changes in (15) for $\overline{q}_{00}$ , $\overline{q}_{01}$ and $\overline{q}_{10}$ .

The case $q_{00}=0$ . Idle servers are generated at rate $\mu_{1}q_{10}+\mu_{2}q_{01}$ . Since $d_{1}$ is finite, the probability that the $d_{1}$ -selection would contain an idle server is negligible, so each idle server will be provided with a type II job when the arrival rate is high enough. If $\tilde{\lambda}=(\lambda-\mu_{1}q_{10}-\mu_{2}q_{01})^{+}$ is strictly larger than zero, a fraction

[TABLE]

of the stream of incoming jobs will immediately be redirected to the idle servers as a type II job. The excess stream of incoming jobs (fraction $\tilde{\lambda}/\lambda$ ) will not observe any idle server and will start to form (type I) queues in front of the servers of the $d_{1}$ -selection according to a straightforward generalization of the transition probabilities mentioned in step 1.
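The split of the arrival stream can be made concrete with the birth-and-death heuristic described above. The sketch below freezes the occupancy fractions, treats the number of idle servers as a birth-and-death chain with constant birth rate $\mu_{1}q_{10}+\mu_{2}q_{01}$ and death rate $\lambda$ (the common factor $N$ cancels), and computes its stationary law on a truncated state space. The truncation level, the function name and the numerical values are assumptions for illustration; the excess rate is taken as $\lambda$ minus the rate at which idle servers are created.

```python
def idle_count_distribution(birth, death, max_idle=200):
    """Stationary law of a birth-death chain on {0, ..., max_idle}, constant rates."""
    ratio = birth / death
    weights = [ratio ** k for k in range(max_idle + 1)]  # detailed balance
    total = sum(weights)
    return [w / total for w in weights]


lam, mu1, mu2 = 0.8, 1.0, 0.5
q10, q01 = 0.0, 0.4                # occupancy fractions, frozen on the fast time scale

birth = mu1 * q10 + mu2 * q01      # rate at which servers become idle
pi = idle_count_distribution(birth, lam)

frac_no_idle = pi[0]               # fraction of time (and of Poisson arrivals) with no idle server
lam_tilde = max(lam - birth, 0.0)  # excess arrival rate that joins type I queues
print(frac_no_idle, lam_tilde / lam)
```

Since Poisson arrivals see time averages, the probability of finding no idle server coincides with the fraction $\tilde{\lambda}/\lambda$ of jobs that join the type I queues, and the complementary mass is the immediately redirected fraction.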

This concludes the derivation of the fluid limit (15).

5.2.2 Proof of Theorem 4.1: fixed points

We will start with the proof of the closed-form fixed point and show that this is the only fixed point without idle servers on fluid level, i.e. $q_{00}=0$ . Next, we will consider fixed points with $q_{00}>0$ .

Fixed points with $q_{00}=0$ . The correctness of the expression in (16) can easily be confirmed by substitution into (15). The result can be established in two steps. First, we observe that the derivatives of $(\overline{q}_{i0})_{i}$ in (15) remain zero once $(\overline{q}_{i0}^{*})_{i}$ equals zero. Then, we substitute $(\overline{q}_{i0}^{*})_{i}=0$ into the derivatives of $(\overline{q}_{i1})_{i}$ . For $i\geq 1$ we obtain:

[TABLE]

These equations can be solved and one obtains the fixed point as given in (16). Note the similarity between (50) and the fluid limit of a JSQ( $d_{1}$ ) policy with reduced arrival rate

[TABLE]

in a setting where each of the exchangeable servers works at rate $\mu_{1}$ [14].

Second, this fixed point is unique under the condition that $q_{00}$ equals zero. From Lemma 2 in [14] we know that the fixed point of the fluid limit in the JSQ( $d_{1}$ ) setting is unique when $d_{1}\geq 2$ . This implies that under the condition that all servers have a type II job, i.e. $\overline{q}_{i0}^{*}=0$ for all $i$ , uniqueness is guaranteed. Assume by contradiction that another fixed point exists without idle servers but with possibly a positive cumulative fraction $\overline{q}_{i0}^{*}$ for some $i$ . We focus on the differential equations of $(\overline{q}_{i0})_{i\geq 1}$ under this fixed point. From

[TABLE]

we get that $\overline{q}_{10}^{*}=\overline{q}_{20}^{*}$ . Repeating this procedure for $i=2$ ,

[TABLE]

results in $\overline{q}_{20}^{*}=\overline{q}_{30}^{*}$ . By induction we could show that $\overline{q}_{i0}^{*}=\overline{q}_{i+1,0}^{*}$ for $i\geq 1$ , this leads to $\overline{q}_{i0}^{*}=0$ for $i\geq 1$ . This proves the uniqueness of the fixed point when $q_{00}$ equals zero.

Fixed points with $q_{00}>0$ . Under this setting, the fluid-limit equations (15) simplify significantly.

[TABLE]

For any fixed point it should hold that

[TABLE]

for $i\geq 2$ , then it follows that $(\overline{q}_{i0}^{*})_{i\geq 2}=0$ and $(\overline{q}_{i1}^{*})_{i\geq 1}=0$ . This implies that the only positive fractions are $q_{00}$ , $q_{01}$ and $q_{10}$ . Rewriting the fluid limit in a non-cumulative expression gives us:

[TABLE]

From the second and third equality it is clear that once $q_{00}^{*}$ is known, we know the entire fixed point:

[TABLE]

The system in (56) is linearly dependent. We use the fact that $q_{00}$ , $q_{01}$ and $q_{10}$ must sum up to one to determine $q_{00}$ . It must hold:

[TABLE]

Define $x\doteq 1-q_{00}$ . We are interested in the zero points of the polynomial $f$ within $[0,1)$ with

[TABLE]

We will evaluate the existence of the fixed points based on the behaviour of $f$ and its derivative,

[TABLE]

Furthermore,

[TABLE]

and $f^{\prime}$ is monotone increasing on $(0,1)$ with

[TABLE]

Since $f$ is positive in both its endpoints and the derivative $f^{\prime}$ is monotone increasing, we need at least a vanishing derivative in $(0,1)$ in order to have a fixed point. This is guaranteed when $f^{\prime}(1)>0$ , which is the first condition from (17). We now know that $f$ attains a local minimum at

[TABLE]

and is strictly positive in its endpoints. If $f(\tilde{x})$ is exactly zero, we have one fixed point, namely $q_{00}^{*}=1-\tilde{x}$ . But only in very special cases the second condition of (17) is satisfied with equality for a random choice of $d_{1}$ , $\lambda$ , $\mu_{1}$ and $\mu_{2}$ . On the other hand, if $f(\tilde{x})<0$ , i.e. if also

[TABLE]

holds, we have exactly two fixed points such that $q_{00}+q_{01}+q_{10}=1$ . There is one fixed point situated at each side of $\tilde{x}$ in the interval $(0,1)$ . This gives that for $d_{1}$ large enough we can find two solutions of the reduced system of differential equations. It can be shown by contradiction that both fixed points are larger than $\lambda/\mu_{1}$ , so the corresponding fractions of idle servers are smaller than $1-\lambda/\mu_{1}$ .
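The two roots can be located numerically for the parameters of Subsection 4.3.2. The sketch below does not use the paper's displayed polynomial. It assumes $f(x)=\lambda(1/\mu_{2}-1/\mu_{1})x^{d_{1}}-x+\lambda/\mu_{1}$ with $x=1-q_{00}$ , a form reconstructed (up to positive scaling) from the balance relations $\mu_{1}q_{10}=\lambda(1-x^{d_{1}})$ and $\mu_{2}q_{01}=\lambda x^{d_{1}}$ , which is consistent with the properties of $f$ listed in this proof. The bisection helper and all names are hypothetical.

```python
def f(x, lam, mu1, mu2, d1):
    """Assumed fixed-point polynomial in x = 1 - q00 (reconstructed, see lead-in)."""
    return lam * (1 / mu2 - 1 / mu1) * x ** d1 - x + lam / mu1


def bisect(g, lo, hi, iters=200):
    """Plain bisection; assumes g(lo) and g(hi) have opposite signs."""
    sign_lo = g(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam, mu1, mu2, d1 = 0.8, 1.0, 0.5, 25
c = lam * (1 / mu2 - 1 / mu1)
x_min = (1 / (d1 * c)) ** (1 / (d1 - 1))  # point where f' vanishes


def g(x):
    return f(x, lam, mu1, mu2, d1)


roots = [bisect(g, 0.0, x_min), bisect(g, x_min, 1.0)]
for x in roots:
    q00 = 1 - x
    q10 = lam * (1 - x ** d1) / mu1  # servers with one type I job
    q01 = lam * x ** d1 / mu2        # servers with one type II job
    print(round(q00, 4), round(q01, 4), round(q10, 4), round(q00 + q01 + q10, 6))
```

With this choice of $f$ , one root lies on each side of the minimizer and both exceed $\lambda/\mu_{1}$ , in line with the statement above.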

For completeness we mention that $\lambda=\mu_{2}$ would imply that $f(1)=0$ and so the proportion of empty servers is zero which violates the assumption that $q_{00}>0$ . Moreover, if $\lambda<\mu_{2}$ , then the polynomial $f$ vanishes in the interval $(0,1)$ . The monotone increasing property of the derivative of $f$ leads to the fact that there exists a unique fixed point $x^{*}$ in $(0,1)$ . This results in a unique fixed point $(q_{00}^{*},q_{01}^{*},q_{10}^{*})$ with $q_{00}^{*}>0$ .

This concludes the proof of Theorem 4.1.

5.2.3 Proof of Theorem 4.2: local (in)stability

We will prove local (in)stability using the indirect Lyapunov method based on the Hartman-Grobman Theorem [9]. This theorem states that a system of differential equations behaves near its fixed points as its linearized version. The eigenvalues of the linearized system determine the local behavior of the system unless one of the eigenvalues has a real part equal to zero, in which case the Hartman-Grobman theorem is inconclusive. If we applied this theorem directly to one of the two fixed points of (56), we would obtain an eigenvalue exactly equal to zero, but one can resolve this issue since (56) is a redundant system. Since $q_{00}+q_{01}+q_{10}=1$ , it is sufficient to know the instantaneous change of two variables. Each elimination will lead to the same two eigenvalues so we can remove for instance the third equation from (56):

[TABLE]

Let $(q_{00}^{*},q_{01}^{*},q_{10}^{*})$ denote a fixed point, then the matrix of the linearized system looks as follows near its fixed point:

[TABLE]

The corresponding eigenvalues are given by

[TABLE]

Since $\mu_{1}>\mu_{2}$ the quantity under the root is always positive, so the square root is real. This implies furthermore that $\alpha_{-}<0$ . To determine the sign of $\alpha_{+}$ we need to make a distinction between the two fixed points. From the proof of Theorem 4.1, we know that the two fixed points are on both sides of $\tilde{x}$ , with $\tilde{x}$ as in (62). For

[TABLE]

we have that

[TABLE]

This shows that the fixed point with the smallest proportion of idle servers is unstable.

When $1-q_{00}^{*}<\tilde{x}$ , it follows in a similar way that $2\alpha_{+}<0.$ This shows that the fixed point with the largest proportion of idle servers is locally stable. This concludes the proof of Theorem 4.2.
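The sign pattern of the eigenvalues can be checked numerically. The sketch below assumes a reduced system in $(q_{00},q_{01})$ that is reconstructed from the description rather than copied from the displayed equations: $\dot{q}_{00}=\mu_{1}(1-q_{00}-q_{01})+\mu_{2}q_{01}-\lambda$ and $\dot{q}_{01}=\lambda(1-q_{00})^{d_{1}}-\mu_{2}q_{01}$ . Its Jacobian has trace $-(\mu_{1}+\mu_{2})$ and determinant $\mu_{1}\mu_{2}-(\mu_{1}-\mu_{2})\lambda d_{1}x^{d_{1}-1}$ with $x=1-q_{00}^{*}$ . The polynomial `f` and the helper names are likewise assumptions.

```python
from math import sqrt


def f(x, lam, mu1, mu2, d1):
    """Assumed fixed-point polynomial in x = 1 - q00 (reconstructed, up to scaling)."""
    return lam * (1 / mu2 - 1 / mu1) * x ** d1 - x + lam / mu1


def bisect(g, lo, hi, iters=200):
    """Plain bisection; assumes g(lo) and g(hi) have opposite signs."""
    sign_lo = g(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def eigenvalues(x, lam, mu1, mu2, d1):
    """Eigenvalues (alpha_minus, alpha_plus) of the assumed 2x2 Jacobian at x = 1 - q00."""
    slope = lam * d1 * x ** (d1 - 1)
    trace = -(mu1 + mu2)
    det = mu1 * mu2 - (mu1 - mu2) * slope
    root = sqrt(trace ** 2 - 4 * det)  # real whenever mu1 > mu2
    return (trace - root) / 2, (trace + root) / 2


lam, mu1, mu2, d1 = 0.8, 1.0, 0.5, 25
x_min = (1 / (d1 * lam * (1 / mu2 - 1 / mu1))) ** (1 / (d1 - 1))


def g(x):
    return f(x, lam, mu1, mu2, d1)


x_more_idle = bisect(g, 0.0, x_min)  # fixed point with the larger idle fraction
x_less_idle = bisect(g, x_min, 1.0)  # fixed point with the smaller idle fraction
print(1 - x_more_idle, eigenvalues(x_more_idle, lam, mu1, mu2, d1))
print(1 - x_less_idle, eigenvalues(x_less_idle, lam, mu1, mu2, d1))
```

Under these assumptions the determinant is positive exactly when $x<\tilde{x}$ , which gives two negative eigenvalues at the fixed point with more idle servers and one positive eigenvalue at the other.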

6 Conclusion and outlook

We investigated load balancing issues in a service system where particular servers are better equipped to process certain jobs due to affinity or compatibility relations. The general model in particular covers the setting with an underlying network topology $G_{N}$ , referred to as the graph model. The analysis of the graph model is severely complicated by the lack of exchangeability among the servers, a feature linked to the supermarket modeling framework that allows mean-field techniques. We constructed a novel affinity coupling to obtain stochastic performance bounds for the general model and more specific settings, for instance model instances where the underlying graph topology $G_{N}$ has a specific minimum degree or is a $d$ -regular graph.

Another instance of the general model, the combinatorial model, has enough inherent symmetry to conduct a fluid-limit analysis. The fluid limit was stated in terms of a set of discontinuous differential equations and its fixed point sensitively depends on the size $d$ of the primary selection. When $d$ is sufficiently small, a unique fixed point exists but the associated waiting time does not vanish. When the primary selection is sufficiently large, a fixed point arises that does provide a zero waiting time. On the other hand, the above-mentioned fixed point still persists, giving rise to bistability issues.

As mentioned above, the stochastic upper bounds for the graph model in terms of a supermarket model with a JSQ( $d$ ) policy require the degrees in the underlying graph to be relatively high compared to $d$ . To some extent, this reflects that the performance may be poor in certain pathological cases even when the node degrees are fairly high. An interesting topic for further research would be to extend the affinity coupling and possibly identify relevant structural conditions on the graph topology in order to sharpen these bounds.

Recall that a supermarket model with a JSQ( $d$ ) policy is equivalent to the combinatorial model with server selections of size $d$ and identical arrival rates when jobs could not be allocated as a type II job. A natural conjecture is that the latter combinatorial model is the best-case scenario given a maximum cardinality $d$ of the server selections. This would imply that the supermarket model with a JSQ( $d$ ) policy provides stochastic lower bounds in some appropriate sense for any affinity model with server selections of size at most $d$ .

The bistability of the fluid limit of the combinatorial model for large values of $d_{1}$ not only precludes any convergence statements for the stationary distribution, but also suggests that the allocation strategy could possibly be refined. In future work we intend to examine such refinements and establish that these eliminate the queueing fixed point and render the no-queueing fixed point globally stable.

Bibliography

[1] N. Bean, R. Gibbens, and S. Zachary. Dynamic and equilibrium behavior of controlled loss networks. The Annals of Applied Probability, pages 873–885, 1997.
[2] M. Benaïm and J.-Y. Le Boudec. On mean field convergence and stationary regime. arXiv preprint arXiv:1111.5710, 2011.
[3] P. Brémaud. Point Processes and Queues, Martingale Dynamics. Springer-Verlag, New York, 1981.
[4] L. Devroye. Non-Uniform Random Variate Generation. Springer-Verlag, 1986.
[5] A. Ephremides, P. Varaiya, and J. Walrand. A simple dynamic routing problem. IEEE Transactions on Automatic Control, 25(4):690–693, 1980.
[6] D. Gamarnik, J. N. Tsitsiklis, and M. Zubeldia. Delay, memory, and messaging tradeoffs in distributed service systems. ACM SIGMETRICS Performance Evaluation Review, 44(1):1–12, 2016.
[7] N. Gast. The power of two choices on graphs: the pair-approximation is accurate? ACM SIGMETRICS Performance Evaluation Review, 43(2):69–71, 2015.
[8] R. Gibbens, P. Hunt, and F. Kelly. Bistability in communication networks. Disorder in physical systems, pages 113–128, 1990.
