Hyper-Scalable JSQ with Sparse Feedback
Mark van der Boor, Sem Borst, Johan van Leeuwaarden

Mark van der Boor, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
Sem Borst, Eindhoven University of Technology, The Netherlands, and Nokia Bell Labs, P.O. Box 636, Murray Hill, NJ 07974, USA
Johan van Leeuwaarden, Eindhoven University of Technology, The Netherlands
Abstract.
Load balancing algorithms play a vital role in enhancing performance in data centers and cloud networks. Due to the massive size of these systems, scalability challenges, and especially the communication overhead associated with load balancing mechanisms, have emerged as major concerns. Motivated by these issues, we introduce and analyze a novel class of load balancing schemes where the various servers provide occasional queue updates to guide the load assignment.
We show that the proposed schemes strongly outperform JSQ(d) strategies with comparable communication overhead per job, and can achieve a vanishing waiting time in the many-server limit with just one message per job, just like the popular JIQ scheme. The proposed schemes are, however, particularly geared towards the sparse feedback regime with less than one message per job, where they outperform corresponding sparsified JIQ versions.
We investigate fluid limits for synchronous updates as well as asynchronous exponential update intervals. The fixed point of the fluid limit is identified in the latter case, and used to derive the queue length distribution. We also demonstrate that in the ultra-low feedback regime the mean stationary waiting time tends to a constant in the synchronous case, but grows without bound in the asynchronous case.
Keywords: load balancing, scaling limits, data centers, cloud networks, parallel-server systems, join-the-shortest-queue, delay performance
Copyright: acmlicensed. Journal: POMACS, Vol. 3, No. 1, Article 4, March 2019. Price: 15.00. DOI: 10.1145/3311075. CCS concepts: Mathematics of computing → Queueing theory; Mathematics of computing → Stochastic processes.
1. Introduction
Background and motivation. We introduce and analyze hyper-scalable load balancing algorithms that only involve minimal communication overhead and yet deliver excellent performance. Load balancing algorithms play a key role in efficiently distributing jobs (e.g. compute tasks, database look-ups, file transfers) among servers in cloud networks and data centers (Gandhi et al., 2014; Maguluri et al., 2012; Patel et al., 2013). Well-designed load balancing schemes provide an effective mechanism for improving performance metrics in terms of response times while achieving high resource utilization levels. Besides these typical performance criteria, communication overhead and implementation complexity have emerged as equally crucial attributes, due to the immense size of cloud networks and data centers. These scalability challenges have fueled a strong interest in the design of load balancing algorithms that provide robust performance while only requiring low overhead.
We focus on a basic scenario of N parallel identical servers, exponentially distributed service requirements, and a service discipline at each server that is oblivious to the actual service requirements (e.g. FCFS). In this canonical case, the Join-the-Shortest-Queue (JSQ) policy has strong stochastic optimality properties, and in particular minimizes the overall mean delay among the class of non-anticipating load balancing policies that do not have any advance knowledge of the service requirements (Ephremides et al., 1980; Winston, 1977).
In order to implement the JSQ policy, a dispatcher requires instantaneous knowledge of the queue lengths at all the servers, which may give rise to a substantial communication burden and not be scalable in scenarios with large numbers of servers. The latter issue has motivated consideration of so-called JSQ(d) strategies, where the dispatcher assigns incoming jobs to a server with the shortest queue among d servers selected uniformly at random. This involves an exchange of 2d messages per job (assuming d ≥ 2), and thus greatly reduces the communication overhead compared to the JSQ policy when the number of servers N is large. At the same time, results in Mitzenmacher (Mitzenmacher, 2001) and Vvedenskaya et al. (Vvedenskaya et al., 1996) indicate that even a value as small as d = 2 yields significant performance improvements as compared to a purely random assignment scheme (d = 1). This is commonly referred to as the "power-of-two" effect.
Although JSQ(d) strategies provide notably better waiting-time performance than purely random assignment, they lack the ability of the conventional JSQ policy to drive the waiting time to zero in the many-server limit. Moreover, while JSQ(d) strategies notably reduce the communication overhead compared to the full JSQ policy, the two-way delay incurred in obtaining queue length information still directly adds to the waiting time of each job. This Achilles heel of "push-based" strategies is eliminated in "pull-based" strategies, where servers proactively provide queue length information to the dispatcher. A particularly popular pull-based strategy is the so-called Join-the-Idle-Queue (JIQ) scheme (Badonnel and Burgess, 2008; Lu et al., 2011). Servers advertise their availability to the dispatcher whenever they become idle, which involves no more than one message per job and allows the dispatcher to send a job to an available idle server. This pull-based strategy shares the ability of the full JSQ policy to achieve a zero waiting time in the many-server limit (Stolyar, 2015). A pull-based implementation of the JSQ policy exists as well, but it requires more frequent communication or larger messages.
The superiority of the JIQ scheme over JSQ(d) strategies in terms of performance and communication overhead is owed to the state information stored at the dispatcher. Results in (Gamarnik et al., 2016) imply that a vanishing waiting time can only be achieved with finite communication overhead per job when allowing memory usage at the dispatcher, and further suggest that one message per job is a minimal requirement in a certain sense. However, even just one message per job may still be prohibitive, especially when jobs do not involve big computational tasks but small data packets which require little processing, e.g. in IoT cloud environments. In such situations the sheer message exchange in providing queue length information may be disproportionate to the actual amount of processing required.
Hyper-scalable algorithms. Motivated by the above issues, we propose and examine a novel class of load balancing schemes which also leverage memory at the dispatcher, but allow the communication overhead to be seamlessly adapted and reduced below that of the JIQ scheme. The basic scheme is as follows:
Algorithm 1 (Basic hyper-scalable scheme).
The dispatcher forwards incoming jobs to the server with the lowest queue estimate. The dispatcher maintains an estimate for every server and increments the estimate of a server for every job that is assigned to it. Status updates of servers occur at rate δ per server, and set the estimate that the dispatcher has at its disposal to the actual queue length.
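A minimal Python sketch of the dispatcher bookkeeping in Algorithm 1, under stated assumptions: servers are indexed 0 to N − 1, all estimates start at zero, and ties among the lowest estimates are broken uniformly at random (the rule used in Section 3). The class and method names are hypothetical, and deciding when status updates happen is left to the caller.

```python
import random


class HyperScalableDispatcher:
    """Illustrative dispatcher for Algorithm 1 (hypothetical names, not the authors' code)."""

    def __init__(self, num_servers, seed=None):
        self.estimates = [0] * num_servers  # assumption: all servers start empty
        self.rng = random.Random(seed)

    def dispatch(self):
        """Send one job to a server with the lowest estimate, ties broken at random."""
        lowest = min(self.estimates)
        candidates = [s for s, e in enumerate(self.estimates) if e == lowest]
        server = self.rng.choice(candidates)
        self.estimates[server] += 1  # the new job is counted as still queued
        return server

    def status_update(self, server, true_queue_length):
        """A status update overwrites the estimate with the actual queue length."""
        self.estimates[server] = true_queue_length


dispatcher = HyperScalableDispatcher(num_servers=4, seed=0)
print([dispatcher.dispatch() for _ in range(6)])
dispatcher.status_update(2, 0)  # server 2 reports that it is empty again
print(dispatcher.dispatch())    # 2: it now has the unique lowest estimate
```

The linear scan keeps the sketch short; because the minimum estimate can only increase between updates, a structure that buckets servers by estimate could avoid scanning all N servers for every job.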
Several aspects of Algorithm 1 are flexible (regarding the implementation and the status updates), and four different schemes that obey its rules will be introduced. Many more schemes could be of interest, for example a scheme where the queue estimate is not an upper bound but mimics the expected value of the queue length. While natural, such schemes are beyond the scope of the current paper.
When the update frequency per server is δ and λ denotes the arrival rate per server, the number of messages per job is δ/λ, which can easily be tuned by varying the value of δ. Since all queue lengths are updated (on average) once every 1/δ time units, this gives δN queue updates per time unit. Note that this algorithm can be implemented as a strictly push-based scheme (where the dispatcher requests the queue lengths of the servers), as well as a strictly pull-based scheme (where each server sends its queue length to the dispatcher, using an internal clock).
The full JSQ scheme, when implemented in a push-based manner, requires 2N message exchanges per job, which amounts to 2λN² messages per time unit and is not scalable. However, when servers actively report their queue lengths to the dispatcher in the JSQ scheme, or their idleness in the pull-based JIQ scheme, less communication is needed. In this case, any departing job needs to trigger the server to send an update to the dispatcher. This pull-based implementation requires one message per job, or λN messages per time unit. When queue lengths are large, not every departing job needs to trigger an update for the JIQ policy, which reduces the communication per job slightly. We thus conclude that the tunable communication overhead of δ/λ messages per job of Algorithm 1 (doubled when implemented in a push-based manner) is comparable with that of pull-based JSQ, JIQ and JSQ(d). Moreover, Algorithm 1 becomes more scalable for small values of δ, especially in the regime δ < λ. Here an important reference point is that the pull-based JIQ scheme requires at most one update per job. By hyper-scalable schemes we mean schemes that can be implemented with fewer than one message per job (δ < λ), and preferably with far fewer (δ ≪ λ).
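The overhead comparison above reduces to simple arithmetic. A sketch with hypothetical function names, assuming a per-server arrival rate λ (`lam`) and a per-server update rate δ (`delta`); the JSQ(d) count assumes a push-based implementation with one probe and one reply per sampled server, which is our assumption about how such probing is done.

```python
def messages_per_job(delta, lam, push_based=False):
    """Messages per job of Algorithm 1: updates at rate delta, arrivals at rate lam (per server).

    A push-based implementation (request plus reply) doubles the count.
    """
    ratio = delta / lam
    return 2 * ratio if push_based else ratio


def messages_per_time_unit(delta, n_servers, push_based=False):
    """Total messages per time unit: each server is updated about once every 1/delta."""
    total = delta * n_servers
    return 2 * total if push_based else total


def jsq_d_messages_per_job(d):
    """Push-based JSQ(d), assuming one probe and one reply per sampled server."""
    return 2 * d


lam, n = 0.9, 1000
for delta in (0.09, 0.45, 0.9):
    print(f"delta={delta}: {messages_per_job(delta, lam):.2f} messages/job, "
          f"{messages_per_time_unit(delta, n):.0f} messages/time unit")
print("JSQ(2), push-based:", jsq_d_messages_per_job(2), "messages/job")
```

Choosing δ below λ is exactly what places Algorithm 1 in the sparse feedback regime with fewer than one message per job.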
We introduce four hyper-scalable schemes that obey the rules of Algorithm 1 but differ in when the status updates are sent. When the updates sent by the servers to the dispatcher are synchronized, we denote the scheme by SUJSQ (Synchronized-Updates Join-the-Shortest-Queue). Similarly, we introduce AUJSQ (Asynchronized-Updates Join-the-Shortest-Queue) for the case where the updates are asynchronous. We then add an exp-tag whenever the time between two updates is exponentially distributed (with parameter δ and mean 1/δ) and a det-tag when the time between updates is constant (1/δ). This gives rise to four schemes: SUJSQ^exp(δ), SUJSQ^det(δ), AUJSQ^exp(δ) and AUJSQ^det(δ).
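The four schemes differ only in their update epochs, which the following sketch generates over a finite horizon. The scheme labels ("SU-exp", "SU-det", "AU-exp", "AU-det") and function names are hypothetical. Two choices are our assumptions rather than statements from the text: in the synchronous deterministic case the first update happens at time 1/δ, and in the asynchronous deterministic case each server's clock starts at an independent uniform phase in [0, 1/δ).

```python
import random


def _exp_epochs(rate, horizon, rng):
    """Epochs of a Poisson process with the given rate on [0, horizon)."""
    epochs, t = [], rng.expovariate(rate)
    while t < horizon:
        epochs.append(t)
        t += rng.expovariate(rate)
    return epochs


def _det_epochs(period, phase, horizon):
    """Epochs phase, phase + period, phase + 2 * period, ... below horizon."""
    epochs, t = [], phase
    while t < horizon:
        epochs.append(t)
        t += period
    return epochs


def update_schedule(scheme, delta, n_servers, horizon, seed=None):
    """Return, for every server, the sorted list of its update epochs."""
    rng = random.Random(seed)
    if scheme == "SU-exp":
        shared = _exp_epochs(delta, horizon, rng)  # one Poisson stream for all servers
        return [list(shared) for _ in range(n_servers)]
    if scheme == "SU-det":
        shared = _det_epochs(1 / delta, 1 / delta, horizon)  # every 1/delta, all at once
        return [list(shared) for _ in range(n_servers)]
    if scheme == "AU-exp":
        return [_exp_epochs(delta, horizon, rng) for _ in range(n_servers)]
    if scheme == "AU-det":
        # Assumption: independent uniform phases in [0, 1/delta).
        return [_det_epochs(1 / delta, rng.uniform(0, 1 / delta), horizon)
                for _ in range(n_servers)]
    raise ValueError(f"unknown scheme: {scheme}")


print(update_schedule("SU-det", delta=0.5, n_servers=2, horizon=10.0))
```

In all four cases each server reports about δ times per time unit on average; only the correlation between servers and the regularity of the intervals change.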
We show that the four schemes can achieve a vanishing waiting time in the many-server limit with just one message per job, just like JIQ. The proposed schemes are, however, particularly geared towards the sparse feedback regime with less than one message per job, where they outperform corresponding sparsified JIQ versions. Using fluid limits, we demonstrate that in the ultra-low feedback regime the mean stationary waiting time tends to a constant in the synchronous case, but grows without bound in the asynchronous case. A more detailed overview of our key findings is presented in the next section.
Discussion of additional related work. As mentioned above, Mitzenmacher (Mitzenmacher, 2001) and Vvedenskaya et al. (Vvedenskaya et al., 1996) established mean-field limit results for JSQ(d) strategies. These results indicate that for any subcritical arrival rate λ < 1, the tail of the queue length distribution at each individual server exhibits super-exponential decay, and thus falls off far more rapidly than the geometric decay in case of purely random assignment. Similar power-of-d effects have been demonstrated for heterogeneous servers, non-exponential service requirements and loss systems (Bramson et al., 2010, 2012; Mukhopadhyay et al., 2016, 2015; Mukhopadhyay and Mazumdar, 2014; Xie et al., 2015). For no single value of d, however, can a JSQ(d) strategy rival the JIQ scheme, which simultaneously achieves low communication overhead and asymptotically optimal performance by leveraging memory at the dispatcher (Gamarnik et al., 2016; Stolyar, 2015). The only exception arises for batches of jobs when the value of d and the batch size grow suitably large, as can be deduced from results in (Ying et al., 2015), but we do not leverage batches in the current paper. As we will show, the hyper-scalable schemes proposed in the present paper are, like the JIQ scheme, superior to JSQ(d) strategies, and also beat corresponding sparsified JIQ versions in the regime δ < λ, which is particularly relevant from a scalability standpoint.
Many popular schemes have also been analyzed in the Halfin-Whitt heavy-traffic regime (Mukherjee etĀ al., 2016) and in the non-degenerate slowdown regime (Gupta and Walton, 2019), in which the JIQ scheme is not necessarily optimal.
The use of memory in load balancing has been studied in (Alon et al., 2010; Mitzenmacher et al., 2002), but mostly in a "balls-and-bins" context as opposed to the queueing scenario that we consider. The work in (Mitzenmacher, 2000) considers a setup similar to ours, and examines how much load balancing degrades when old information is used.
In contrast, our focus is on improving the performance by using non-recent information, similarly to (Anselmi and Dufour, 2018). A general framework for deriving fluid limits in the presence of memory is described in (Luczak and Norris, 2013), but it assumes that the length of only one queue is kept in memory, while we allow for all queue lengths to be tracked.
Organization of the paper. In Section 2 we discuss our key findings and contributions for the hyper-scalable schemes, obtained through fluid-limit analysis and extensive simulations. In Section 3 we introduce some useful notation and preliminaries, before we turn to a comprehensive analysis of the synchronous and asynchronous cases through the lens of fluid limits in Sections 4 and 5, respectively. In Section 6 we conclude with some summarizing remarks and topics for further research.
2. Key findings and contributions
The precise model we consider consists of N parallel identical servers and one dispatcher. Jobs arrive at the dispatcher as a Poisson process of rate λN, where λ denotes the job arrival rate per server. Every job is dispatched to one of the servers, after which it joins the queue of the server if the server is busy, or starts its service immediately if the server is idle. The job processing requirements are independent and exponentially distributed with unit mean at each of the servers. We consider several load balancing algorithms for dispatching jobs to servers, including the hyper-scalable schemes SUJSQ^exp(δ), SUJSQ^det(δ), AUJSQ^exp(δ) and AUJSQ^det(δ) described in the introduction. In the simulation experiments we will also briefly consider a variant of the asynchronous scheme in which only idle servers send notifications. We now present the results of simulation studies and of the fluid-limit and fixed-point analysis in Sections 4 and 5.
2.1. Large-system performance
In order to explore the performance of the hyper-scalable algorithms in the many-server limit N → ∞, we investigate fluid limits. We analyze their behavior and fixed points, and use these to derive results for the system in stationarity as a function of the update frequency δ.
Asymptotically optimal feedback regime. Using fluid-limit analysis, we prove that the proposed schemes can achieve a vanishing waiting time in the many-server limit when the update frequency δ exceeds a certain threshold. In case servers only report zero queue lengths and suppress updates for non-zero queues, the update frequency required for a vanishing waiting time can in fact be lowered so as to match the one message per job involved in the JIQ scheme.
Sparse feedback regime. Figure 1 displays results from extensive simulations and shows the mean waiting time as a function of the number of messages per job. This number is proportional to the update frequency δ, and equals δ/λ for the four hyper-scalable schemes. We also show results for JIQ and for a sparsified version of JIQ, where a token is sent to the dispatcher with a fixed probability whenever a server becomes idle. Random refers to the scheme where every job is assigned to a server selected uniformly at random, and Round-Robin assigns arriving jobs to the servers in a fixed cyclic order.
For the sparse feedback regime δ < λ, we see that the proposed schemes outperform the corresponding sparsified JIQ versions. Also observe that the variant in which only idle servers send reports achieves a near-zero waiting time with just one message per job, just like the JIQ scheme, and outperforms sparsified JIQ across most of the relevant domain of fewer than one message per job. However, as δ → 0, its waiting time grows without bound, since the estimates grow large due to the lack of updates, which causes servers that were reported idle in the latest update to receive many jobs in succession.
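For reference, the baseline dispatchers compared in Figure 1 can be sketched as follows. All class names are hypothetical, and the token probability is called `p` here for illustration. Falling back to a uniformly random server when the dispatcher holds no token is the usual JIQ convention and is an assumption in this sketch; with `p = 1` the sparsified variant reduces to plain JIQ.

```python
import random


class RandomDispatcher:
    """Every job goes to a server chosen uniformly at random."""

    def __init__(self, n_servers, seed=None):
        self.n, self.rng = n_servers, random.Random(seed)

    def dispatch(self):
        return self.rng.randrange(self.n)


class RoundRobinDispatcher:
    """Jobs are assigned to servers in a fixed cyclic order."""

    def __init__(self, n_servers):
        self.n, self.next_server = n_servers, 0

    def dispatch(self):
        server = self.next_server
        self.next_server = (self.next_server + 1) % self.n
        return server


class SparsifiedJIQDispatcher:
    """JIQ in which an idle server sends its token only with probability p."""

    def __init__(self, n_servers, p, seed=None):
        self.n, self.p, self.rng = n_servers, p, random.Random(seed)
        self.tokens = set()  # servers the dispatcher believes to be idle

    def server_became_idle(self, server):
        if self.rng.random() < self.p:  # the idle server reports itself with probability p
            self.tokens.add(server)

    def dispatch(self):
        if self.tokens:
            return self.tokens.pop()  # use an advertised idle server
        return self.rng.randrange(self.n)  # assumption: random fallback without tokens


rr = RoundRobinDispatcher(3)
print([rr.dispatch() for _ in range(5)])  # [0, 1, 2, 0, 1]
```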
Ultra-low feedback regime. We examine the performance in the ultra-low feedback regime where the update frequency δ goes to zero, and in particular establish a somewhat counter-intuitive dichotomy. When all status updates occur synchronously, the behavior of each of the individual queues approaches that of a single-server queue with a near-deterministic arrival process and exponential service times, with the mean stationary waiting time tending to a finite constant. In contrast, for asynchronous updates, the individual queues experience saw-tooth behavior with oscillations and waiting times that grow without bound.
2.2. Synchronize or not?
In case of synchronized updates, the dispatcher will update the queue lengths of the servers simultaneously. Thus, just after an update moment, the dispatcher has a perfect view of the status of all servers and it will dispatch jobs optimally. After a while, the estimates will start to deviate from the actual queue lengths, so that the scheme no longer makes (close to) optimal decisions. With asynchronous updates servers send updates at independent times, which means that some of the estimates may be very accurate, while others may differ significantly from the actual queue lengths.
Round-Robin resemblance. We find that both synchronous schemes, SUJSQ^exp(δ) and SUJSQ^det(δ), resemble Round-Robin as the update frequency δ approaches zero, and are the clear winners in the ultra-low feedback regime, which is crucial from a scalability perspective (see Figure 1). To understand the resemblance with Round-Robin, notice that the initial queue lengths after an update will be small compared to the number of arrivals until the next update. Thus soon after the update the dispatcher will essentially start forwarding jobs in a (probabilistic) Round-Robin manner. Specifically, most servers will have equal queue estimates at certain points in time, and they will each receive one job every 1/λ time units, but in a random order. This pattern repeats itself and resembles Round-Robin, where the numbers of jobs received by different servers differ by at most one.
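The Round-Robin resemblance can be checked in an idealized setting. The sketch assumes that all N estimates are equal right after a synchronous update (taking them all zero, i.e. ignoring the small initial queue lengths mentioned above) and that ties at the lowest estimate are broken uniformly at random; the helper name is hypothetical. Every block of N consecutive jobs then reaches each server exactly once, in a random order.

```python
import random


def min_estimate_assignments(n_servers, n_jobs, seed=None):
    """Send jobs to a lowest-estimate server (random ties), starting from equal estimates."""
    rng = random.Random(seed)
    estimates = [0] * n_servers
    counts = [0] * n_servers
    order = []
    for _ in range(n_jobs):
        lowest = min(estimates)
        server = rng.choice([s for s, e in enumerate(estimates) if e == lowest])
        estimates[server] += 1
        counts[server] += 1
        order.append(server)
    return counts, order


counts, order = min_estimate_assignments(n_servers=5, n_jobs=23, seed=7)
print(counts)                     # each entry is 4 or 5
print(max(counts) - min(counts))  # 1
```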
Dichotomy. In Figure 1 we also see that while one of the asynchronous schemes outperforms the synchronous variants for large values of the update frequency δ, it produces a mean waiting time that grows without bound as δ approaches zero. The latter issue also occurs for the other asynchronous scheme, and renders the asynchronous versions far inferior in the ultra-low feedback regime compared to both synchronous variants. To understand this remarkable dichotomy, notice that queue estimates must inevitably grow to increasingly large values of the order λ/δ and significantly diverge from the true queue lengths as the update frequency becomes small, both in the synchronous and asynchronous versions. However, in the synchronous variants the queue estimates are all lowered and updated to the true queue lengths simultaneously, prompting the dispatcher to distribute incoming jobs evenly over time. In contrast, in the asynchronous versions, a server will be the only one with a low queue estimate right after its update, and will almost immediately be assigned a huge pile of jobs to bring its queue estimate on par with the others, resulting in oscillatory effects. This somewhat counter-intuitive dichotomy reveals that the synchronous variants behave benignly in the presence of outdated information, while the asynchronous versions are adversely impacted.
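The mechanism behind the dichotomy can be isolated in a deliberately simplified toy example, which is not the fluid analysis of Section 5. Assume N = 100 servers whose estimates all sit at a common level of 40 (standing in for values of the order λ/δ; both numbers are arbitrary), and break ties by lowest server index so that the output is deterministic. If all estimates are lowered together, jobs spread one per server; if only one server has just reported an empty queue, it receives a streak of 40 consecutive jobs. Function names are hypothetical.

```python
def destinations(estimates, n_jobs):
    """Send n_jobs to a lowest-estimate server (ties go to the lowest index)."""
    est = list(estimates)
    targets = []
    for _ in range(n_jobs):
        server = est.index(min(est))
        est[server] += 1
        targets.append(server)
    return targets


def longest_run(targets):
    """Length of the longest streak of consecutive jobs sent to the same server."""
    if not targets:
        return 0
    best = run = 1
    for prev, cur in zip(targets, targets[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


n, level = 100, 40
synchronous = destinations([0] * n, 200)                   # all estimates lowered together
asynchronous = destinations([level] * (n - 1) + [0], 200)  # only the last server just updated
print(longest_run(synchronous))   # 1
print(longest_run(asynchronous))  # 40
```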
3. Notation and preliminaries
In this section we introduce some useful notation and preliminaries in preparation for the fluid-limit analysis in Sections 4 and 5. Recall that all the servers are identical and the dispatcher only distinguishes among servers based on their queue estimates and does not take their identities into account when forwarding jobs. Hence we do not need to keep track of the state of each individual server, but only count the number of servers that reside in a particular state. Specifically, we will denote by $X_{i,j}(t)$ the number of servers with queue length $i$ (including a possible job being served) and queue estimate $j$ at the dispatcher at time $t$. Further denote by $X_{i,\cdot} = \sum_j X_{i,j}$ and $X_{\cdot,j} = \sum_i X_{i,j}$ the total number of servers with queue length $i$ and queue estimate $j$, respectively, when the system is in state $X$.
In order to analyze fluid limits in a regime where the number of servers $N$ grows large, we will consider a sequence of systems indexed by $N$, and attach a superscript $N$ to the associated state variables. We specifically introduce the fluid-scaled state variables $x^N_{i,j}(t) = X^N_{i,j}(t)/N$, representing the fraction of servers in the $N$-th system with true queue length $i$ and queue estimate $j$ at the dispatcher at time $t$, and assume that the sequence of initial states is such that $x^N(0) \to x(0)$ as $N \to \infty$. Any (weak) limit of the sequence $\{x^N(t)\}_{t \ge 0}$ as $N \to \infty$ will be called a fluid limit. Fluid limits not only yield tractability, but also provide a relevant tool to investigate communication overhead and scalability issues, which are inherently tied to scenarios with massive numbers of servers.
Let $m(X)$ be the minimum queue estimate among all servers when the system is in state $X$. When a job arrives and the system is in state $X$, it is dispatched to one of the servers with queue estimate $m(X)$ selected uniformly at random, so it joins a server with queue length $i$ with probability $X_{i,m(X)}/X_{\cdot,m(X)}$. Because of the Poisson arrival process, transitions from a state $X$ to a state $X'$ with $X'_{i,m} = X_{i,m} - 1$ and $X'_{i+1,m+1} = X_{i+1,m+1} + 1$ thus occur at rate $\lambda N X_{i,m}/X_{\cdot,m}$, with $m = m(X)$, $i = 0, \ldots, m$. Due to the unit-exponential processing requirements, transitions from a state $X$ to a state $X'$ with $X'_{i,j} = X_{i,j} - 1$ and $X'_{i-1,j} = X_{i-1,j} + 1$ occur at rate $X_{i,j}$, $i \ge 1$. For notational compactness, we write $m(t)$, $X_{i,\cdot}(t)$ and $X_{\cdot,j}(t)$ instead of $m(X(t))$, $X_{i,\cdot}(X(t))$ and $X_{\cdot,j}(X(t))$, as they depend on $t$ only through $X(t)$.
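The arrival and service transitions can be enumerated mechanically. A sketch, assuming (our choice) that the state is a dictionary mapping (queue length, queue estimate) to a server count; the function name is hypothetical. An arrival moves a server from (i, m) to (i + 1, m + 1) at rate λN times the share of minimum-estimate servers that have queue length i, and each busy server completes a job at unit rate.

```python
def transition_rates(state, lam):
    """List (move, rate) pairs for arrivals and service completions.

    state: dict mapping (queue_length, estimate) -> number of servers.
    lam: arrival rate per server, so jobs arrive at total rate lam * N.
    A move ((i, j), (i2, j2)) means one server goes from (i, j) to (i2, j2).
    """
    n = sum(state.values())
    occupied = {key: cnt for key, cnt in state.items() if cnt > 0}
    m = min(j for (_i, j) in occupied)  # minimum queue estimate
    at_min = sum(cnt for (_i, j), cnt in occupied.items() if j == m)
    rates = []
    for (i, j), cnt in occupied.items():
        if j == m:
            # Arrival routed to a uniformly random server with estimate m.
            rates.append((((i, j), (i + 1, j + 1)), lam * n * cnt / at_min))
        if i > 0:
            # Unit-rate exponential service at each busy server; the estimate is unchanged.
            rates.append((((i, j), (i - 1, j)), float(cnt)))
    return rates


example = {(0, 0): 2, (1, 1): 1, (2, 3): 1}
for move, rate in transition_rates(example, lam=0.5):
    print(move, rate)
```

The arrival rates always add up to λN, since every arriving job is routed to some server with the minimum estimate.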
In order to specify the transitions due to the updates, we need to distinguish between the synchronous and the asynchronous case.
Synchronous updates
Whenever a synchronous update occurs and the system is in state $X$, a transition occurs to state $X'$ with $X'_{i,i} = X_{i,\cdot}$ and $X'_{i,j} = 0$ for $j \ne i$. Note that these transitions only occur in a Markovian fashion when the update intervals are exponentially distributed. When the update intervals are non-exponentially distributed, $\{X(t)\}_{t \ge 0}$ is not a Markov process, but the evolution between successive updates is still Markovian.
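A synchronous update collapses every estimate onto the true queue length. A sketch, assuming (our choice) a state stored as a dictionary mapping (queue length, queue estimate) to a server count; the function name is hypothetical.

```python
def synchronous_update(state):
    """Reset every estimate to the true queue length: all servers with length i move to (i, i)."""
    new_state = {}
    for (i, _j), count in state.items():
        new_state[(i, i)] = new_state.get((i, i), 0) + count
    return new_state


before = {(0, 2): 3, (0, 1): 1, (1, 4): 2}
print(synchronous_update(before))  # {(0, 0): 4, (1, 1): 2}
```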
Asynchronous updates
When the system is in state $X$ and a server with queue length $i$ and queue estimate $j \ne i$ sends an update to the dispatcher, a transition occurs to a state $X'$ with $X'_{i,j} = X_{i,j} - 1$ and $X'_{i,i} = X_{i,i} + 1$. Note that these transitions only occur in a Markovian fashion when the update intervals are exponentially distributed. When the update intervals are non-exponentially distributed, $\{X(t)\}_{t \ge 0}$ is not a Markov process, and in order to obtain a Markovian state description, the state variables would in fact need to be augmented with continuous variables keeping track of the most recent update moments for the various servers.
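An asynchronous update moves a single server from (i, j) to (i, i). A sketch using a dictionary from (queue length, queue estimate) to server count, which is our own representation; function names are hypothetical. For exponential update intervals, each server reports at rate δ, so servers in state (i, j) with j ≠ i trigger this transition at total rate δ times their number.

```python
def asynchronous_update(state, i, j):
    """One server with queue length i and estimate j reports: it moves from (i, j) to (i, i)."""
    if state.get((i, j), 0) == 0:
        raise ValueError(f"no server with queue length {i} and estimate {j}")
    new_state = dict(state)
    new_state[(i, j)] -= 1
    new_state[(i, i)] = new_state.get((i, i), 0) + 1
    return {key: count for key, count in new_state.items() if count > 0}


def asynchronous_update_rates(state, delta):
    """Exponential intervals: each server whose estimate is off reports at rate delta."""
    return {(i, j): delta * count for (i, j), count in state.items() if j != i and count > 0}


print(asynchronous_update({(1, 3): 2, (0, 0): 1}, 1, 3))
```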
4. Synchronous updates
In this section we examine the fluid limit for synchronous updates. In Subsection 4.1 we provide a description of the fluid-limit trajectory, along with an intuitive interpretation, numerical illustration and comparison with simulation. In Subsection 4.2 the fluid-limit analysis will be used to derive a finite upper bound for the queue length on fluid scale for any given update frequency δ > 0 (Proposition 4.5) and to show that in the long term queueing vanishes on fluid level for a sufficiently high update frequency δ (Proposition 4.11).
4.1. Fluid-limit dynamics
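The fluid limit (in between successive update moments) satisfies the system of differential equations
$$\frac{dx_{i,j}(t)}{dt} = x_{i+1,j}(t) - \mathbb{1}\{i \ge 1\}\, x_{i,j}(t) + \lambda\, p_{i-1,j-1}(x(t)) - \lambda\, p_{i,j}(x(t)), \tag{1}$$
where
$$p_{i,j}(x) = \frac{x_{i,j}}{x_{\cdot,j}}\, \mathbb{1}\{j = m(x)\}$$
(with $p_{i,j}(x) = 0$ whenever $i < 0$ or $j < 0$) denotes the fraction of jobs assigned to a server with queue length $i$ and queue estimate $j$ in fluid state $x$ at time $t$, with $x_{\cdot,j} = \sum_i x_{i,j}$ denoting the fraction of servers with queue estimate $j$ and $m(x)$ the minimum queue estimate in fluid state $x$ at time $t$. At an update moment $t$, the fluid limit shows discontinuous behavior, with $x_{i,i}(t^+) = x_{i,\cdot}(t)$ and $x_{i,j}(t^+) = 0$ for all $j \ne i$, with $x_{i,\cdot}(t) = \sum_j x_{i,j}(t)$ denoting the fraction of servers with queue length $i$ in fluid state $x$ at time $t$.
An informal outline of the derivation of the fluid limit as stated in (1) is provided in Appendix A.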
4.1.1. Interpretation
The above system of differential equations may be heuristically explained as follows. The first two terms correspond to service completions at servers with $i+1$ and $i$ jobs, which result in an increase and a decrease in the fraction of servers with $i$ jobs, respectively. The third and fourth terms reflect the job assignments to servers with the minimum queue estimate $m(x(t))$. The third term captures the resulting increase in the fraction of servers with queue estimate $m(x(t)) + 1$, while the fourth term captures the corresponding decrease in the fraction of servers with queue estimate $m(x(t))$.
Summing the equations (1) over $i$ yields
$$\frac{dx_{\cdot,j}(t)}{dt} = \lambda \left( \mathbb{1}\{j = m(x(t)) + 1\} - \mathbb{1}\{j = m(x(t))\} \right), \tag{2}$$
reflecting that servers with the minimum queue estimate $m(x(t))$ are assigned jobs, and flipped into servers with queue estimate $m(x(t)) + 1$, at rate λ, and that $m(x(t))$ can only increase between successive update moments. Also note that the derivative of $x_{i,j}(t)$ is continuous in $t$, except at those times $t$ where $m(x(t))$ increases, and that $x_{i,j}(t)$ is continuous in between updates since its derivative is bounded.
4.1.2. Numerical illustration and comparison with simulation
Figures 3–5 show the fluid-limit trajectories as governed by the differential equations in (1) for the synchronous scheme with deterministic update intervals, SUJSQ^det(δ), along with the (fluid-scaled) variables obtained through stochastic simulation of a system with a finite number of servers, averaged over 10 runs. Observe that the simulation results are nearly indistinguishable from the fluid-limit trajectories, which is in line with broader findings concerning the accuracy of fluid and mean-field limits (Gast, 2017; Ying, 2016).
Update moments at times k/δ, k = 1, 2, …, are marked by vertical dotted lines. These time points can also be easily recognized by the jumps in the fractions of servers with a given queue estimate. Moreover, the fractions of servers with a given queue length are not differentiable at these points, nor at other moments when the minimum queue estimate changes.
Qualitatively similar results are observed for SUJSQ^exp(δ), where the (synchronous) updates occur at irregular moments. The paths still follow similar saw-tooth patterns, and the dynamics between updates are identical, as reflected in the differential equations in (1). In particular, the fraction of servers with the minimum queue estimate decreases linearly between updates, and the minimum estimate changes drastically at update moments. The results are displayed in Figures 12–14, which are deferred to Appendix E because of space constraints.
The first scenario (Figure 3) and the second scenario (Figure 5) use different parameter values. Since in the first scenario there are moments at which the fraction of idle servers with the minimum queue estimate is zero, some jobs are sent to servers with one job, so that servers sometimes have two jobs, which means that queueing does not vanish as N → ∞. In contrast, in the second scenario this fraction is strictly positive at all times and no servers appear with two or more jobs, which implies that no queueing occurs at fluid level as N → ∞. We will return to this dichotomy in Proposition 4.11.
4.2. Performance in the fluid limit
We will now use the fluid limit (1) to gain some insight into the performance of the scheme with synchronous updates. Equation (2) shows that no fixed point can exist, as $x_{\cdot,m(x(t))}(t)$ has a non-zero constant derivative $-\lambda$. We will establish, however, in Proposition 4.5 that for any positive update frequency the queue lengths on fluid scale are essentially bounded by a finite constant. Furthermore, in Proposition 4.11 we demonstrate that when the update frequency is above a specific threshold, queueing essentially vanishes on fluid level in the long term.
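Consider the average queue length on fluid scale, denoted $q(t)$ and defined by
$$q(t) = \sum_{i=1}^{\infty} i\, x_{i,\cdot}(t).$$
Further define $y_i(t) = \sum_{l=i}^{\infty} x_{l,\cdot}(t)$ as the fraction of servers with queue length $i$ or larger at time $t$ on fluid scale, and note that $q(t) = \sum_{i=1}^{\infty} y_i(t)$. We will also refer to $q(t)$ as the total queue "mass" on fluid scale at time $t$, and introduce
$$q^{\le k}(t) = \sum_{i=1}^{k} y_i(t)$$
and
$$q^{>k}(t) = \sum_{i=k+1}^{\infty} y_i(t)$$
as the queue mass (weakly) below and (strictly) above level $k$, respectively.
The fluid-limit equation (1) yields the following expression for the derivative of $y_i(t)$ (in between updates),
$$\frac{dy_i(t)}{dt} = \lambda \sum_{j} p_{i-1,j}(x(t)) - x_{i,\cdot}(t), \qquad i \ge 1,$$
and
$$\frac{dq^{>k}(t)}{dt} = \lambda \sum_{i=k}^{\infty} \sum_{j} p_{i,j}(x(t)) - y_{k+1}(t), \qquad k \ge 0.$$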
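This may be interpreted by noting that the queue mass above level $k$ increases due to arriving jobs being assigned to servers with queue length $k$ or larger, and decreases due to jobs being completed by servers with queue length $k+1$ or larger. In particular, since $p_{i,j}(x)$ vanishes unless $j = m(x)$, and a queue length never exceeds its estimate, so that only servers with queue length at most $m(x)$ receive jobs, we find that
$$\frac{dq^{>k}(t)}{dt} \le \lambda\, \mathbb{1}\{m(x(t)) \ge k\} - y_{k+1}(t)$$
for all $k \ge 0$.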
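Taking $k = 0$ in the expression for the derivative of $q^{>k}(t)$ and noting that $q^{>0}(t) = q(t)$ and $\sum_i \sum_j p_{i,j}(x) = 1$, we obtain
$$\frac{dq(t)}{dt} = \lambda - y_1(t),$$
and thus
$$q(t) = q(0) + \lambda t - \int_0^t y_1(s)\, ds.$$
This reflects that the average queue length on fluid scale at time $t$ is obtained by adding the number of arrivals $\lambda t$ during $[0, t]$ and subtracting the number of service completions, which corresponds to the cumulative fraction of busy servers $\int_0^t y_1(s)\, ds$.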
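The mass quantities used above are easy to compute from a fluid state. A small sketch, assuming (our choice) a dictionary keyed by (queue length, estimate) with fractions of servers as values; the function name is hypothetical, and `y[i]` denotes the fraction of servers with queue length i or larger. It checks that the total mass equals both the sum of the tail fractions and the split into mass below and above level k.

```python
def queue_masses(x, k):
    """Return (q, q_le_k, q_gt_k, y) for a fluid state x: dict (i, j) -> fraction.

    y[i] is the fraction of servers with queue length i or larger.
    """
    max_len = max(i for (i, _j) in x)
    marginal = [0.0] * (max_len + 2)  # fraction with queue length exactly i
    for (i, _j), frac in x.items():
        marginal[i] += frac
    y = [0.0] * (max_len + 2)
    for i in range(max_len, -1, -1):
        y[i] = y[i + 1] + marginal[i]
    q = sum(i * marginal[i] for i in range(max_len + 1))        # average queue length
    q_le_k = sum(y[i] for i in range(1, min(k, max_len) + 1))   # jobs in positions 1..k
    q_gt_k = sum(y[i] for i in range(k + 1, max_len + 1))       # jobs in positions above k
    return q, q_le_k, q_gt_k, y


fluid_state = {(0, 0): 0.5, (1, 2): 0.3, (3, 3): 0.2}
q, below, above, y = queue_masses(fluid_state, k=1)
print(q, below, above)  # q equals below + above, up to rounding
```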
The next lemma follows directly from (4), and shows that the queue mass above level $k$ decreases when the minimum queue estimate on fluid scale is strictly below $k$, so that there are no arrivals to servers with queue length $k$ or larger.
Lemma 4.1.
If $m(x(s)) < k$ for all $s \in [t_0, t_1]$, then $q^{>k}(t_1) \le q^{>k}(t_0)$.
We now proceed to derive a specific characterization of the decline in the queue mass above level $k$ when the minimum queue estimate on fluid scale is strictly below $k$.
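For conciseness, denote
$$f_k(t) = \sum_{n=0}^{k-1} (k - n)\, e^{-t} \frac{t^n}{n!}$$
and
$$g_k(t) = \sum_{n=0}^{k-1} n\, e^{-t} \frac{t^n}{n!} + k \sum_{n=k}^{\infty} e^{-t} \frac{t^n}{n!},$$
with $k \in \{1, 2, \ldots\}$ and $t \ge 0$, and let $Z$ be a Poisson random variable with parameter $t$. Note that $f_k(t) = \mathbb{E}[\max\{k - Z, 0\}]$ and $g_k(t) = \mathbb{E}[\min\{Z, k\}]$, so that $f_k(t) + g_k(t) = k$. Observe that $f_k(t)$ may be interpreted as the expected queue length after a time interval of length $t$ at a single server with initial queue length $k$, unit-exponential service times and no arrivals, while $g_k(t)$ may be interpreted as the expected number of service completions during that time period.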
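The two single-server quantities described above (expected jobs remaining and expected completions after time t, for a server that starts with k jobs, serves at unit exponential rate and receives no arrivals) can be evaluated from the Poisson distribution and checked against a direct simulation. Function names are hypothetical; the Monte Carlo routine is only a sanity check of the interpretation.

```python
import math
import random


def poisson_pmf(n, t):
    """P(Z = n) for Z ~ Poisson(t)."""
    return math.exp(-t) * t ** n / math.factorial(n)


def expected_remaining(k, t):
    """E[max(k - Z, 0)]: expected jobs left at time t when starting with k jobs."""
    return sum((k - n) * poisson_pmf(n, t) for n in range(k))


def expected_completions(k, t):
    """E[min(Z, k)]: expected service completions by time t."""
    below = sum(n * poisson_pmf(n, t) for n in range(k))
    tail = 1.0 - sum(poisson_pmf(n, t) for n in range(k))  # P(Z >= k)
    return below + k * tail


def simulated_remaining(k, t, runs, seed=0):
    """Monte Carlo: a single server with unit-rate exponential services and no arrivals."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        clock, left = 0.0, k
        while left > 0:
            clock += rng.expovariate(1.0)
            if clock > t:
                break
            left -= 1
        total += left
    return total / runs


print(expected_remaining(3, 1.5), simulated_remaining(3, 1.5, runs=20000))
```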
Lemma 4.2.
For any , , ,
[TABLE]
The proof of Lemma 4.2 involves a detailed analysis of the queue mass below level $k$. In the lemma, special attention goes to the mass of jobs that are queued in positions up to $k$, represented by $q^{\le k}$. The decline in this mass is no less than the decline in a situation where the same total number of jobs reside with servers that have either zero jobs or exactly $k$ jobs (so a fraction $q^{\le k}/k$ of the servers will have $k$ jobs). Finally, $f_k(t)$ represents the expected number of jobs that remain after a time interval of length $t$ at each of the servers with $k$ jobs, while $g_k(t)$ represents the expected number of jobs that have been completed during that interval by each of these servers. These observations will be made rigorous in the proof in Appendix C.1.
The next lemma follows directly fromĀ (3) andĀ (4) in conjunction with LemmaĀ 4.2.
Lemma 4.3.
For any , , ,
[TABLE]
or equivalently,
[TABLE]
If for all , then for any
[TABLE]
or equivalently,
[TABLE]
In particular, taking ,
[TABLE]
yielding
[TABLE]
The next lemma provides a simple condition for the minimum queue estimate on fluid scale to remain strictly below throughout the interval in terms of ; the proof is provided in Appendix C.2.
Lemma 4.4.
If , then for all .
We will henceforth say that is large enough if
[TABLE]
and define , which may be loosely thought of as the maximum queue length on fluid scale in the sense of the next proposition.
Note that as , ensuring that is finite for any .
Proposition 4.5 (Bounded queue length for ).
For any initial state with finite queue mass , the fraction of servers on fluid scale with a queue length larger than vanishes over time. Additionally, if the initial queue mass is sufficiently small and the initial fraction of servers with a queue length larger than is zero, then that fraction will remain zero forever.
The proof of Proposition 4.5 leverages Lemma 4.2 and is organized as follows. For any initial state, we show that either the mass in the tail or the total mass decreases. Once one of the two falls below a certain level, the other decreases as well; this is established in two lemmas (Lemmas 4.8 and 4.9). From that point on, the argument alternates between decreasing mass in the tail and decreasing total mass, as described in the final lemma (Lemma 4.10). As a result, the mass in the tail decreases until the mass strictly above level vanishes.
Define
[TABLE]
We now state two corollaries, which provide upper bounds for the total queue mass at time . The proofs are based on Lemma 4.2 and are provided in Appendices C.3 and C.4.
Corollary 4.6.
If is large enough and , then
[TABLE]
Corollary 4.7.
If is large enough and , then
[TABLE]
We now present Lemmas 4.8 and 4.9, which use Corollaries 4.6 and 4.7 and Lemmas 4.1, 4.3 and 4.4 to show that, under certain conditions, the total queue mass as well as the mass above level strictly decrease.
Lemma 4.8.
If is large enough, and , then
- , where , so is strictly smaller at the next update,
- , so remains strictly smaller than .
Proof.
The first statement follows from Corollary 4.6, with . The second assertion follows from Lemmas 4.1 and 4.4. ∎
Lemma 4.9.
If is large enough, and then
- , so remains strictly smaller than at the next update,
- , so remains strictly smaller than ,
- if , where , so is strictly decreasing by a constant factor over each update interval.
Proof.
The first statement follows from Corollary 4.7. The second assertion follows from Lemmas 4.1 and 4.4. The third statement holds for
[TABLE]
since the final portion of Lemma 4.3 in conjunction with Lemma 4.4 gives
[TABLE]
∎
Lemma 4.10.
If is large enough, and , then there exists a finite time such that and (as defined in (8)).
Proof.
The proof is constructed by applying Lemmas 4.8 and 4.9 in succession. Since and , Lemma 4.8 can be applied, so that is strictly decreasing and eventually becomes smaller than while remains smaller than . Note that does not decrease after any iteration, since can only decrease. At that moment, Lemma 4.9 can be applied, which shows that decreases by a constant factor over each update interval as long as is larger than . The constant factor does not increase after any iteration, and in fact only becomes smaller as decreases. Consequently, becomes smaller than after finitely many updates, while remains smaller than and smaller than . ∎
Proof of Proposition 4.5. Observe that the procedure in the proof of Lemma 4.10 can be performed as long as is large enough. The left-hand side of (7) is increasing in (as both factors are increasing in ), which shows that condition (7) becomes tighter for smaller values of . In fact, the mass above level will vanish, where is the lowest value of that is sufficiently large for (7) to hold, yielding the first statement of Proposition 4.5. The latter part follows directly from Lemma 4.8.
Finally, we stress that the lemma can also be applied when the maximum initial queue length is infinite but the mass is finite. In that case, one can find a value for such that (as the tail of a convergent series tends to zero) and . The lemma can then be applied successively, starting from , which proves Proposition 4.5.
We now proceed to state the second main result in this section. Denote by the fluid state with , , , and for all .
Proposition 4.11 (No-queueing threshold for in ).
Suppose , or equivalently, . Then is a fixed point of the fluid-limit process at update moments in the following sense:
- (a) If , then for all ;
- (b) For any initial state with , as .
Moreover, in case (a), , , for all , , and for all , .
In case (b), , , for all , , and for all , .
Loosely speaking, Proposition 4.11 implies that for , in the long term the fraction of jobs that incur a non-zero waiting time vanishes. We note that in this regime, jobs are only sent to idle servers, which means that servers only need to send feedback when they are idle at an update moment. Since the fraction of idle servers in the fixed point is , a sparsified version of will have a communication overhead of per time unit, i.e., one message per job.
The next lemma, whose proof is presented in Appendix C.5, provides lower and upper bounds for the number of service completions on fluid level and the total queue mass, which play an instrumental role in the proof of Proposition 4.11. The bounds and proof arguments are similar in spirit to those of Lemma 4.2 with , but involve crucial refinements by additionally accounting for service completions of arriving jobs.
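The overhead arithmetic in the preceding paragraph can be made concrete. Because the symbols of Proposition 4.11 were lost in extraction, the sketch below rests on our own reconstruction, flagged here as an assumption: synchronous update moments spaced 1/δ apart (δ the update frequency) and the no-queueing threshold δ ≥ λ/(1 − λ), equivalently λ ≤ δ/(1 + δ). This reading agrees with the fixed point described above. While arrivals go to servers with estimate 0, which are truly idle, the busy fraction b satisfies db/dt = λ − b, so starting from b = λ it stays at λ over the whole interval; the λ/δ arrivals per server during one interval then fit into the idle fraction 1 − λ exactly when λ/δ ≤ 1 − λ. All function names are hypothetical.

```python
def no_queueing_threshold(lam: float) -> float:
    """Assumed threshold: smallest delta with lam / delta <= 1 - lam."""
    return lam / (1.0 - lam)


def messages_per_job(lam: float, delta: float) -> float:
    """Sparsified reporting: only idle servers (fraction 1 - lam in the fixed point)
    report, delta times per unit time, while jobs arrive at rate lam per server."""
    return delta * (1.0 - lam) / lam


def simulate_cycle(lam: float, delta: float, steps: int = 100_000):
    """Explicit-Euler sketch of one update interval of length 1/delta, started
    in the fixed point where every busy server holds exactly one job.

    b:        fraction of busy servers, db/dt = lam - b while estimate-0 servers remain
    idle_est: fraction of servers whose estimate is still 0 (all truly idle),
              lowered by one unit of fluid per arrival
    Returns (b at the end of the interval, smallest value of idle_est).
    A negative second value means the idle capacity ran out before the next update.
    """
    h = (1.0 / delta) / steps
    b, idle_est = lam, 1.0 - lam
    lowest = idle_est
    for _ in range(steps):
        b += h * (lam - b)
        idle_est -= h * lam
        lowest = min(lowest, idle_est)
    return b, lowest


if __name__ == "__main__":
    lam = 0.8
    d = no_queueing_threshold(lam)
    print(d, messages_per_job(lam, d), simulate_cycle(lam, 1.25 * d))
```

At δ equal to the assumed threshold, `messages_per_job` returns one for every λ, matching the one-message-per-job figure above; above the threshold the estimate-0 servers never run out within an interval, and below it they do.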
Lemma 4.12.
If , then (i) .
If , then is bounded from below by
[TABLE]
so that in view of (5) (ii)
[TABLE]
and
[TABLE]
so that in view of (5) (iii),
[TABLE]
If , then is bounded from below by
[TABLE]
so that in view of (5) (iv)
[TABLE]
with , .
We deduce that (v)
[TABLE]
with .
Proof of PropositionĀ 4.11.
We first consider case (a) with . The fluid-limit equations can then be explicitly solved to obtain , , , and for all for .
Since at an update moment , , and for all , we obtain a strictly cyclic evolution pattern with for all , as well as , , for all , and for all , .
We now turn to case (b). First suppose that there exists a such that . It then follows from statement (i) in Lemma 4.12 that for all . Moreover, in view of statement (v) in Lemma 4.12 we have , with and when for any . Thus, for any it can only occur finitely many times that , and additionally for any , it can only occur finitely many times that , because otherwise would eventually fall to zero, which would contradict . Thus we conclude that and , which implies that as as stated.
Now suppose that there exists no such that , i.e., for all . Solving (1) then gives and for , where . Lemma 4.3 with , yields
[TABLE]
Thus we must have as , because otherwise would eventually drop below zero, which would contradict the fact that it must always be positive. Hence, for any , there exists such that for all . Statement (iii) in Lemma 4.12 may then be invoked to obtain that for any and if , then
[TABLE]
It follows that for any , it can only occur finitely many times that , as is bounded since is decreasing (Lemma 4.1). Thus we conclude and as , implying that as as stated.
∎
5. Asynchronous updates
In this section we turn to the fluid limit for asynchronous updates. As mentioned earlier, for non-exponential update intervals, the state variables would need to be augmented with continuous variables in order to obtain a Markovian state description. This would give rise to a measure-valued fluid-limit description and would involve heavy technical machinery while providing little insight; hence we focus on the fluid limit for exponential update intervals. In Subsection 5.1 we provide a characterization of the fluid-limit trajectory, along with a heuristic explanation, numerical illustration and comparison with simulation. In Subsection 5.2 the fixed point of the fluid limit is determined (Proposition 5.1), which immediately shows that in stationarity queueing vanishes at fluid level for sufficiently high (Corollary 5.2) and also provides an upper bound for the queue length at fluid level for any given (Corollary 5.3).
5.1. Fluid-limit dynamics
In the case of synchronous updates, the minimum queue estimate could never decrease between successive update moments. As a result, the amount of time that the minimum queue length equals in between successive update moments converges to , as , with and can be directly expressed in terms of the minimum queue estimate on fluid scale.
In contrast, with asynchronous updates, the minimum queue estimate may drop at any time when an individual server with a queue length sends an update at time , and becomes the only server with a queue estimate below . Consequently, the amount of time that the minimum queue length equals no longer tends to as , and may even have a positive derivative for , i.e., for queue values strictly smaller than the minimum queue estimate on fluid scale.
The fact that even in the limit the system may spend a non-negligible amount of time in states that are not directly visible on fluid scale severely complicates the characterization of the fluid limit. In order to handle this complication and describe the evolution of the fluid limit, it is convenient to define as the fluid-scaled rate at which the dispatcher can assign jobs to servers with queue estimates below as a result of updates, with representing the fraction of servers with queue length in fluid state at time as before.
We distinguish two cases, depending on whether or not, and additionally introduce , defined as in case , or otherwise. Then servers with a true queue length will be assigned jobs almost immediately after an update at time , and then have both queue length and queue estimate . Incoming jobs will be assigned to servers with a queue estimate at most at rate and to servers with queue estimate exactly equal to at rate .
Then the fluid limit satisfies the system of differential equations
[TABLE]
for all , where
[TABLE]
denotes the fraction of jobs assigned to a server with queue length and queue estimate in fluid state among the ones that are assigned to a server with queue estimate at least , defined as a function of the fluid state at time as above.
It can be checked that when , the derivative of is strictly positive, i.e., the fraction of servers with a queue estimate below becomes positive, and the value of instantly becomes equal to .
An informal outline of the derivation of the fluid limit as stated in (9) is provided in Appendix B.
5.1.1. Interpretation
The above system of differential equations may be intuitively interpreted as follows. The first two terms correspond to service completions at servers with and jobs, just like in (1). The third and fourth terms account for job assignments to servers with a queue estimate . The third term captures the resulting increase in the fraction of servers with queue estimate , while the fourth term captures the corresponding decrease in the fraction of servers with queue estimate .
The final three terms in (9) correspond to the updates from servers received at a rate . The fifth term represents the increase in the fraction of servers with queue estimate due to updates from servers with a queue length which are almost immediately being assigned jobs and then have both queue length and queue estimate . The sixth term represents the increase in the fraction of servers with a queue estimate or larger due to updates from servers with a queue length . The final term represents the decrease in the fraction of servers with queue length and queue estimate due to updates.
Even though a non-zero fraction of the jobs are assigned to servers with a queue estimate below , these events are not directly visible at fluid level, and only implicitly enter the fluid limit through the thinned arrival rate .
Summing the equations (9) over yields
[TABLE]
reflecting that servers with queue estimate are assigned jobs, and thus flipped into servers with queue estimate , at rate , and that servers with a queue estimate are created at an effective rate as a result of updates.
5.1.2. Numerical illustration and comparison with simulation
Figures 7–9 show the fluid-limit trajectories as governed by the differential equations in (9) for , together with the results of stochastic simulation for a system with servers, averaged over 10 runs. Once again, the simulation results are nearly indistinguishable from the fluid-limit trajectories.
In contrast to the synchronous variants in Figures 3–5, the trajectories do not oscillate but approach stable values, corresponding to the fixed point of the fluid-limit equations (9), which we will analytically determine in Proposition 5.1. In Figures 3 and 3, where is relatively low, we observe once again that and become strictly positive. In Figures 5 and 5, where is sufficiently large, all servers have either zero or one jobs in the limit, indicating that no queueing occurs.
Qualitatively similar results are observed for , where the updates occur at strictly regular moments. The results are displayed in Figures 16–18, which are again relegated to Appendix F because of space limitations.
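For readers who want to reproduce the flavour of such comparisons, the following is a minimal event-by-event (Gillespie-style) Python sketch of the scheme with asynchronous exponential updates; it is not the authors' simulator. It assumes that each server updates at rate δ by reporting its true queue length, that the dispatcher sends each job to a server with the smallest estimate (ties broken uniformly at random) and then increments that estimate, that services are unit-exponential, and that arrivals are Poisson with rate λ per server. The function name and the parameter values are hypothetical.

```python
import random


def simulate_jsq_delta(n, lam, delta, horizon, burn_in, seed=0):
    """Event-by-event sketch of the dispatcher with asynchronous exponential updates.

    Returns time averages over [burn_in, horizon] of the fraction of busy servers
    and of the mean number of jobs per server.
    """
    rng = random.Random(seed)
    queue = [0] * n      # true queue lengths
    estimate = [0] * n   # dispatcher's queue estimates (never below the true length)
    busy = 0             # servers with at least one job
    jobs = 0             # jobs in the system
    t = 0.0
    busy_area = 0.0
    jobs_area = 0.0
    while t < horizon:
        arrival_rate = lam * n          # Poisson arrivals, rate lam per server
        service_rate = float(busy)      # unit-rate service at each busy server
        update_rate = delta * n         # each server reports at rate delta
        total_rate = arrival_rate + service_rate + update_rate
        dt = rng.expovariate(total_rate)
        start, end = max(t, burn_in), min(t + dt, horizon)
        if end > start:                 # accumulate only inside the averaging window
            busy_area += busy * (end - start)
            jobs_area += jobs * (end - start)
        t += dt
        u = rng.random() * total_rate
        if u < arrival_rate:
            low = min(estimate)         # join a server with the smallest estimate
            i = rng.choice([k for k, e in enumerate(estimate) if e == low])
            if queue[i] == 0:
                busy += 1
            queue[i] += 1
            estimate[i] += 1            # dispatcher raises the estimate it just used
            jobs += 1
        elif u < arrival_rate + service_rate:
            i = rng.choice([k for k, q in enumerate(queue) if q > 0])
            queue[i] -= 1
            jobs -= 1
            if queue[i] == 0:
                busy -= 1
        else:
            i = rng.randrange(n)
            estimate[i] = queue[i]      # update: server reports its true queue length
    window = horizon - burn_in
    return busy_area / (n * window), jobs_area / (n * window)


if __name__ == "__main__":
    for delta in (0.5, 1.0, 3.0):
        busy, mean_q = simulate_jsq_delta(n=50, lam=0.7, delta=delta,
                                          horizon=150.0, burn_in=50.0, seed=1)
        print(delta, round(busy, 3), round(mean_q, 3))
```

Since each busy server completes jobs at unit rate, throughput balance should keep the long-run busy fraction near λ for any δ > 0, which makes it a convenient sanity check; the mean number of jobs per server is the quantity that responds to δ.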
5.2. Fixed-point analysis
The next proposition identifies the fixed point of the fluid-limit equations (9) in terms of , defined as
[TABLE]
which may be interpreted as the minimum queue estimate at fluid level in stationarity. For compactness, define and .
Proposition 5.1 (Fixed point for ).
The fixed point of the fluid limit (9) is given by
[TABLE]
and when , where is the unique solution of the equation , i.e.,
[TABLE]
In particular, if , i.e.,
[TABLE]
then , so
[TABLE]
and for all .
Note that is strictly decreasing in , with , and , ensuring that exists and is unique.
The result of Proposition 5.1 is obtained by setting the derivatives in (9) equal to zero, observing that for all , and then solving the resulting equations. The detailed proof arguments are presented in Appendix D.
Corollary 5.2 (No-queueing threshold for in ).
If the update frequency , then for all , implying that queueing vanishes at fluid level in stationarity.
Corollary 5.2 immediately follows from (10) and Proposition 5.1, in which in case so that only , and are strictly positive. In case of equality we have the scenario described in the last part of Proposition 5.1, where for all . Provided the many-server and stationary limits can be interchanged (a rigorous proof of which would involve establishing global asymptotic stability of the fluid limit, which is beyond the scope of the present paper), Corollary 5.2 implies that for , the mean stationary waiting time under vanishes as .
Proposition 5.1 also yields an upper bound for the queue length at fluid level, as stated in the next corollary.
Corollary 5.3 (Bounded queue length for ).
The queue length at fluid level in stationarity has bounded support on for any and .
First of all, note that in order for queueing to vanish, it is required that or and , i.e., , which coincides with the threshold for in as identified in PropositionĀ 4.11. Also, the upper bound tends to infinity as approaches zero, reflecting that for any fixed arrival rateĀ , even arbitrarily low, the maximum queue length grows without bound as the update frequency vanishes.
At the same time, for any positive , is finite for any fixed , and only grows as as rather than as in the absence of any queue feedback. Thus, even an arbitrarily low update frequency ensures that the queue length has bounded support and behaves far more benignly in a high-load regime at fluid level. This powerful property resembles an observation in the work of Tsitsiklis & Xu (Tsitsiklis and Xu, 2012, 2013) in the context of a dynamic scheduling problem, where even a minuscule degree of resource pooling yields fundamentally different behavior on fluid scale.
5.2.1. Number of jobs in the system
The average queue length in the fixed point is
[TABLE]
In case of (12), the average queue length is simply , reflecting that the average number of job arrivals equals the average number of job completions over the course of an update interval, starting with jobs.
Figure 10 plots the average queue length in the fixed point given by (13) as a function of the update frequency for . We observe that the average queue length decreases monotonically with the update frequency, as expected, and is indeed contained between and .
It then follows from the definition of that as for any , which indicates that may perform arbitrarily badly in the ultra-low feedback regime, confirming the observations in Subsection 1.
5.2.2. Bound on the queue length for
As noted earlier, the fluid-limit trajectory for involves a measure-valued process and is difficult to describe. However, in a similar spirit as for , the value of can be characterized as the largest integer for which
[TABLE]
expressing that the average number of job arrivals should be larger than or equal to the average number of job completions over the course of an update interval, starting with jobs. While the above inequality cannot easily be solved in closed form, it is not difficult to show that it is weaker than (10), i.e., the value of is lower than for , confirming the superiority of observed in the simulation results in Subsection 1. It is further worth observing the strong similarity of the above inequality with Proposition 4.5 governing the queue length upper bound for .
6. Conclusions
We have introduced and analyzed a novel class of hyper-scalable load balancing algorithms that only involve minimal communication overhead and yet deliver excellent performance. In the proposed schemes, the various servers provide occasional queue status notifications so as to guide the dispatcher in directing incoming jobs to relatively short queues.
We have demonstrated that the schemes markedly outperform JSQ() policies with a comparable overhead, and can drive the waiting time to zero in the many-server limit with just one message per job. The proposed schemes show their core strength and outperform sparsified JIQ versions in the sparse feedback regime with less than one message per job, which is particularly pertinent from a scalability viewpoint.
In order to further explore the performance in the many-server limit, we investigated fluid limits for synchronous as well as asynchronous exponential update intervals. We used the fluid limits to obtain upper bounds for the stationary queue length as a function of the load and update frequency. We also revealed a striking dichotomy in the ultra-low feedback regime, where the mean waiting time tends to a constant in the synchronous case but grows without bound in the asynchronous case. Extensive simulation experiments were conducted to support the analytical results, and indicate that the fluid-limit asymptotics are remarkably accurate.
In the present paper we have adopted common Markovian assumptions, and in future work we aim to extend the results to non-exponential and possibly heavy-tailed distributions. We also intend to pursue schemes that may dynamically suppress updates or selectively refrain from updates at pre-scheduled epochs to convey implicit information, and reduce the communication overhead yet further.
Acknowledgments
This work is supported by the NWO Gravitation Networks grant 024.002.003, an NWO TOP-GO grant and an ERC Starting Grant.
Appendix A Derivation sketch of fluid limit (1) for synchronous updates
We now provide an informal outline of the derivation of the fluid limit for synchronous updates as stated in (1). Let and denote unit-rate Poisson processes, , all independent.
The system dynamics (in between successive update moments) may then be represented as (see, for instance, (Pang et al., 2007))
[TABLE]
with .
Dividing by and rewriting in terms of the fluid-scaled variables , we obtain
[TABLE]
Now introduce
[TABLE]
and observe that and are martingales. By standard arguments it can be shown that both and converge to zero as .
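The vanishing of these centred, fluid-scaled Poisson terms is a law-of-large-numbers effect: for a unit-rate Poisson process Π and N servers, (Π(Nλt) − Nλt)/N is of order sqrt(λt/N). The short Python sketch below illustrates this numerically at a single time point only (it does not prove the functional statement used here). The helper names are hypothetical, and the Poisson process is simulated from exponential inter-arrival times with the standard library.

```python
import random


def unit_poisson_count(horizon: float, rng: random.Random) -> int:
    """Number of points of a unit-rate Poisson process in [0, horizon]."""
    count = 0
    clock = rng.expovariate(1.0)
    while clock <= horizon:
        count += 1
        clock += rng.expovariate(1.0)
    return count


def scaled_deviation(n: int, lam: float, t: float, seed: int = 0) -> float:
    """|Pi(n * lam * t) - n * lam * t| / n, the fluid-scaled centred term."""
    rng = random.Random(seed)
    mean = n * lam * t
    return abs(unit_poisson_count(mean, rng) - mean) / n


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        # the deviation is typically of order sqrt(lam * t / n)
        print(n, scaled_deviation(n, lam=0.9, t=1.0))
```

With N = 100,000 and λt = 0.9, the typical deviation is about 0.003, so the scaled martingale term is already negligible on fluid scale.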
Exploiting the fact that the minimum queue estimate cannot decrease in between successive update moments, it can also be established that
[TABLE]
as , with as defined earlier.
Taking the limit for in (14), we conclude that any (weak) limit of the sequence in between successive update moments must satisfy
[TABLE]
with .
Rewriting the latter integral equation in differential form yields (1).
Appendix B Derivation sketch of fluid limit (9) for asynchronous updates
We now provide an informal outline of the derivation of the fluid limit as stated in (9). Let , and denote unit-rate Poisson processes, , all independent. The system dynamics may then be represented as (Pang et al., 2007)
[TABLE]
with as before. Dividing by and rewriting in terms of the fluid-scaled variables , we obtain
[TABLE]
Now introduce
[TABLE]
and observe that , and are martingales. By standard arguments it can be shown that
[TABLE]
each converge to zero as .
Adopting time-scale separation arguments as developed by Hunt & Kurtz (Hunt and Kurtz, 1994), it can be established that
[TABLE]
as , where the coefficients satisfy
[TABLE]
The coefficients may be interpreted as the fraction of time that the pre-limit minimum queue estimate equals when the minimum queue estimate at fluid level is , and satisfy the relationship
[TABLE]
for all , along with the normalization condition .
Thus, we obtain
[TABLE]
for all , and
[TABLE]
We deduce that
[TABLE]
with as before, yielding
[TABLE]
Thus we obtain
[TABLE]
as , with and as defined earlier.
Taking the limit for in (15), and noting that
[TABLE]
we conclude that any (weak) limit of the sequence must satisfy
[TABLE]
with . Rewriting the latter integral equation in differential form yields (9).
Appendix C Proofs of Section 4.2
C.1. Proof of Lemma 4.2
Let , , , be the solution to the fluid-limit equation (1) with , i.e.,
[TABLE]
with initial conditions and for all , . The solution may be interpreted as the fluid limit in the absence of any arrivals, and it is easily verified that
[TABLE]
for all , and
[TABLE]
. Further introduce , , , and note from (3) that
[TABLE]
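The arrival-free fluid limit can be checked numerically. Assuming, as the interpretation above suggests, that with zero arrival rate equation (1) reduces to dq_i/dt = −(q_i − q_{i+1}), where q_i denotes the fraction of servers with at least i jobs (our notation, since the paper's symbols are not recoverable here), a population that starts with k jobs at every server should follow q_i(t) = P(Y ≤ k − i) with Y Poisson(t). The Python sketch below integrates the ODE with explicit Euler steps and compares the result against that formula; the function names are hypothetical.

```python
import math


def euler_no_arrivals(q0, t_end, steps=20_000):
    """Explicit Euler for dq_i/dt = -(q_i - q_{i+1}), i = 1..K, with q_{K+1} = 0.

    q0[i - 1] is the initial fraction of servers with at least i jobs."""
    q = list(q0)
    h = t_end / steps
    for _ in range(steps):
        shifted = q[1:] + [0.0]  # q_{i+1}, with zero beyond the largest level
        # completions at servers with exactly i jobs lower the fraction with >= i jobs
        q = [qi - h * (qi - qn) for qi, qn in zip(q, shifted)]
    return q


def poisson_cdf(m, t):
    """P(Y <= m) for Y ~ Poisson(t)."""
    return sum(math.exp(-t) * t**j / math.factorial(j) for j in range(m + 1))


if __name__ == "__main__":
    k, t = 3, 1.0
    numeric = euler_no_arrivals([1.0] * k, t)
    exact = [poisson_cdf(k - i, t) for i in range(1, k + 1)]
    print(numeric, exact)
```

Summing the q_i over i gives the total queue mass, which then equals E[(k − Y)^+], in line with the single-server interpretation given before Lemma 4.2.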
We will first establish that for all , , reflecting that the fraction of servers with queue length or larger on fluid scale is no less than what it would be in the absence of any arrivals. Suppose that were not the case, and let be the first time when that inequality is about to be violated for some . Then we must have , implying , since . Now observe that
[TABLE]
while
[TABLE]
Hence cannot fall below at (just after) , contradicting the initial supposition in which would be the first time that the inequality is about to be violated.
Invoking (17), we obtain
[TABLE]
Now observe that
[TABLE]
or equivalently,
[TABLE]
which may be understood from the fact that the expected fraction of jobs that remain after a period of length is smaller with an initial queue of size than . Thus, for all . Also,
[TABLE]
so that for all . We obtain
[TABLE]
yielding
[TABLE]
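The monotonicity fact invoked in this proof, namely that the expected fraction of jobs remaining after time t is larger when the initial queue is larger, has a short justification: with Y Poisson(t), the completed fraction min(1, Y/k) is nonincreasing in k, and strictly smaller at k + 1 than at k whenever 1 ≤ Y ≤ k, an event of positive probability once t > 0. The Python sketch below checks the resulting strict inequality numerically on a grid; the helper names are hypothetical and the check is an illustration, not a proof.

```python
import math


def remaining_fraction(k: int, t: float) -> float:
    """E[(k - Y)^+] / k for Y ~ Poisson(t): expected fraction of k initial jobs
    still present after time t (unit-exponential services, no arrivals)."""
    pmf = [math.exp(-t) * t**i / math.factorial(i) for i in range(k)]
    return sum((k - i) * p for i, p in enumerate(pmf)) / k


def check_monotone(k_max: int = 20, times=(0.1, 0.5, 1.0, 2.0, 5.0)) -> bool:
    """True if remaining_fraction(k, t) < remaining_fraction(k + 1, t) on the grid."""
    return all(
        remaining_fraction(k, t) < remaining_fraction(k + 1, t)
        for t in times
        for k in range(1, k_max)
    )


if __name__ == "__main__":
    print(check_monotone())
```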
C.2. Proof of Lemma 4.4
We have , for , , for , , for , , , for $t\in\frac{1}{\lambda}\big[\sum_{i=0}^{j-1}(j-i)v_i(0),\ \sum_{i=0}^{j}(j+1-i)v_i(0)\big]$, assuming . In particular for all if . The latter inequality holds since
[TABLE]
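The interval formula in this proof is a water-filling argument: between updates, jobs arriving at rate λ raise the lowest estimates first, and lifting every server with estimate below j up to level j takes a job volume C(j) = Σ_{i<j} (j − i) v_i(0). Reading v_i(0) as the fraction of servers whose queue estimate equals i at the start of the interval (our reading, since the surrounding symbols were lost; at a synchronous update moment estimates coincide with true queue lengths), the minimum estimate equals j while λt lies between C(j) and C(j + 1). A Python sketch with a hypothetical function name, assuming no update occurs in [0, t]:

```python
def min_estimate(v0, lam, t):
    """Minimum queue estimate on fluid scale at time t after an update at time 0.

    v0[i] is the initial fraction of servers with queue estimate exactly i.
    Jobs arrive at rate lam and always go to the lowest estimates (water-filling),
    so the minimum equals j while lam * t lies in [C(j), C(j + 1)], where
    C(j) = sum_{i < j} (j - i) * v0[i]."""
    if sum(v0) <= 0:
        raise ValueError("v0 must put positive mass on some level")
    budget = lam * t  # fluid volume of jobs dispatched during [0, t]
    j = 0
    while True:
        lift_cost = sum((j + 1 - i) * v for i, v in enumerate(v0) if i <= j)  # C(j + 1)
        if lift_cost > budget:
            return j
        j += 1


if __name__ == "__main__":
    v0 = [0.3, 0.4, 0.3]
    for t in (0.1, 0.5, 1.0, 2.0):
        print(t, min_estimate(v0, lam=0.9, t=t))
```

For example, with half the servers at estimate 0 and half at estimate 1 and λ = 1, the minimum estimate is 0 until t = 0.5, then 1 until t = 1.5, and so on.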
C.3. Proof of CorollaryĀ 4.6
Taking in Lemma 4.3, we obtain
[TABLE]
with
[TABLE]
increasing in , and the smallest value that is large enough; note that because of (7).
C.4. Proof of CorollaryĀ 4.7
Taking in Lemma 4.3 and noting that since , we obtain
[TABLE]
with and as defined in Appendix C.3 and because of (7).
C.5. Proof of Lemma 4.12
Just like in the proof of Lemma 4.2, let , , , be the solution to the fluid-limit equation (1) with , i.e.,
[TABLE]
but now with initial conditions such that for all , with and , and for all , . As before, may be interpreted as the fluid limit in the absence of any arrivals, and it is easily verified that
[TABLE]
for all , and
[TABLE]
. Further let , be solutions to the system of differential equations
[TABLE]
with and initial conditions .
It is easily verified that
[TABLE]
The variable may be interpreted as the fraction of servers with queue length [math] at time [math], queue length [math] at time and queue estimate at time , i.e., servers that have been assigned an arriving job and completed that job by time . Likewise, may be interpreted as the fraction of servers with queue length [math] at time [math], queue length at time and queue estimate at time , i.e., servers that have been assigned an arriving job that remains to be completed at time .
In a similar fashion as in the proof of Lemma 4.2, it can be established that and for all and .
To prove statement (i), consider , and for all . Noting that yields , and thus . Also, . We obtain that
[TABLE]
yielding .
To establish assertion (ii), consider for all . Then, just like in the proof of Lemma 4.3, noting that for all ,
[TABLE]
Also, , and thus
[TABLE]
We obtain that
[TABLE]
To prove statement (iii), consider as before for all . Further observe that
[TABLE]
and
[TABLE]
Then just like in the proof of LemmaĀ 4.3, noting that for all ,
[TABLE]
Also, , and thus
[TABLE]
We obtain
[TABLE]
To establish assertion (iv), consider for all .
Then, just like in the proof of statementĀ (ii),
[TABLE]
Also, noting that yields , and thus .
We obtain that
[TABLE]
Statement (v) follows from statements (ii) and (iv).
Appendix D Derivation of fixed point
For convenience, denote by the minimum queue estimate associated with the fixed point. Further denote if , or otherwise.
Setting the derivatives in (9) equal to zero and denoting , we deduce
[TABLE]
for all . Similarly, we have for ,
[TABLE]
which yields for all and . Additionally, applying (19) with and , gives
[TABLE]
for all . In conclusion, it is readily seen that for all . This implies , and yields
[TABLE]
for all , and
[TABLE]
for all .
We obtain (with )
[TABLE]
or equivalently,
[TABLE]
and
[TABLE]
or equivalently,
[TABLE]
along with
[TABLE]
Note that Equations (21) and (22) determine and :
[TABLE]
It follows from the above equations (flux up equals flux down) that
[TABLE]
which implies that
[TABLE]
with
[TABLE]
is equivalent to
[TABLE]
reflecting that each server is idle a fraction of the time .
Recall that and . We can use the above equations to express in terms of for all , and recursively obtain
[TABLE]
Subsequently, we express in terms of and , and recursively derive
[TABLE]
for ,
[TABLE]
for ,
[TABLE]
for ,
[TABLE]
and .
It only remains to be shown that the equation (11) has a unique solution , which then further implies that
[TABLE]
as noted earlier.
In order to establish that a solution exists, note that , and
[TABLE]
as , while and
[TABLE]
as .
It may further be shown that is in fact (strictly) decreasing in , ensuring that the value of is also unique.
Appendix E Simulation results for
The next four figures provide the fluid-limit trajectories and associated simulation paths for a system with servers and for as referred to in Section 4.1.2.
Appendix F Simulation results for
The next four figures provide the simulation plots for a system with servers and , averaged over 100 runs for as referred to in Section 5.1.2.
References
- N Alon, O Gurel-Gurevich, and E Lubetzky. 2010. Choice-memory tradeoff in allocations. Ann. Appl. Probab. 20, 4 (08 2010), 1470–1511. https://doi.org/10.1214/09-AAP656
- J Anselmi and F Dufour. 2018. Power-of-d-Choices with Memory: Fluid Limit and Optimality. arXiv preprint arXiv:1802.06566 (2018).
- R Badonnel and M Burgess. 2008. Dynamic pull-based load balancing for autonomic servers. In Network Operations and Management Symposium, 2008. NOMS 2008. IEEE. IEEE, 751–754.
- M Bramson, Y Lu, and B Prabhakar. 2010. Randomized load balancing with general service time distributions. In ACM SIGMETRICS Performance Evaluation Review, Vol. 38(1). ACM, 275–286.
- M Bramson, Y Lu, and B Prabhakar. 2012. Asymptotic independence of queues under randomized load balancing. Queueing Systems 71, 3 (2012), 247–292.
- A Ephremides, P Varaiya, and J Walrand. 1980. A simple dynamic routing problem. IEEE Transactions on Automatic Control 25, 4 (1980), 690–693.
- D Gamarnik, J N Tsitsiklis, and M Zubeldia. 2016. Delay, memory, and messaging tradeoffs in distributed service systems. ACM SIGMETRICS Performance Evaluation Review 44, 1 (2016), 1–12.
