An Overview for Markov Decision Processes in Queues and Networks

Quan-Lin Li; Jing-Yu Ma; Rui-Na Fan; Li Xia

arXiv:1907.10243·math.OC·August 26, 2019

TL;DR

This paper provides a comprehensive overview of Markov decision processes in queues and networks, highlighting historical evolution, key results, and future research directions to aid understanding and application in practical areas.

Contribution

It offers a detailed synthesis of the development and current state of MDPs in queues and networks, including future research directions.

Findings

01

Historical evolution of MDPs in queues and networks

02

Summary of key theoretical results

03

Identification of promising future research areas

Abstract

Markov decision processes (MDPs) in queues and networks have been an active topic in many practical areas since the 1960s. This paper provides a detailed overview of this topic and tracks the evolution of many basic results. It also summarizes several interesting directions for future research. We hope that this overview can shed light on MDPs in queues and networks, and on their extensive applications in various practical areas.

Taxonomy

Topics: Advanced Queuing Theory Analysis · Distributed systems and fault tolerance · Simulation Techniques and Applications

Full text

Quan-Lin Li (a), Jing-Yu Ma (b), Rui-Na Fan (b), Li Xia (c)

(a) School of Economics and Management, Beijing University of Technology, Beijing 100124, China

(b) School of Economics and Management, Yanshan University, Qinhuangdao 066004, China

(c) Business School, Sun Yat-sen University, Guangzhou 510275, China

Keywords: Queueing systems; Queueing networks; Markov decision processes; Sensitivity-based optimization; Event-based optimization.

1 Introduction

One main purpose of this paper is to provide an overview of research on MDPs in queues and networks over the last six decades. Such a survey necessarily draws on several related basic areas, such as Markov processes, queueing systems, queueing networks, Markov decision processes, sensitivity-based optimization, stochastic optimization, and fluid and diffusion control. Our analysis therefore begins with three brief introductions: Markov processes and Markov decision processes, queues and queueing networks, and queueing dynamic control.

(a) Markov processes and Markov decision processes

Markov processes, together with the Markov property, were first introduced by the Russian mathematician Andrei Andreevich Markov (1856-1922) in 1906; see Markov [238] for more details. Since then, as a basic mathematical tool, Markov processes have been discussed extensively by many authors; e.g., see the excellent books by Doob [99], Karlin [175], Karlin and Taylor [176], Chung [80], Anderson [21], Kemeny et al. [181], Meyn and Tweedie [241], Chen [77], Ethier and Kurtz [110] and so on.

In 1960, Howard [165] was the first to propose and discuss the MDP (or stochastic dynamic programming), based on his Ph.D. thesis, which opened up a new and important field at the intersection of Markov processes and dynamic programming (e.g., see Bellman and Kalaba [32]). Since then, MDPs have become not only an important branch of the theory of Markov processes but also a basic method in modern dynamic control theory. Crucially, MDPs have been strongly motivated by, and widely applied in, many practical areas over the past 60 years. Readers may refer to some excellent books, for example, the discrete-time MDPs by Puterman [261], Glasserman and Yao [143], Bertsekas [33], Bertsekas and Tsitsiklis [34], Hernández-Lerma and Lasserre [155, 156], Altman [9], Koole [193] and Hu and Yue [166]; the continuous-time MDPs by Guo and Hernández-Lerma [145]; the partially observable MDPs by Cassandra [67] and Krishnamurthy [196]; the competitive MDPs (i.e., stochastic games) by [127]; the sensitivity-based optimization by Cao [58]; some applications of MDPs by Feinberg and Shwartz (Eds.) [122] and Boucherie and Van Dijk (Eds.) [44]; and so on.

(b) Queues and queueing networks

In 1909, the Danish mathematician Agner Krarup Erlang published a pioneering work [109] on queueing theory, which started the study of queueing theory and traffic engineering. Over the past 100 years, queueing theory has been regarded as a key mathematical tool not only for analyzing practical stochastic systems but also for advancing the theory of stochastic processes (such as Markov processes, semi-Markov processes, Markov renewal processes, random walks, martingale theory, fluid and diffusion approximations, and stochastic differential equations). Conversely, the theory of stochastic processes supports and carries forward advances in queueing theory and its applications (for example, single-server queues, multi-server queues, tandem queues, parallel queues, fork-join queues, and queueing networks). It is worth noting that queueing theory has so far been widely applied in many practical areas, such as manufacturing systems, computer and communication networks, transportation networks, service management, supply chain management, the sharing economy, healthcare and so forth.

The single-server queues and the multi-server queues: In the early development of queueing theory (1910s to 1970s), single-server queues were a main topic, with key results including the Khintchine formula, Little’s law, birth-death processes of Markovian queues, the embedded Markov chain, the supplementary variable method, the complex function method and so on. In 1969, Professor J.W. Cohen published a wonderful book [81] summarizing the theoretical progress on single-server queues.

It was a key advance when Professor M.F. Neuts proposed and developed the phase-type (PH) distributions, Markovian arrival processes (MAPs), and the matrix-geometric solution, which later grew into the matrix-analytic method; e.g., see Neuts [245, 246] and Latouche and Ramaswami [210] for more details. Further, Li [218] proposed and developed the RG-factorizations for any general irreducible block-structured Markov process. Crucially, the RG-factorizations extend the matrix-analytic method to a unified matrix framework for both the steady-state solution and the transient solution (for instance, the first passage time and the sojourn time). In addition, the matrix-analytic method and the RG-factorizations can effectively deal with small-scale stochastic models with several nodes.

In the study of queueing systems, some excellent books include Kleinrock [184, 185], Tijms [304] and Asmussen [23]. Also, an excellent survey on key queueing advances was given in Syski [302]; and some overview papers on different research directions were reported by top queueing experts in two interesting books by Dshalalow [104, 105].

*The queueing networks:* In 1957, J.R. Jackson published a seminal paper [168] which started research on queueing networks. Subsequent interesting results include Jackson [169], Baskett et al. [29], Kelly [178, 180], Disney and König [97], Dobrushin et al. [98], Harrison [152], Dai [86] and so on. Well-known examples of queueing networks include Jackson networks, BCMP networks, parallel networks, tandem networks, open networks, closed networks, polling queues, fork-join networks and distributed networks. The product-form solution, quasi-reversibility and some approximation algorithms are the basic results in the study of queueing networks.

For the queueing networks, we refer readers to some excellent books such as Kelly [179], Van Dijk [310], Gelenbe et al. [138], Chao et al. [72], Serfozo [284], Chen and Yao [76], Balsamo et al. [27], Daduna [85], Bolch et al. [41] and Boucherie and Van Dijk (Eds.) [43].

For applications of queueing networks, readers may refer to some excellent books, for example, manufacturing systems by Buzacott and Shanthikumar [52], communication networks by Chang [71], traffic networks by Garavello and Piccoli [133], healthcare by Lakshmi and Iyer [206], service management by Demirkan et al. [92] and others.

(c) Queueing dynamic control

In 1967, Miller [242] and Rykov and Lembert [277] were the first to apply MDPs to the dynamic control of queues and networks. Those two works opened a novel and interesting research direction: MDPs in queues and networks.

For MDPs of queues and networks, we refer readers to three excellent books by Kitaev and Rykov [182], Sennott [282] and Stidham [298].

For MDPs of queues and networks, there are so far several excellent survey papers, for instance, Crabill et al. [83, 84], Sobel [289], Stidham and Prabhu [299], Rykov [272, 274], Kumar [198], Stidham and Weber [300], Stidham [296] and Brouns [50].

For Ph.D. theses that use MDPs of queues and networks, readers may refer to, for example, Farrell [114], Abdel-Gawad [1], Bartroli [28], Farrar [112], Veatch [314], Altman [7], Atan [25] and Efrosinin [107].

MDPs of queues and networks now play an important role in the dynamic control of many practical stochastic networks, for example, inventory control [117, 116, 54], supply chain management [111], maintenance and quality [200, 95], manufacturing systems [172, 52], production lines [323], communication networks [11, 251, 6], wireless and mobile networks [4, 96], cloud service [301], healthcare [254], airport management [271, 211], energy-efficient management [262, 250] and artificial intelligence [188, 287]. With the rapid development of the Internet of Things (IoT), big data, cloud computing, blockchain and artificial intelligence, it is necessary to discuss MDPs of queues and networks under an intelligent environment.

Based on this detailed survey of MDPs of queues and networks, this paper suggests future research under an intelligent environment at three different levels:

1. Networks with several nodes: analyzing MDPs of policy-based Markov processes with block structure, for example, QBD processes, Markov processes of GI/M/1 type, and Markov processes of M/G/1 type, and specifically, discussing their sensitivity-based optimization.

2. Networks with a lot of nodes: discussing MDPs of practical big networks, such as blockchain systems, the sharing economy, intelligent healthcare and so forth.

3. Networks with a lot of clusters: studying MDPs of practical big networks by means of the mean-field theory, e.g., see Gast and Gaujal [134], Gast et al. [135] and Li [219].

The remainder of this paper is organized as follows. Sections 2 to 5 provide an overview of MDPs of single-server queues, multi-server queues, queueing networks, and queueing networks with special structures, respectively. Section 6 classifies the literature according to key objectives in queueing dynamic control. Section 7 introduces the sensitivity-based optimization and the event-based optimization, both of which are applied to analyze MDPs of queues and networks. Finally, we give some concluding remarks in Section 8.

2 MDPs of Single-Server Queues

In this section, we provide an overview of MDPs of single-server queues, including the M/M/1 queues, the M/G/1 queues, the GI/M/1 queues and others. In the early research on MDPs of queues and networks, single-server queues were an active topic for many years.

(1) MDPs of M/M/1 queues

Kofman and Lippman [187], Rue and Rosenshine [268, 269], Yeh and Thomas [338], Lu and Serfozo [231], Plum [258], Altman and Nain [18], Kitaev and Serfozo [183], Savaşaneril et al. [279] and Dimitrakopoulos and Burnetas [94].

(2) MDPs of M/G/1 queues

Mitchell [243], Doshi [100, 101], Gallisch [130], Rue and Rosenshine [270], Jo and Stidham [173], Mandelbaum and Yechiali [236], Kella [177], Wakuta [317], Altman and Nain [17], Feinberg and Kim [120], Feinberg and Kella [119] and Sanajian et al. [278].

(3) MDPs of GI/M/1 queues

Stidham [293] and Mendelson and Yechiali [239].

(4) MDPs of more general single-server queues

Stidham [293], Crabill [82], Lippman [227], Schassberger [280], Stidham [294], Hordijk and Spieksma [164], Federgruen and So [115], Lamond [207], Towsley et al. [307], Koole [190], Haviv and Puterman [153], Lewis et al. [216], George and Harrison [139], Johansen and Larsen [174], Piunovskiy [256], Stidham [297], Adusumilli and Hasenbein [3], Kumar et al. [199] and Yan et al. [335].

(5) MDPs of single-server batch queues

Deb and Serfozo [90], Deb [89] and Powell and Humblet [259] with batch services; and Nobel and Tijms [248] with batch arrivals.

(6) MDPs of single-server queues with either balking, reneging or abandonments

Blackburn [39] with balking, Down et al. [102] with reneging, and Legros [215] with abandonments.

(7) MDPs of single-server priority queues

Robinson [264], Browne and Yechiali [47], Groenevelt et al. [144] and Brouns and Van Der Wal [51].

(8) MDPs of single-server processor-sharing queues

De Waal [93], Altman et al. [15], Van der Weij et al. [309] and Bhulai et al. [38].

(9) MDPs of single-server retrial queues

Liang and Kulkarni [225], Winkler [322] and Giovanidis et al. [141].

(10) MDPs of single-server information-based queues

Kuri and Kumar [201, 202], Altman and Stidham [19] and Honhon and Seshadri [159].

(11) MDPs of single-server queues with multiple classes of customers

Harrison [151], Chen [73], Browne and Yechiali [49], De Serres [87, 88], Ata [24], Feinberg and Yang [123] and Larrañaga et al. [209].

(12) MDPs of single-server queues with optimal pricing

Low [229], Chen [73], Yoon and Lewis [340], Çelik and Maglaras [69], Economou and Kanta [106] and Yildirim and Hasenbein [339].

(13) MDPs of single-server manufacturing queues

*(a) The make-to-stock queues:* Savaşaneril et al. [279], Sanajian et al. [278], Perez and Zipkin [255], Jain [170] and Cao and Xie [54].

*(b) The make-to-order queues:* Besbes and Maglaras [36] and Çelik and Maglaras [69].

*(c) The assemble-type queues:* Nadar et al. [244].

*(d) The inventory control queues:* Veatch [314], Savaşaneril et al. [279], Federgruen and Zipkin [117], Federgruen and Zheng [116], Feinberg [Fed:2016] and Feinberg and Liang [Fed:2017].

(14) MDPs of inventory rationing across multiple demand classes

Ha [146, 147, 148], Gayon et al. [137] and Li et al. [222].

3 MDPs of Multi-server Queues

In this section, we provide an overview for MDPs of multi-server queues, which are another important research direction.

(1) MDPs of M/M/c queues

Low [230], Anderson [20], Printezis and Burnetas [260] and Feinberg and Yang [124, 123].

(2) MDPs of GI/M/c queues

Yechiali [337], Van Nunen and Puterman [312] and Feinberg and Yang [125].

(3) MDPs of two-server queues

Larsen and Agrawala [208], Lin and Kumar [226], Hajek [149], Varma [313], Chen et al. [75] and Xu and Zhao [333].

(4) MDPs of multi-server queues

Emmons [108], Helm and Waldmann [154], Blanc et al. [40], Bradford [45], Koçaǧa and Ward [186] and Lee and Kulkarni [212].

(5) MDPs of heterogeneous server queues

Rosberg and Kermani [265], Nobel and Tijms [249], Rykov [273], Rykov and Efrosinin [275] and Tirdad et al. [306].

4 MDPs of Queueing Networks

In this section, we provide an overview for MDPs of queueing networks. Note that the MDPs of queueing networks have been an interesting research direction for many years, and they have also established key applications in many practical areas.

(1) MDPs of more general queueing networks

Ross [267], Weber and Stidham [320], Stidham [295], Shanthikumar and Yao [285], Veatch and Wein [315], Tassiulas and Ephremides [303], Papadimitriou and Tsitsiklis [252], Bäuerle [30, 31] and Solodyannikov [290].

(2) MDPs of queueing networks with multiple classes of customers

Shioyama [286], Bertsimas et al. [35], Maglaras [235], Chen and Meyn [78] and Cao and Xie [55].

(3) Queueing applications of Markov decision processes

Serfozo [283] studied the MDPs of birth-death processes and random walks, and then discussed the dynamic control of queueing networks. White [321] focused on the MDPs of QBD processes, which were used to deal with the dynamic control of queueing networks. Robinson [263] and Hordijk et al. [163] studied MDPs that were applied to the study of queueing networks. Sennott [281] analyzed the semi-MDP and applied the obtained results to queueing networks.

Other key research includes Van Dijk and Puterman [311], Liu et al. [228], Altman et al. [12] and Adlakha et al. [2].

5 MDPs of Queueing Networks with Special Structure

In this section, we provide an overview of MDPs of queueing networks with special structure, for example, multi-station tandem queues, multi-station parallel queues, polling queues, fork-join queues and so on.

(1) MDPs of two-station tandem queues

Ghoneim and Stidham [140], Nishimura [247], Farrar [113], Iravani et al. [167], Ahn et al. [5] and Zayas-Cabán et al. [341].

(2) MDPs of multi-station tandem queues

Rosberg et al. [266], Hordijk and Koole [160], Hariharan et al. [150], Gajrat et al. [129], Koole [192], Zhang and Ayhan [344] and Leeuwen and Núnez-Queija [213].

(3) MDPs of parallel queues

Weber [319], Bonomi [42], Menich and Serfozo [240], Xu et al. [332], Hordijk and Koole [161], Chang et al. [70], Sparaggis et al. [291], Koole [189], Ku and Jordan [197], Down and Lewis [103], Delasay et al. [91] and Feinberg and Zhang [126].

(4) MDPs of polling queues

Browne and Yechiali [48], Gandhi and Cassandras [131], Koole and Nain [195] and Gaujal et al. [136].

(5) MDPs of fork-join queueing networks

Pascual et al. [253], Zeng et al. [342], Marin and Rossi [237] and Zeng et al. [343].

(6) MDPs of call centers

Koole [194], Bhulai [37], Legros et al. [214], Gans et al. [132] and Koole and Mandelbaum [191].

(7) MDPs of distributed queueing networks

Chou and Abraham [79], e Silva and Gerla [288], Franken and Haverkort [128], Li and Kameda [217], Nadar et al. [244] and Vercraene et al. [316].

(8) Competitive MDPs of distributed queueing networks

Competitive MDPs are also called stochastic games. Altman and Hordijk [13] studied the zero-sum Markov game and applied the obtained results to the worst-case optimal control of queueing networks. Altman [8] studied non-zero-sum stochastic games and applied the results to admission, service and routing control in queueing networks. Altman [10] proposed a Markov game approach for analyzing the optimal routing of a queueing network. Hordijk et al. [162] studied a multi-chain stochastic game which was applied to worst-case admission control in a queueing network. Xu and Hajek [334] studied the game problem of supermarket models. Xia [324] applied stochastic games to the analysis of service rate control in a closed queueing network.

(9) Heavy traffic analysis for controlled queues and networks

Heavy traffic analysis can be used to deal with a class of important problems of controlled queues and networks by means of fluid and diffusion approximation. Readers may refer to, for example, Kushner [203], Kushner and Ramachandran [205], Kushner and Martins [204]; Harrison [152], Plambeck et al. [257]; Chen and Yao [76], Atar et al. [26].

6 Key Objectives in Queueing Dynamic Control

In this section, we introduce some key objectives to classify the literature of queueing dynamic control, for example, input control, service control, dynamic control under different service mechanisms, dynamic control with pricing, threshold control and so forth.

Objective one: Input control

Input control applies MDPs to dynamically control the input process of customers in queues and networks, including the input rate control, the interval time control, and the admission access control (e.g., the probability that an arriving customer chooses to enter the system or to join particular servers).

*(a) The input rate control:* Kitaev and Rykov [182], Sennott [282], Crabill et al. [84], Stidham and Weber [300], Crabill [82] and Lee and Kulkarni [212].

*(b) The input process control:* Kitaev and Rykov [182], Sennott [282], Crabill et al. [84], Stidham and Weber [300], Abdel-Gawad [1], Doshi [100], Stidham [293], Piunovskiy [256], Kuri and Kumar [201, 202], Van Nunen and Puterman [312], Helm and Waldmann [154], Ghoneim and Stidham [140] and Nishimura [247].

*(c) The admission access control:* Crabill et al. [83, 84], Stidham and Weber [300], Brouns [50], Rue and Rosenshine [268, 269, 270], Dimitrakopoulos and Burnetas [94], Mandelbaum and Yechiali [236], Mendelson and Yechiali [239], Stidham [294], Hordijk and Spieksma [164], Lamond [207], Lewis et al. [216], Adusumilli and Hasenbein [3], Altman et al. [15], Honhon and Seshadri [159], Yoon and Lewis [340], Yildirim and Hasenbein [339], Anderson [20], Emmons [108], Blanc et al. [40], Koçaǧa and Ward [186], Zhang and Ayhan [344], Altman [8], Hordijk et al. [162] and Xia [324].

Objective two: Service control

Service control uses MDPs to dynamically control the service process in queues and networks, including the service rate control, the service time control, and the service process control.

*(a) The service rate control:* Kitaev and Rykov [182], Sennott [282], Stidham [296, 298], Crabill et al. [83, 84], Stidham and Weber [300], Yao and Schechner [336], Dimitrakopoulos and Burnetas [94], Mitchell [243], Doshi [101], Jo and Stidham [173], Adusumilli and Hasenbein [3], Kumar et al. [199], Anderson [20], Lee and Kulkarni [212], Weber and Stidham [320], Ma and Cao [232], Xia [324], Xia and Shihada [331] and Xia and Jia [329].

*(b) The service time control:* Gallisch [130].

*(c) The service process control:* Kitaev and Rykov [182], Sennott [282], Stidham [298], Crabill et al. [83, 84], Stidham and Weber [300], Schassberger [280], Johansen and Larsen [174], Stidham [297], Nishimura [247], Rosberg et al. [266], Altman [8] and Hordijk et al. [162].

Objective three: Dynamic control under different service mechanisms

Many practical problems lead to the introduction of different service mechanisms, which give rise to some interesting queueing systems, for example, priority queues, processor-sharing queues, retrial queues, vacation queues, repairable queues, fluid queues and so on.

*(a) The priority queueing control:* Priority is an important service mechanism; it is a precondition for building useful relations with key customers, segmenting the market and maintaining long-term cooperation. Note that priority leads to the dynamic control of queues with multiple classes of customers. Readers may refer to Rykov and Lembert [277], Crabill et al. [83, 84], Stidham and Weber [300], Kofman and Lippman [187], Robinson [264], Browne and Yechiali [47], Groenevelt et al. [144], Brouns and Van Der Wal [51], Jain [170], Printezis and Burnetas [260] and Koole and Nain [195].

*(b) The processor-sharing queueing control:* Crabill et al. [83, 84], Stidham and Weber [300], De Waal [93], Altman et al. [15], Van der Weij et al. [309], Bhulai et al. [38] and Bonomi [42].

*(c) The retrial queueing control:* Bhulai et al. [38], Liang and Kulkarni [225], Winkler [322] and Giovanidis et al. [141].

*(d) The vacation queueing control:* Li et al. [220], Altman and Nain [17, 18], Kella [177] and Federgruen and So [115].

*(e) The repairable queueing control:* Dimitrakos and Kyriakidis [95], Rykov and Efrosinin [276] and Tijms and van der Duyn Schouten [305].

*(f) The removable server control:* For dynamic control of working servers, it is necessary to respond in real time to a peak period or an emergency by increasing or decreasing the number of working servers according to either the number of customers or the system workload. We refer the readers to Feinberg and Kim [120], Feinberg and Kella [119] and Iravani et al. [167].

*(g) The dynamic control of queueing behavior:* balking by Blackburn [39] and Economou and Kanta [106]; reneging and impatience by Li et al. [220] and Anderson [20]; and abandonment by Down et al. [102], Legros et al. [215], Larrañaga et al. [209] and Zayas-Cabán et al. [341].

Objective four: Threshold control

In dynamic control of queues and networks, the threshold-type policy is a simple and effective form, including single-threshold and dual-threshold policies.

(a) The single-threshold policy: Brouns [50], Altman and Nain [18], Federgruen and So [115] and Brouns and Van Der Wal [51].

(b) The dual-threshold policy: Lu and Serfozo [231], Plum [258] and Kitaev and Serfozo [183].
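To make the idea of a single-threshold policy concrete, the following is a minimal sketch and not a method taken from any of the cited papers. It runs value iteration on a hypothetical uniformized single-server queue with admission control: each step is an arrival with probability `arrival_prob` or a potential service completion otherwise, an admitted customer earns `reward`, and each waiting customer costs `holding_cost` per step. All parameter names and values are assumptions chosen for illustration only.

```python
def solve_admission_control(arrival_prob=0.4, reward=10.0, holding_cost=1.0,
                            discount=0.95, capacity=40, tol=1e-9):
    """Value iteration for a hypothetical discounted admission-control MDP.

    State n is the queue length (0..capacity). At an arrival the controller
    either admits (collect `reward`, move to n + 1) or rejects (stay at n).
    """
    service_prob = 1.0 - arrival_prob
    values = [0.0] * (capacity + 1)
    while True:
        new_values = []
        for n in range(capacity + 1):
            if n < capacity:
                # Best of admitting and rejecting the arriving customer.
                on_arrival = max(reward + values[n + 1], values[n])
            else:
                # Buffer full: the arrival must be rejected.
                on_arrival = values[n]
            # A service event removes one customer if any is present.
            on_service = values[max(n - 1, 0)]
            new_values.append(
                -holding_cost * n
                + discount * (arrival_prob * on_arrival
                              + service_prob * on_service)
            )
        gap = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if gap < tol:
            break
    # Greedy policy with respect to the converged value function.
    admit = [reward + values[n + 1] > values[n] for n in range(capacity)]
    return values, admit


values, admit = solve_admission_control()
# First queue length at which an arrival is rejected.
threshold = admit.index(False) if False in admit else len(admit)
print("admit an arrival while the queue length is below", threshold)
```

In this toy model the computed policy admits arrivals only while the queue length is below a single critical level, which is the structure the single-threshold literature above establishes under various assumptions. A dual-threshold policy would need a second control, for example switching a server on and off, which this sketch does not model.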

Objective five: Optimal routing control

(a) The entering parallel-server policy: Rosberg et al. [266], Weber [319], Bonomi [42], Menich and Serfozo [240], Xu et al. [332], Hordijk and Koole [161], Chang et al. [70], Sparaggis et al. [291], Koole [189], Ku and Jordan [197], Down and Lewis [103], Delasay et al. [91] and Li and Kameda [217].

(b) The routing policy: Abdel-Gawad [1], Altman [7, 9], Towsley et al. [307], Liang and Kulkarni [225], Xu and Zhao [333], Bradford [45], Rosberg and Kermani [265], Ross [267], Stidham [295], Tassiulas and Ephremides [303], Menich and Serfozo [240], Koole [189], Browne and Yechiali [47], Altman and Nain [18] and Ho and Cao [157].

(c) The assignment policy: Weber [319], Bonomi [42] and Xu et al. [332].

Objective six: Controlled queues and networks with useful information

Useful information plays a key role in the dynamic control of queueing networks. Readers may refer to Kuri and Kumar [201], Altman and Stidham [19], Honhon and Seshadri [159], Altman et al. [16], Altman and Jiménez [14] and Rosberg and Kermani [265].

Load balancing is an interesting research direction in queueing networks with simply observable information, e.g., see Down and Lewis [103], Chou and Abraham [79], e Silva and Gerla [288], Li and Kameda [217], Li et al. [220, 221], Li [219] and Li and Lui [224].

Objective seven: Controlled queues and networks with pricing

The optimal pricing policy is an important research direction in dynamic control of queues and networks, e.g., see Low [229], Chen and Frank [74], Yoon and Lewis [340], Çelik and Maglaras [69], Economou and Kanta [106], Yildirim and Hasenbein [339], Feinberg and Yang [125], Bradford [45], Xia and Chen [327] and Federgruen and Zheng [116].

7 Sensitivity-Based Optimization for MDPs of Queueing Networks

In this section, we briefly introduce the sensitivity-based optimization in the MDPs, and then provide an overview of how to apply the sensitivity-based optimization to the dynamic control of queues and networks.

In the late 1980s, to study dynamic control of queueing systems, Professors Yu-Chi Ho and Xi-Ren Cao proposed and developed the infinitesimal perturbation method for discrete event dynamic systems (DEDS), which opened a new research direction for online simulation optimization of the DEDS. See [158] for more interpretation. Further excellent books include Glasserman [142], Cao [56] and Cassandras and Lafortune [68].

**Sensitivity-Based Optimization:** Cao et al. [66] and Cao and Chen [65] published pioneering work that transforms the infinitesimal perturbation of DEDS, together with the MDPs, into the so-called sensitivity-based optimization by means of the policy-based Markov processes and the associated Poisson equations; they also developed new concepts such as the performance potential and the performance difference equation. Along this research line, Cao [58] summarized many basic results of the sensitivity-based optimization. In addition, Li and Liu [223] and Chapter 11 in Li [218] extended the sensitivity-based optimization to a more general perturbed Markov process with infinitely many states by means of the RG-factorizations.
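As an illustration of the performance potential and the performance difference equation named above, the following is the standard form for a finite ergodic discrete-time Markov chain. The notation is an assumption introduced here and does not appear in the survey: $P$ is the transition matrix under a policy, $f$ the reward vector, $\pi$ the stationary row vector, $e$ the column vector of ones, $\eta = \pi f$ the long-run average reward, and primes mark the same quantities under a second policy.

```latex
% Poisson equation defining the performance potential g (unique up to an additive constant):
(I - P)\, g + \eta\, e = f .

% Performance difference equation, obtained by left-multiplying the Poisson
% equation by \pi' and using \pi' P' = \pi' and \pi' e = 1:
\eta' - \eta = \pi' \left[ (P' - P)\, g + (f' - f) \right] .

% Derivative along the direction P_\delta = P + \delta (P' - P), f_\delta = f + \delta (f' - f):
\left. \frac{d \eta_\delta}{d \delta} \right|_{\delta = 0} = \pi \left[ (P' - P)\, g + (f' - f) \right] .
```

Since every entry of $\pi'$ is positive for an ergodic chain, a policy with $(P' - P)\, g + (f' - f) \ge 0$ componentwise cannot decrease the average reward, which is the comparison that policy iteration relies on.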

So far some work has applied the sensitivity-based optimization to deal with MDPs of queues and networks, e.g., see Xia and Cao [326], Xia and Shihada [331], Xia [324], Xia and Jia [329], Xia et al. [328] and Xia and Chen [327]; Ma et al. [233, 234] for data centers; and Li et al. [222] for inventory rationing control. It is worthwhile to note that the sensitivity-based optimization of queues and networks can be effectively supported and developed by means of the matrix-analytic method by Neuts [245, 246] and the RG-factorizations by Li [218]. Also see Ma et al. [233, 234] and Li et al. [222] for more details.

Recently, Xi-Ren Cao further extended the sensitivity-based optimization to the more general case of diffusion processes, called relative optimization of continuous-time and continuous-state stochastic systems (see Cao [62] for a complete draft). Important examples include Cao [59, 60, 61, 63, 64] and references therein.

**Event-Based Optimization Approach:** In many practical systems, an event usually has a specific physical meaning and corresponds mathematically to a set of state transitions with the same characteristics. In general, the number of events arising from changes of the system state is much smaller than the number of states. Such events can therefore be used to describe an approximate MDP, which sets up a new optimization framework called event-based optimization. Event-based optimization can directly capture the future information and the structural nature of the system, which are reflected in the events used to aggregate the performance potentials. Not only can event-based optimization greatly reduce the computation, but it also alleviates the curse of dimensionality of a network decision process.

For the event-based optimization, readers may refer to, for example, dynamic control of queueing systems by Koole [190] and Koole and Nain [195]; dynamic control of Markov systems by Cao [57], Cao [53], Xia [330] and Jia [171]; partially observable Markov decision processes by Wang and Cao [318]; and admission control of open queueing networks by Xia [325].

8 Concluding Remarks

In this survey, we provide an overview for the MDPs of queues and networks, including single-server queues, multi-server queues and queueing networks. At the same time, the overview is also related to some specific objectives, for example, input control, service control, dynamic control based on different service mechanisms, dynamic control based on pricing, threshold control and so on.

Along such a line, there are still a number of interesting directions for potential future research, for example:

• developing effective and efficient algorithms to find the optimal policies and to compute the optimal performance measures, possibly in connection with AI and learning algorithms;

• discussing structural properties of the optimal policy in the MDPs of queueing networks under an intelligent environment (for example, IoT, big data, cloud service, blockchain and AI), and specifically, dealing with multi-dimensional queueing dynamic control;

• analyzing structural properties of the optimal policy in the MDPs with either QBD processes, Markov processes of GI/M/1 type or Markov processes of M/G/1 type, which are closely related to various practical stochastic models;

• applying the sensitivity-based optimization and the event-based optimization to deal with dynamic control of practical stochastic networks, for example, production and inventory control, manufacturing control, transportation networks, healthcare, the sharing economy, cloud service, blockchain, service management, energy-efficient management and so forth.

Acknowledgements

Quan-Lin Li was supported by the National Natural Science Foundation of China under grant No. 71671158 and by the Natural Science Foundation of Hebei province under grant No. G2017203277. Li Xia was supported by the National Natural Science Foundation of China under grant No. 61573206. The authors thank X.R. Cao and E.A. Feinberg for their valuable comments and suggestions to improve the presentation of this paper.

Bibliography

[1] E.F. Abdel-Gawad (1984). Optimal control of arrivals and routing in a network of queues. Ph.D. dissertation, North Carolina State University.
[2] S. Adlakha, S. Lall and A. Goldsmith (2012). Networked Markov decision processes with delays. IEEE Transactions on Automatic Control, 57(4), 1013–1018.
[3] K.M. Adusumilli and J.J. Hasenbein (2010). Dynamic admission and service rate control of a queue. Queueing Systems, 66(2), 131–154.
[4] M.H. Ahmed (2005). Call admission control in wireless networks: a comprehensive survey. IEEE Communications Surveys & Tutorials, 7(1), 49–68.
[5] H.S. Ahn, I. Duenyas and M.E. Lewis (2002). Optimal control of a two-stage tandem queuing system with flexible servers. Probability in the Engineering and Informational Sciences, 16(4), 453–469.
[6] M.A. Alsheikh, D.T. Hoang, D. Niyato, H.P. Tan and S. Lin (2015). Markov decision processes with applications in wireless sensor networks: A survey. arXiv preprint arXiv:1501.00644, pages 1–29.
[7] E. Altman (1994). A Markov game approach for optimal routing into a queuing network. Ph.D. dissertation, INRIA (Institut National de Recherche en Informatique et en Automatique).
[8] E. Altman (1996). Non zero-sum stochastic games in admission, service and routing control in queueing systems. Queueing Systems, 23(1-4), 259–279.