Large deviations of the long term distribution of a non Markov process
Anatolii A. Puhalskii

TL;DR
This paper establishes a large deviation principle for the long-term queue length distribution in ergodic generalized Jackson networks, linking it to the quasipotential and idempotent probability theory.
Contribution
It introduces a novel connection between large deviations, quasipotential, and idempotent distributions in queueing networks.
Findings
Long-term queue distribution obeys the Large Deviation Principle.
The deviation function is given by the quasipotential.
The quasipotential relates to the unique long-term idempotent distribution.
Abstract
We prove that the long term distribution of the queue length process in an ergodic generalised Jackson network obeys the Large Deviation Principle with a deviation function given by the quasipotential. The latter is related to the unique long term idempotent distribution, which is also a stationary idempotent distribution, of the large deviation limit of the queue length processes. The proof draws on developments in queueing network stability and idempotent probability.
Anatolii A. Puhalskii
Institute for Problems in Information Transmission
1 Introduction and summary
In a seminal contribution, Freidlin and Wentzell [5] obtained the Large Deviation Principle (LDP) for the stationary distribution of a diffusion process and showed that the deviation function, which is often referred to as the action functional or the (tight) rate function, is given by the quasipotential. Their ingenious analysis relied heavily on the strong Markov property and involved an intricate study of attainment times. Shwartz and Weiss [10] adapted the methods of Freidlin and Wentzell [5] to the setting of jump Markov processes. In Puhalskii [8], we suggested a different, arguably, more direct and, as we hope, more robust approach. It was prompted by the analogy between large deviations and weak convergence and sought to identify the deviation function in terms of the stationary idempotent distribution of a large deviation limit. In this paper, the approach is applied to establishing the LDP for the long term distribution of the non Markov process of queue lengths in a generalised Jackson network. It is noteworthy that, in addition to being non Markovian, generalised Jackson networks fall into the category of stochastic systems with discontinuous dynamics, whose analysis is generally more difficult. We show that the deviation function is still given by the quasipotential which is related to the stationary idempotent distribution of the limit idempotent process. That stationary idempotent distribution is also a unique long term idempotent distribution, the uniqueness being proved by a coupling argument. Geometric ergodicity of the queue length process enables us to conclude that the long term idempotent distribution is the large deviation limit of the long term queue length distributions.
2 The setup and main result
We consider a queueing network with a homogeneous customer population which comprises $K$ single server stations. Customers arrive exogenously at the stations and are served there in the order of arrival, one customer at a time. Upon being served, they either join a queue at another station or leave the network. Let $A_k(t)$ denote the cumulative number of exogenous arrivals at station $k$ by time $t$, let $S_k(t)$ denote the cumulative number of customers that are served at station $k$ for the first $t$ units of busy time of that station, and let $R_{kl}(m)$ denote the cumulative number of customers among the first $m$ customers departing station $k$ that go directly to station $l$. Let , , and , where and . It is assumed that the $A_k$ and $S_k$ are nonzero renewal processes and , where is a sequence of i.i.d. random variables assuming values in , standing for the indicator function of set . The random entities , , and are assumed to be defined on common probability space and be mutually independent, where . We denote and let . The matrix is assumed to be of spectral radius less than unity so that every arriving customer eventually leaves. Let $Q=(Q(t),\,t\in\mathbb{R}_+)$ represent the queue length process, where $Q(t)=(Q_1(t),\ldots,Q_K(t))$ and $Q_k(t)$ represents the number of customers at station $k$ at time $t$. All the stochastic processes are assumed to have piecewise constant right-continuous with left-hand limits trajectories. Accordingly, they are considered as random elements of the associated Skorohod spaces.
For $k\in\{1,\ldots,K\}$ and $t\in\mathbb{R}_+$, the following equations are satisfied:
$$Q_k(t)=Q_k(0)+A_k(t)+\sum_{l=1}^{K}R_{lk}\bigl(D_l(t)\bigr)-D_k(t),\tag{2.1}$$
where
$$D_k(t)=S_k\bigl(B_k(t)\bigr)\tag{2.2}$$
represents the number of departures from station $k$ by time $t$ and
$$B_k(t)=\int_0^t\mathbf{1}_{\{Q_k(u)>0\}}\,du\tag{2.3}$$
represents the cumulative busy time of station $k$ by time $t$. For given realisations of $Q(0)$, $A$, $S$, and $R$, there exist unique $Q$, $D$ and $B$ that satisfy (2.1), (2.2) and (2.3), see, e.g., Chen and Mandelbaum [4]. The process $Q$ is non Markov unless all $A_k$ and $S_k$ are Poisson processes.
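The dynamics behind equations (2.1), (2.2) and (2.3) can be made concrete with a small event-driven simulation. The following is only a sketch under stated assumptions: the function name `simulate_network`, the two-station example, the uniform interarrival and service times, and the routing probabilities are all hypothetical choices made for illustration and are not taken from the paper. Uniform times are used so that the queue length process is not Markov.

```python
import heapq
import random


def simulate_network(interarrival, service, routing, horizon, seed=0):
    """Simulate FIFO single-server stations up to time `horizon`.

    interarrival[k](rng) and service[k](rng) draw i.i.d. times for station k;
    routing[k][l] is the probability of moving from k to l, and the deficit of
    row k is the probability of leaving the network. (Hypothetical interface.)
    """
    rng = random.Random(seed)
    K = len(routing)
    queue = [0] * K        # queue lengths, customer in service included
    arrivals = [0] * K     # exogenous arrival counts
    departures = [0] * K   # service completion counts
    exits = 0              # customers that have left the network
    events = []            # heap of (time, kind, station)
    for k in range(K):
        heapq.heappush(events, (interarrival[k](rng), "arrival", k))

    def join(k, now):
        queue[k] += 1
        if queue[k] == 1:  # the server was idle, so service starts at once
            heapq.heappush(events, (now + service[k](rng), "departure", k))

    while events:
        now, kind, k = heapq.heappop(events)
        if now > horizon:
            break
        if kind == "arrival":
            arrivals[k] += 1
            join(k, now)
            heapq.heappush(events, (now + interarrival[k](rng), "arrival", k))
        else:
            departures[k] += 1
            queue[k] -= 1
            if queue[k] > 0:  # next customer in line enters service
                heapq.heappush(events, (now + service[k](rng), "departure", k))
            u, acc, dest = rng.random(), 0.0, None
            for l in range(K):
                acc += routing[k][l]
                if u < acc:
                    dest = l
                    break
            if dest is None:
                exits += 1
            else:
                join(dest, now)
    return queue, arrivals, departures, exits


# Hypothetical two-station example with feedback.
interarrival = [lambda rng: rng.uniform(0.5, 1.5), lambda rng: rng.uniform(2.0, 4.0)]
service = [lambda rng: rng.uniform(0.2, 0.8), lambda rng: rng.uniform(0.1, 0.5)]
routing = [[0.0, 0.5], [0.2, 0.0]]
queue, arrivals, departures, exits = simulate_network(
    interarrival, service, routing, horizon=1000.0
)
print(queue, arrivals, departures, exits)
```

Summing the balance equation over stations gives a quick sanity check: the total number of exogenous arrivals minus the number of exits should equal the total number of customers present.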
Let, for , nonnegative random variables and represent generic times between exogenous arrivals and service times at station , respectively. We assume that and for some , and the cumulative distribution functions of the and are right-differentiable at $0$ with positive derivatives. Let and . Let also if , , , , and . Let represent the set of row–substochastic –matrices and represent the –identity matrix. Given vectors and , matrix with rows , and , we define
[TABLE]
and
[TABLE]
where . Also, for , we let
[TABLE]
If is a nonempty subset of , we denote , is defined to be the interior of . Let, for and ,
[TABLE]
The function is seen to be nonnegative.
Let
[TABLE]
provided is absolutely continuous with and , otherwise, where .
With large deviations in mind, we will assume in the next theorem that the initial queue length depends on large parameter $n$, so superscript "$n$" will be used to denote the associated random quantities, e.g., $Q^n(t)$ is the queue length vector at time $t$. Theorem 2.2 in Puhalskii [9] proves the following result.
Theorem 2.1.
If, in addition, as , for all , then the queue length processes obey the LDP in for rate with the deviation function .
For , we define the quasipotential by
[TABLE]
In order to address an LDP for the stationary queue length distribution, we assume that the network is subcritical:
[TABLE]
where , , and . (Inequalities between vectors or matrices are understood to hold entrywise.) In addition, we assume that
1. there exists number such that , for and ,
2. , for and ,
3. for , there exist nonnegative function on with and such that , provided , where are i.i.d. and are distributed as .
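A subcriticality condition of this kind can be checked numerically. The sketch below assumes that the condition takes the standard form for Jackson-type networks: the effective arrival rates, which solve the traffic equations $\nu=\lambda+P^{T}\nu$, are entrywise smaller than the service rates $\mu$. The symbols `lam`, `mu` and `P`, the function names, and the numerical values are assumptions made for illustration only.

```python
def effective_arrival_rates(lam, P, tol=1e-12, max_iter=10_000):
    """Solve nu = lam + P^T nu by fixed-point iteration.

    The iteration converges when the spectral radius of P is below one.
    """
    K = len(lam)
    nu = list(lam)
    for _ in range(max_iter):
        new = [lam[k] + sum(P[l][k] * nu[l] for l in range(K)) for k in range(K)]
        if max(abs(a - b) for a, b in zip(new, nu)) < tol:
            return new
        nu = new
    raise RuntimeError("no convergence; is the spectral radius of P below one?")


def is_subcritical(lam, mu, P):
    """Entrywise comparison of effective arrival rates with service rates."""
    nu = effective_arrival_rates(lam, P)
    return all(n < m for n, m in zip(nu, mu))


# Hypothetical two-station example.
lam = [1.0, 1.0 / 3.0]          # exogenous arrival rates
mu = [2.0, 10.0 / 3.0]          # service rates
P = [[0.0, 0.5], [0.2, 0.0]]    # routing probabilities
nu = effective_arrival_rates(lam, P)
print(nu, is_subcritical(lam, mu, P))
```

For this example the traffic equations can be solved by hand, giving effective rates 32/27 and 25/27, both below the corresponding service rates.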
Under these hypotheses, the converge in distribution to random variable , as , see Down and Meyn [6]. The convergence holds for arbitrary initial vector and the convergence rate is geometric for the metric of total variation. In addition, if the random variables are augmented with residual service and interarrival times to produce a Markov process, then that Markov process has a unique stationary distribution, the distribution of being a marginal distribution. Our main result is the following theorem.
Theorem 2.2.
The sequence obeys the LDP in for rate with the deviation function .
Remark 2.1.
Under (2.8), there is no ”large deviation cost” for staying at the origin. On taking in (2.5) , , , , and and noting that by (2.8), one can see by (2.4) that , so and . More generally, when is ”a fluid limit queue length” or the trajectory of the law of large numbers, i.e., , where and , for , cf. Puhalskii [9]. The converse is also true: if , then the infimum in (2.5) is attained at , when and . (For a proof, one notes that , , and , if and only if , , and , respectively.) As a byproduct, in (2.7) can be replaced with .
3 Idempotent probability and the proof of Theorem 2.2
Let us recap some notions of idempotent probability, see, e.g., Puhalskii [7]. Let be a set. Function from the power set of to is called an idempotent probability if and . The pair is called an idempotent probability space. For economy of notation, we denote . Property pertaining to the elements of is said to hold -a.e. if , where, in accordance with a tradition of probability theory, we define . Function from set equipped with idempotent probability to set is called an idempotent variable. The idempotent distribution of the idempotent variable is defined as the set function . If is the canonical idempotent variable defined by , then it has as the idempotent distribution. If , with assuming values in , then the (marginal) distribution of is defined by . The idempotent variables and are said to be independent if for all , so, the joint distribution is the product of the marginal ones. Independence of finite collections of idempotent variables is defined similarly. Collection of idempotent variables on is called an idempotent process. The functions for various are called trajectories (or paths) of . Idempotent processes are said to be independent if they are independent as idempotent variables with values in the associated function spaces. The concepts of idempotent processes with independent and (or) stationary increments mimic those for stochastic processes.
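On a finite set, these notions reduce to elementary operations with maxima, which the following sketch illustrates. It assumes that an idempotent probability is represented by a density with supremum one, so that the measure of a set is the supremum of the density over that set. The function names and the three-point example are hypothetical.

```python
from itertools import product


def idem_prob(density, event):
    """Idempotent probability of an event: sup of the density, 0 if empty."""
    return max((density[w] for w in event), default=0.0)


def marginal(joint, axis):
    """Marginal density of one coordinate: maximise over the other one."""
    out = {}
    for pair, value in joint.items():
        out[pair[axis]] = max(out.get(pair[axis], 0.0), value)
    return out


def is_independent(joint, tol=1e-12):
    """Check that the joint density is the product of its marginals."""
    first, second = marginal(joint, 0), marginal(joint, 1)
    return all(
        abs(joint.get((x, y), 0.0) - first[x] * second[y]) <= tol
        for x, y in product(first, second)
    )


density = {"a": 1.0, "b": 0.5, "c": 0.1}      # supremum over the set is 1
joint = {(x, y): density[x] * density[y] for x in density for y in density}
coupled = {("a", "a"): 1.0, ("b", "b"): 0.5}  # mass on the diagonal only

print(idem_prob(density, {"b", "c"}), is_independent(joint), is_independent(coupled))
```

The set function is maxitive by construction: the measure of a union is the maximum of the measures, in place of the sum used for ordinary probabilities.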
If is, in addition, a metric space and the sets are compact for all , then is called a deviability. Obviously, is a deviability if and only if is a deviation function. If is a continuous mapping from to another metric space , then is a deviability on . As a matter of fact, for the latter property to hold, one can only require that be continuous on the sets for . In general, an idempotent variable is said to be Luzin if its idempotent distribution is a deviability.
Let $\mathbf{P}_n$ be a sequence of probability measures on metric space $\Upsilon$ endowed with the Borel $\sigma$-algebra and let $\mathbf{\Pi}$ be a deviability on $\Upsilon$. The sequence $\mathbf{P}_n$ is said to large deviation converge (LD converge) at rate $n$ to $\mathbf{\Pi}$ as $n\to\infty$ if $$\lim_{n\to\infty}\Bigl(\int_{\Upsilon}f(\upsilon)^{n}\,\mathbf{P}_{n}(d\upsilon)\Bigr)^{1/n}=\sup_{\upsilon\in\Upsilon}f(\upsilon)\mathbf{\Pi}(\upsilon)$$ for every bounded continuous $\mathbb{R}_+$-valued function $f$ on $\Upsilon$. Equivalently, one may require that for every –continuity set , which is defined by the requirement that the values of on the interior and closure of are equal to each other. Obviously, the sequence LD converges at rate to if and only if this sequence obeys the LDP for rate with deviation function . Similarly, sequence of deviabilities on is said to converge weakly to deviability , as , if for every bounded continuous -valued function on . The analogue of Prohorov’s theorem holds: if the sequence is tight meaning that , where represents the collection of compact subsets of , then the converge to a deviability along a subsequence.
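The defining limit of LD convergence can be checked numerically in a classical case. The sketch below assumes that $\mathbf{P}_n$ is the law of the mean of $n$ fair coin flips, for which Cramér's theorem gives the deviability density $\exp(-I(x))$ with $I(x)=x\ln(2x)+(1-x)\ln(2(1-x))$ on $[0,1]$. The test function $f(x)=1+x$ and all function names are hypothetical choices. The sum is evaluated in logarithms to avoid overflow.

```python
import math


def rate(x):
    """Cramer rate function of the mean of fair coin flips, x in [0, 1]."""
    def term(p):
        return 0.0 if p == 0.0 else p * math.log(2.0 * p)
    return term(x) + term(1.0 - x)


def f(x):
    """A bounded continuous nonnegative test function on [0, 1]."""
    return 1.0 + x


def lhs(n):
    """(integral of f^n with respect to P_n)^(1/n), computed via log-sum-exp."""
    logs = []
    for k in range(n + 1):
        log_p = (math.lgamma(n + 1) - math.lgamma(k + 1)
                 - math.lgamma(n - k + 1) - n * math.log(2.0))
        logs.append(n * math.log(f(k / n)) + log_p)
    top = max(logs)
    log_sum = top + math.log(sum(math.exp(v - top) for v in logs))
    return math.exp(log_sum / n)


def rhs(grid=10_000):
    """sup over [0, 1] of f(x) * exp(-rate(x)), approximated on a grid."""
    return max(f(i / grid) * math.exp(-rate(i / grid)) for i in range(grid + 1))


for n in (20, 200, 2000):
    print(n, lhs(n), rhs())
```

As $n$ grows, the left-hand side approaches the supremum on the right, in line with the definition.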
LD convergence of probability measures can also be expressed as LD convergence in distribution of the associated random variables to idempotent variables. We say that sequence of random variables defined on respective probability spaces and assuming values in LD converges in distribution at rate as to idempotent variable defined on idempotent probability space and assuming values in if the sequence of the probability laws of the LD converges to the idempotent distribution of at rate . If sequence of probability measures on LD converges to deviability on , then one has LD convergence in distribution for the canonical setting.
We now return to the setting of generalised Jackson networks and let . It is proved in Puhalskii [9] that under the hypotheses of Theorem 2.1 there exists unique deviability on such that the processes $$\bigl((Q^{n}(nt)/n,\,t\in\mathbb{R}_{+}),(B^{n}(nt)/n,\,t\in\mathbb{R}_{+}),(D^{n}(nt)/n,\,t\in\mathbb{R}_{+}),(A^{n}(nt)/n,\,t\in\mathbb{R}_{+}),(S^{n}(nt)/n,\,t\in\mathbb{R}_{+}),(R^{n}(nt)/n,\,t\in\mathbb{R}_{+})\bigr)$$ LD converge at rate to the canonical idempotent process on . The component idempotent processes of , , , , and have -a.e. absolutely continuous nondecreasing trajectories starting at $0$, the component idempotent processes of grow not faster than at rate , and the component idempotent processes of have -a.e. absolutely continuous trajectories, the idempotent process has idempotent distribution , the idempotent processes , and are independent with respective idempotent distributions , and defined as follows, where, by virtue of our working in a canonical setting, identical pieces of notation are used for denoting idempotent processes and their sample trajectories:
[TABLE]
where , , , , , , , , , , , the functions , , and being absolutely continuous with , , , a.e., a.e., and a.e.
Also -a.e. the following equations hold for and :
[TABLE]
where , , , and . Equations (3.4) and (3.5) are obtained by taking large deviation limits in (2.1) and (2.2), respectively. It is noteworthy that since in (3.1), (3.2) and (3.3) the sample trajectories enter the deviabilities only through their derivatives, the idempotent processes , and have independent and stationary increments.
By (3.4), -a.e. In order to allow the initial value to have a nondegenerate idempotent distribution, we introduce
[TABLE]
where is a deviability on . One can see that is a deviability on . Obviously, and , , and are independent under . Also, the marginal idempotent distribution of is given by
[TABLE]
By (3.4), –a.e.,
[TABLE]
Let
[TABLE]
The definition implies the semigroup property that
[TABLE]
For , we will denote by the vector with unity entries whose dimension equals the number of elements in . For compactness of notation, we let .
Lemma 3.1.
Given and , there exists such that
[TABLE]
Proof.
By the maxitivity property that , for arbitrary collection of sets , it suffices to work with only. By the LD convergence in distribution of to and Lemma A.1 relegated to the appendix, whose assertion can be found in Appendix A of Bell and Williams [1], for some ,
[TABLE]
∎
Lemma 3.2.
Given bounded set and , there exists such that if and , then
Proof.
The proof proceeds by establishing, initially, that in the long run the idempotent processes , and "with great deviability" stay close to the corresponding fluid trajectories , and , respectively. Then, drawing on the proof of the stability of fluid models of queueing networks in Bramson [2, 3], it is shown that owing to condition (2.8) the function decreases linearly with , provided is small enough, which implies that the function must attain $0$.
By (2.8), there exists such that and . (In the course of the proof, potentially smaller will be needed. Yet, there exists that satisfies all the requirements. Importantly, it depends neither on nor on .) By Lemma 3.1, there exists such that , , , , , and .
Let
[TABLE]
We have that and that on , provided , by (3.4),
[TABLE]
Let us show that there exists such that for all on . Intuitively, this is the case because otherwise some would be ”bounded” whereas can be arbitrarily great for great pushing past . Formally, assuming that , for all , let be such that , for all . If , for some , then, by (3.5), for . By (3.8), on , for , . Therefore, by (3.6), a.e. when , so , which contradicts the assumption that . It is worth noting that whereas both and may depend on either or , neither of them depends on .
We now assume that is piecewise linear, which assumption is to be disposed of later. Let us suppose that for in a right neighborhood of for some on . Then, , for , until hits zero. Accordingly, . Hence, if entrywise in a right neighborhood of , then
[TABLE]
where we denote
[TABLE]
As a consequence, for some , which is dependent on only, while entrywise,
[TABLE]
By the righthand inequality in (3.10) and (3.11),
[TABLE]
By (3.12), there exist and such that, provided is small enough, if , then, while stays entrywise positive,
[TABLE]
Let us show that similar inequalities hold on for all . Given , let denote a possibly empty set of indices such that on some interval and if and . Such exists because is piecewise linear. We assume that on , so is a proper subset of . By the lefthand inequality in (3.10), on ,
[TABLE]
Therefore, using subscript and to denote restrictions of vectors to indices in and respectively, and using subscripts and to denote restrictions of matrices to entries with both indices in and , respectively, we have that
[TABLE]
so, assuming is small enough,
[TABLE]
On the other hand, by (3.11), Substitution in (3.16) and rearranging yield
[TABLE]
In analogy with the derivation of (3.12), one obtains that, for some ,
[TABLE]
Therefore, for ,
[TABLE]
Since , by (3.17), (3.18) and the bound when , there exist and which do not depend on such that, assuming is small enough, for ,
[TABLE]
By (3.13), we obtain that (3.14) still holds, for suitable which does not depend on and , provided is small enough. We can repeat the same argument over and over again, so, (3.14) holds until . Hence, one can take
[TABLE]
as the time by which is bound to hit the origin.
Suppose now that is not necessarily piecewise linear and . By Lemmas 4.1–4.4 in Puhalskii [9], there exist piecewise linear which converge to as such that . By what has been proved, there exist from such that . Since , it follows that where represents a subsequential limit of the . ∎
Theorem 3.1.
There exists deviability on such that, for every bounded set ,
[TABLE]
Furthermore, given , for all great enough and all . The deviability is a unique stationary deviability for the semigroup meaning that, for all and ,
[TABLE]
Proof.
One can see that is a nondecreasing function of . Indeed, let . Given function such that and , we can associate with it function such that for and for . It follows that , so yielding the desired monotonicity. We let
[TABLE]
Let us show that levels off eventually as a function of . Let . We define as in the statement of Lemma 3.2 with as set . Suppose that , where . Let be such that , and . Let . Then and . By Lemma 3.2, there exists such that . On defining , we have that . On the other hand, since , we have that which implies that , so, on , for Remark 2.1 implies that if and for some , then if and only if on . Hence, , so, . This proves that if and , then . We also have that , for all and . Hence, the net of deviabilities is tight, so, is a deviability too.
Let us prove that
[TABLE]
A coupling argument is employed. We prove, at first, that, for arbitrary ,
[TABLE]
By Lemma 3.2, there exists such that if and , then for some . Let us fix and . One can assume that and that . Let trajectory be such that , and . By Lemma 3.2, there exists such that . We define by letting when and when . By Remark 2.1, proving (3.22).
On the other hand, given , , and such that , , , and for all (the latter can be always assumed as we have seen), we define with by letting it follow the law of large numbers until it hits zero at some and by letting , for . Since by Remark 2.1 , we obtain that , which concludes the proof of (3.21).
We have shown that , as , uniformly over and over from bounded sets. It follows that, for arbitrary initial deviability ,
[TABLE]
Letting in (3.9) implies that is a unique stationary initial deviability. (For, if is another stationary deviability, then , where , and one can let .)
∎
Remark 3.1.
The proof shows that the value of where the level off can be chosen uniformly over such that .
Proof of Theorem 2.2.
Let denote the distribution of and let denote the distribution of for . Let be a –continuity set. We have that
[TABLE]
By Theorem 4.1 in Down and Meyn [6], there exist and such that Given , let be such that and . Since, by Theorem 2.1, for all great enough, , it follows that , for all great enough. (Alternatively, one may let and then let in (3.23).) Finally, by (2.7) and (3.20). ∎
Remark 3.2.
Since , as , one can see by (3.23), that, more generally, geometric ergodicity of , as , for the metric of total variation and a sample path LDP for with , imply an LDP for .
Appendix A
Lemma A.1.
Let be a renewal process with rate . Suppose that certain exponential moments of the inter-renewal times are finite. Then, given arbitrary , there exists such that, for all ,
[TABLE]
Proof.
Let denote the successive inter-renewal times. For suitable ,
[TABLE]
Hence,
[TABLE]
Since , the latter righthand side is less than unity for small enough. ∎
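The exponential-moment argument can be illustrated in the simplest special case. The sketch below assumes exponential inter-renewal times with rate `mu`, so that the renewal process is Poisson and the exact tail is available for comparison, and it treats only upward deviations of the count. With $m=\lceil(\mu+\epsilon)t\rceil$ renewals, Markov's inequality gives $\mathbf{P}(N(t)\ge m)\le e^{\theta t}\bigl(\mathbf{E}e^{-\theta\xi}\bigr)^{m}\le\bigl(e^{\theta}(\mu/(\mu+\theta))^{\mu+\epsilon}\bigr)^{t}$. The names `chernoff_base` and `poisson_upper_tail` and the numerical values are hypothetical.

```python
import math


def chernoff_base(mu, eps, theta):
    """exp(theta) * (E exp(-theta * xi))**(mu + eps) for xi ~ Exp(mu)."""
    return math.exp(theta) * (mu / (mu + theta)) ** (mu + eps)


def poisson_upper_tail(mean, m):
    """P(N >= m) for N ~ Poisson(mean), via the complementary sum."""
    term = math.exp(-mean)
    cdf = 0.0
    for j in range(m):
        cdf += term
        term *= mean / (j + 1)
    return max(0.0, 1.0 - cdf)


mu, eps, theta = 1.0, 0.5, 0.4   # hypothetical rate, deviation size, tilt
base = chernoff_base(mu, eps, theta)
for t in (10, 50, 100):
    m = math.ceil((mu + eps) * t)
    exact = poisson_upper_tail(mu * t, m)
    print(t, exact, base ** t)
```

The base is below one for small positive `theta` because the mean inter-renewal time is $1/\mu$, so the bound decays exponentially in $t$ and dominates the exact tail.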
References
- [1] S.L. Bell and R.J. Williams. Dynamic scheduling of a system with two parallel servers in heavy traffic with resource pooling: asymptotic optimality of a threshold policy. Ann. Appl. Probab., 11(3):608–649, 2001.
- [2] M. Bramson. Stability of queueing networks, volume 1950 of Lecture Notes in Mathematics. Springer, Berlin, 2008. Lectures from the 36th Probability Summer School held in Saint-Flour, July 2–15, 2006.
- [3] M. Bramson. Stability of queueing networks. Probab. Surv., 5:169–345, 2008.
- [4] H. Chen and A. Mandelbaum. Discrete flow networks: bottleneck analysis and fluid approximations. Math. Oper. Res., 16(2):408–446, 1991.
- [5] M.I. Freidlin and A.D. Wentzell. Random Perturbations of Dynamical Systems. Nauka, 1979. In Russian, English translation: Springer, 1984.
- [6] S.P. Meyn and D. Down. Stability of generalized Jackson networks. Ann. Appl. Probab., 4(1):124–148, 1994.
- [7] A. Puhalskii. Large Deviations and Idempotent Probability. Chapman & Hall/CRC, 2001.
- [8] A. Puhalskii. On large deviation convergence of invariant measures. J. Theoret. Probab., 16(3):689–724, 2003.
