A Stochastic Process on a Network with Connections to Laplacian Systems of Equations
Iqra Altaf Gillani, Amitabha Bagchi, Pooja Vyavahare

TL;DR
This paper analyzes a queueing network model for multi-hop sensor data collection, revealing critical data rates that determine system stability and connecting the process to Laplacian systems relevant for distributed algorithms.
Contribution
It introduces a stochastic process model for sensor networks, establishes a phase transition based on data rate, and links the process to Laplacian systems for computational applications.
Findings
Existence of a critical data rate separating ergodic and non-ergodic regimes.
Geometric convergence to stationarity in the sub-critical regime.
Connections to Laplacian systems for efficient distributed algorithms.
| Graph | Exact rate | |
|---|---|---|
| Cycle | | |
| Star Graph with sink at centre and as self loop probability at each node | | |
| Star Graph with sink and source at outer node | | |
| Complete graph | | |
| Random Geometric Graph | - | |
| Wheel Graph with sink at centre and source at one of the cycle vertices | | |
| Wheel Graph with source at centre and sink at one of the cycle vertices | | |
| Complete Binary tree with both source and sink at leaves | | |
| -times star of star graph with both source and sink at leaves | | |
| -times star of star graph with source at center and sink at leaf | | |
Iqra Altaf Gillani
{iqraaltaf,bagchi}@cse.iitd.ac.in
Department of Computer Science and Engineering, IIT Delhi
Amitabha Bagchi
{iqraaltaf,bagchi}@cse.iitd.ac.in
Department of Computer Science and Engineering, IIT Delhi
Pooja Vyavahare
Department of Electrical Engineering, IIT Tirupati
Abstract
We study an open discrete-time queueing network that models the collection of data in a multi-hop sensor network. We assume data is generated at the sensor nodes as a discrete-time Bernoulli process. All nodes in the network maintain a queue and relay data, which is to be finally collected by a designated sink. We prove that the resulting multi-dimensional Markov chain representing the queue size of nodes has two behavior regimes depending on the value of the rate of data generation. In particular, we show that there is a non-trivial critical value of data rate below which the chain is ergodic and converges to a stationary distribution and above which it is non-ergodic, i.e., the queues at the nodes grow in an unbounded manner. We show that the rate of convergence to stationarity is geometric in the sub-critical regime. We also show the connections of this process to a class of Laplacian systems of equations whose solutions include the important problem of finding the effective resistance between two nodes, a subroutine that has been widely used to develop efficient algorithms for a number of computational problems. Hence our work provides the theoretical basis for a new class of distributed algorithms for these problems.
Keywords: Ergodicity; Geometric Ergodicity; Random walks; Queueing networks; Stationary distribution
1 Introduction
We study a stochastic process arising from a natural routing and scheduling scheme used to collect data from sensor nodes over multi-hop relay networks [9, 10, 1]. We model the sensor network as a graph, some of whose vertices produce data packets according to a discrete-time Bernoulli process. One node is designated as a sink that has to collect the data generated in the network and all other nodes relay the data. Each node maintains a queue and relays at most one packet in a time slot in the manner of the “gossip” models widely studied in the networking and distributed computing literature [12][3][24]. The packet is relayed to a random neighbor, in the manner of a random walk on the graph. Our model is, therefore, an open discrete-time queueing network whose interconnections are described by an undirected (simple) graph.
Due to the relationship with the data collection task we call our process the Data Collection Process defined on a graph equipped with a positive edge-weight function . The process takes two parameters, a relative rate vector and a rate ; we assume that node produces a packet with probability in a given time slot. For a given relative rate vector, the process has two behavior regimes and undergoes a sharp transition between them, the controlling parameter being the rate . Specifically, we will show that for a critical value we have that when the process is non-ergodic and the queue sizes grow to infinity, whereas when the process is ergodic, so that all queues are almost surely finite and the system converges to a stationary distribution. For this latter regime, we also show that the rate of convergence is geometric, i.e., the Data Collection Process is geometrically ergodic.
For the process also has an unexpected connection with a subclass of systems of linear equations, which we refer to as “one-sink” Laplacian systems. The importance of this subclass comes from the fact that the effective resistance between a pair of nodes in a network can be computed by solving a one-sink Laplacian system [26] [23]. Over the last few years this connection to effective resistance has been repeatedly exploited to develop state-of-the-art algorithms for computing max flows in networks [5][2], random spanning trees of graphs [11][23], graph sparsification [26][13], and expander generation [6]. The connection of our Data Collection Process to one-sink Laplacian systems opens up a new direction for the design of efficient distributed algorithms computing an array of important structures and quantities on graphs as we have shown in [8]. However, efficient algorithms based on the Data Collection Process depend on fundamental mathematical properties of the process. Specifically, a stationary distribution must exist, and convergence towards it must be guaranteed in a reasonable time. This paper addresses those needs.
The rest of the paper is organized as follows. In Section 2, we discuss our main results. In Section 3, we prove existence of a non-trivial critical data rate below which the process is ergodic and above which it is non-ergodic. Then, in Section 4 we characterize this rate in terms of underlying graph parameters. In Section 5, we prove that the process is not only ergodic but geometrically ergodic and find the rate of convergence of the associated Markov chain to its stationary distribution. Finally, we conclude and give some directions for future work in Section 6.
2 Main results
2.1 Our model: The Data Collection Process
We consider a stochastic process on a network modeled by an undirected graph , where is the set of nodes, is the set of edges such that , and a positive weight function . We say that if and The generalized degree of node is defined as . We denote the maximum and minimum generalized degree among all nodes in the network by and respectively.
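The generalized degree and the random-walk transition matrix used throughout can be computed as in the following minimal sketch. It assumes nodes are labelled 0..n-1, that every node has at least one incident edge, and that the walk moves from a node to a neighbor with probability equal to the edge weight divided by the node's generalized degree (the form stated in Section 4.1). The function name `transition_matrix` and the example weights are hypothetical.

```python
def transition_matrix(n, weighted_edges):
    """Return (degrees, P) for an undirected weighted graph on nodes 0..n-1.

    weighted_edges is a list of (u, v, weight) triples with weight > 0.
    Assumes no isolated nodes, so every generalized degree is positive.
    """
    w = [[0.0] * n for _ in range(n)]
    for u, v, weight in weighted_edges:
        if weight <= 0:
            raise ValueError("edge weights must be positive")
        # Undirected graph: the weight applies in both directions.
        w[u][v] = weight
        w[v][u] = weight
    # Generalized degree: sum of the weights of the incident edges.
    degrees = [sum(row) for row in w]
    # Row-stochastic transition matrix: P[u][v] = w[u][v] / degree[u].
    P = [[w[u][v] / degrees[u] for v in range(n)] for u in range(n)]
    return degrees, P


# Hypothetical example: a weighted triangle.
edges = [(0, 1, 1.0), (1, 2, 3.0), (0, 2, 1.0)]
degrees, P = transition_matrix(3, edges)
d_max, d_min = max(degrees), min(degrees)
```

Each row of `P` sums to one, and heavier edges are followed proportionally more often.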
We consider time to be discrete and define the process in terms of the generation, movement and disappearance of “packets” from the system. In order to do this we are given a relative rate vector with the properties that (i) for exactly one node and (ii) . The node for which is called the sink and we will use to denote it hereafter. We also define a set of source nodes: . We are also given a rate parameter such that . We assume that each node in is equipped with a queue. The number of packets in the queue at at time is denoted by .
Packets appear in the system at the source nodes which receive external packet arrivals as an independent Bernoulli process with rate . The packet received externally is placed in the queue at . Packet movement at time takes place as follows: For each , if a single data packet is picked at random from the queue and sent to with probability . So, each node sends at most one packet from its queue in one time step and may receive multiple packets, up to one from each neighbour. A packet is removed from the system when a neighbor of decides to transmit that packet to .
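One time slot of the process described above can be sketched as follows. This is an illustrative simulation, not the authors' code. It assumes that every non-empty non-sink queue forwards exactly one packet to a neighbor drawn from its row of the random-walk transition matrix, that movement is decided from the queue sizes at the start of the slot, and that external arrivals are added afterwards. The names `step`, `arrival_prob` and `P_two` are hypothetical. Because only queue sizes are tracked, which packet is picked from a queue does not matter here.

```python
import random


def step(queues, P, arrival_prob, sink, rng):
    """Advance the queue sizes by one time slot and return the new list.

    queues[u]       -- number of packets at node u at the start of the slot
    P[u][v]         -- probability that u forwards its packet to v
    arrival_prob[u] -- probability of an external arrival at u (0 for non-sources)
    """
    n = len(queues)
    new = list(queues)
    for u in range(n):
        if u == sink or queues[u] == 0:
            continue  # the sink keeps no queue; empty queues send nothing
        v = rng.choices(range(n), weights=P[u])[0]
        new[u] -= 1
        if v != sink:
            new[v] += 1  # packets delivered to the sink leave the system
    for u in range(n):
        if u != sink and rng.random() < arrival_prob[u]:
            new[u] += 1  # Bernoulli external arrival at a source
    new[sink] = 0
    return new


# Two nodes: node 0 always forwards to the sink (node 1), no arrivals.
P_two = [[0.0, 1.0], [1.0, 0.0]]
after = step([3, 0], P_two, [0.0, 0.0], sink=1, rng=random.Random(0))
```

A node may gain several packets in one slot (one per sending neighbor) but never loses more than one, which matches the description above.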
In the following we will refer to the -dimensional Markov chain as the Data Collection Process on with relative rate vector and rate parameter . Mostly we will omit from the superscript since it will be understood. Occasionally we will maintain in the superscript but dispense with it when it is understood.
2.2 Ergodicity is a critical phenomenon for the Data Collection process
The Data Collection process has two distinct regimes, one ergodic and one non-ergodic, as we vary and there is a sharp transition between them. We find that there is a non-trivial such that the chain is ergodic for below this value and converges to a stationary distribution. Above the system displays drift and the queue sizes grow unbounded as . Specifically we show the following theorem:
Theorem 1.
Consider a weighted undirected graph and a relative rate vector with for exactly one . If the random walk on with transition matrix where is irreducible and aperiodic then there exists a such that the resulting multi-dimensional Markov chain is ergodic for all and non-ergodic for all .
Although it is difficult to prove ergodicity results for multi-dimensional Markov chains in general, we show in Section 3 how the induction-based technique developed by Georgiadis and Szpankowski [7], and later summarized by Szpankowski in his study of slotted ALOHA [28], can be applied to prove this result.
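The hypothesis of Theorem 1 can be checked directly on the graph. The sketch below relies on a standard fact rather than on anything specific to this paper: the random walk on an undirected graph with positive weights is irreducible exactly when the graph is connected, and a connected graph gives an aperiodic walk exactly when it contains an odd cycle (a self-loop counts). The function name is hypothetical.

```python
from collections import deque


def walk_is_irreducible_and_aperiodic(n, edges):
    """Check connectivity and non-bipartiteness with one BFS 2-colouring.

    edges is a list of (u, v) pairs on nodes 0..n-1; weights are irrelevant
    as long as they are positive.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colour = [None] * n
    colour[0] = 0
    queue = deque([0])
    odd_cycle = False
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if colour[v] is None:
                colour[v] = 1 - colour[u]
                queue.append(v)
            elif colour[v] == colour[u]:
                odd_cycle = True  # an edge inside one colour class
    connected = all(c is not None for c in colour)
    return connected and odd_cycle


triangle_ok = walk_is_irreducible_and_aperiodic(3, [(0, 1), (1, 2), (0, 2)])
path_ok = walk_is_irreducible_and_aperiodic(3, [(0, 1), (1, 2)])
```

The triangle passes, while the three-node path fails because it is bipartite and its walk has period two.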
2.3 A lower bound on the critical rate
When the Data Collection process is ergodic and has a stationary distribution so we can define for all . We will show in Section 4 that at stationarity the vector extended to by setting is a solution of a linear system because
[TABLE]
where is the transition matrix of the random walk defined on by the weight function . In Section 4.1 we discuss the relationship of this system to the Laplacian of and the implications of this relationship. For now, we state one important consequence of this relationship: a lower bound on .
Theorem 2.
Suppose we have a Data Collection process with relative rate vector such that only for , defined on a graph that satisfies the conditions of Theorem 1 and has critical rate . Then if is the transition matrix of the random walk defined by on and is the second largest eigenvalue of then
[TABLE]
2.4 Geometric Ergodicity
We show that when the Data Collection process converges to its stationary distribution at a geometric rate, i.e., the process is geometrically ergodic. Following Meyn and Tweedie [22], we define geometric ergodicity formally:
Definition 1 (Geometric ergodicity).
Given an irreducible and aperiodic Markov chain defined on state space with transition probability and stationary distribution , the chain is said to be geometrically ergodic if there exist constants , , and, for every state there exists a , such that for all ,
[TABLE]
We use the coupling method to prove that convergence happens at a geometric rate. The convergence rate is in terms of the hitting time, , of the random walk defined on so we provide a definition of this quantity. If is a random walk on and , then i.e., the maximum over all pairs of vertices of the expected time taken for a random walk begun at to first reach the vertex . We show the following convergence theorem.
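The worst-case hitting time defined above can be computed exactly for a small graph. The sketch below uses the standard first-step equations: for a fixed target, the expected hitting time from any other node is one plus the transition-weighted average of the hitting times of its neighbors, and it is zero at the target. The helper names `solve` and `max_hitting_time` are hypothetical, and the dense Gauss-Jordan solver is only meant for small examples.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def max_hitting_time(P):
    """Maximum over ordered pairs (u, v) of the expected time to reach v from u."""
    n = len(P)
    worst = 0.0
    for target in range(n):
        others = [u for u in range(n) if u != target]
        # First-step equations: h(u) - sum_w P[u][w] h(w) = 1, with h(target) = 0.
        A = [[(1.0 if u == w else 0.0) - P[u][w] for w in others] for u in others]
        h = solve(A, [1.0] * len(others))
        worst = max(worst, max(h))
    return worst


# Unweighted path 0 - 1 - 2: going end to end takes 4 steps in expectation.
P_path = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
t_hit = max_hitting_time(P_path)
```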
Theorem 3.
Consider defined on such that there is a critical as described in Theorem 1. Let for and denote by the transition matrix for the resulting multi-dimensional Markov Chain. Suppose we have with . Then
[TABLE]
Convergence to stationarity can be derived as a special case of Theorem 3 by choosing according to the , the stationary distribution of chain . This establishes the geometric ergodicity of the Data Collection process in the subcritical regime.
Corollary 1.
Consider the multi-dimensional Markov chain with for as defined in Theorem 3 and denote its stationary distribution by . For such that ,
[TABLE]
Moreover, for the special case that , i.e., the system begins with empty queues, the Markov chain mixes to within of its stationary distribution in terms of total variation distance for any parameter in time that is .
3 Ergodicity as a critical phenomenon
In this section, we prove existence of a non-trivial critical data rate for the multi-dimensional Markov chain associated with the Data Collection process such that the chain is ergodic for all values below and non-ergodic above it.
Given a Data Collection process on a network modeled by an undirected graph , there is an associated -dimensional vector where each represents the queue size at a given node given a data rate . Since the Data Collection process is a queueing system, the question of stability arises, i.e., we need to understand whether the system is able to successfully transfer data at a given value which is the controlling parameter for the rate at which packets appear in the system. For this, following Loynes [18] and Szpankowski [28], we formally define a notion of a stable data rate as follows.
Definition 2 (Stable rate).
Given a weighted undirected graph and a relative rate vector with for exactly one , the process is said to be stable and a value of the rate parameter is said to be a stable rate if
[TABLE]
where is the limiting distribution function.
However, if a weaker condition holds i.e.,
[TABLE]
the process is said to be substable, and otherwise unstable. So, a stable process is necessarily substable, and for a substable process to be stable its distribution function should tend to a limit. Thus, by stability we mean that the limiting distribution of as exists. Moreover, if the limiting distribution is a stationary distribution, then the process is ergodic. So, for queueing systems ergodicity and stability can be used interchangeably.
In general, proving ergodicity of a multi-dimensional Markov chain is difficult. However, for the Markov chain corresponding to our Data Collection process the proof is tractable, because this process belongs to a class of multi-queue systems for which Szpankowski and others showed a general method for proving the existence of a “stability region” of this kind [28]. Building on the work of Malyšev [20] on two-dimensional Markov chains as extended by Malyšev and Menšikov [21] to multi-dimensional chains, Georgiadis and Szpankowski developed an induction-based technique to characterize the stability region of the multi-queue system described by token passing rings [7]. After applying this technique to several related systems, Szpankowski noted in his study of slotted ALOHA [28] that all the systems amenable to this technique had certain properties. We first discuss this general characterization and then show that the Data Collection process falls within it.
Consider a multi-queue process with a set of queues and a partition of , where refers to the set of persistent users, which can transmit dummy packets even when their queues are empty, and refers to the set of non-persistent users, which behave as normal queues. For the given partition , let us define a modified multi-queue process wherein the queues in are never allowed to become empty and the queues in behave as in the original process . To characterize the stability region of processes such as , Szpankowski’s induction-based technique requires three conditions to hold:
1. Monotonicity. The queues in the modified process are always longer due to the persistent users, i.e., .
2. Stationarity of . Since the users in set mimic the original process, the transmissions from that enter should form a stationary and ergodic sequence so that Loynes’ scheme for one-dimensional queues [18] can be applied to establish the stationarity of a persistent queue (in order to perform the induction step).
3. Identical behaviors when non-empty. and behave identically as long as their queues are non-empty. Only when empties for some and is non-empty for , they behave differently.
Szpankowski’s general characterization is primarily based on an intrinsic coupling between the two processes and as indicated by the first and third properties. In this coupling, starting from the same initial state, the transmission decisions are shared by the two processes, i.e., if one process makes a transmission decision then the same decision is followed in the other, so that the trajectories of the two processes are coupled. Note that even if a queue in one of the processes is empty and the corresponding queue in the other is non-empty, any transmission decision of the latter will still be followed by the former, although because its queue is empty the decision will have no effect on its queue size or that of its neighbors. To show that the Data Collection process on graph also falls within the domain of this general characterization, we will use different variations of this coupling for the corresponding Markov chain over space .
To start with, using a coupling-based argument we first prove a property of the Data Collection process and its corresponding multi-dimensional Markov chain which satisfies Szpankowski’s first condition of monotonicity. In particular, we will show that for the Markov chain , the queue occupancy probability of a node is an increasing function of for all and it is continuous for all where is the critical rate above which the queues are unstable and below which they are stable.
Lemma 1.
Given an undirected graph running a Data Collection process. Let represent the queues at time for all nodes . Then, for all such nodes is
1. an increasing function of , and
2. continuous for all where is the critical data rate such that all data rates are stable and are unstable.
Proof.
(1). To prove this property, we will first establish that the multi-dimensional Markov chain is stochastically ordered i.e., stochastically larger initial states will produce stochastically larger chains at all times. For this, let us consider a coupling as used by Szpankowski of two trajectories of this chain and such that . Now, assume the stochastic dominance relation between the two holds at time i.e., . Then, at time step for both and from the one-step basic queue evolution equation at all nodes we have
[TABLE]
where is the number of packets generated at , which is 0 if and is 1 with probability if , so, . Now consider any node at time , from the induction hypothesis queues at node as well as its neighbors in will dominate over the ones in , so the first three terms on the right of Eq. (7) in will dominate the ones for and since is same, the last term is same for both cases. So, we have . This is true for all nodes , so we have at time , . Hence, by induction the Markov chain is stochastically ordered.
Now to prove monotonicity, for let us consider a coupling similar to the one used before of two stochastically ordered Markov chains and such that . Then, as we know for all , , so by using induction and evolving queues using one-step queue evolution equation (Eq. (7)), we can show that for all . Hence, by induction we have is an increasing function of for all .
(2). To prove the continuity of the given function for , we will again consider a similar coupling, this time between two stochastically ordered Markov chains and with infinitesimal . For the data generation rule in the two chains, whenever a new data packet is generated at any node in chain , it is also generated at the corresponding node in chain , but not vice versa. To understand the difference between the two chains, let and denote the total number of packets in the respective chains till time and . Now, consider to be a function dependent on such that which is bounded by definition. So, if we look at the derivative of this function, the term where will be zero by definition of the coupling, as the two chains behave differently only when there is an extra generated packet. Similarly, terms with will have higher powers of which will become zero as . Hence, the derivative only depends on the term i.e.,
[TABLE]
where is the set of data sources. So, the total number of data packets generated in the two Markov chains up to time differ by one and hence, the queues at nodes in the two chains differ by at most one data packet at any time step. Now, for the given coupled chains let be the time by which an extra packet is generated in chain . So, we have,
[TABLE]
where is the probability that the extra packet generated in chain is present at node . This means
[TABLE]
So, if is defined, as, from the above equation we have, . Similarly, for the other side if is defined, so as , similar to Eq.(9) we have, . Now, if both these conditions are true then the function is continuous as it has both left and right continuity respectively.
Now, consider all data rates where is the critical rate below which all rates are stable and above which all are unstable. So, for such rates both the probabilities and are defined, so as discussed above the function is continuous on both sides for all . Now consider the case of data rates . At , we know is defined (see Eq. (8)), as rate is stable by definition, hence, the function is left continuous for this rate. However, for the other side since we know is not stable i.e., , hence, will not be defined and function is not right continuous. So, for function is left continuous but not right continuous. However, for all , is a continuous function (both limits exist) for all .
∎
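The coupling behind part (1) of the proof of Lemma 1 can be illustrated by running two copies of the process on shared randomness. The sketch below is an assumption-laden illustration, not the authors' construction verbatim: both chains use the same forwarding target for each node in each slot and the same uniform draw for each arrival, so an arrival in the lower-rate chain is always matched in the higher-rate chain. The names `advance`, `coupled_run`, `J` (relative rates, zero for relays) and the rate values are hypothetical.

```python
import random


def advance(queues, targets, uniforms, J, sink, beta):
    """One slot of the process driven by externally supplied randomness."""
    new = list(queues)
    for u, q in enumerate(queues):
        if u == sink:
            continue
        if q > 0:
            new[u] -= 1
            if targets[u] != sink:
                new[targets[u]] += 1
        # Arrival happens when the shared uniform falls below beta * J[u].
        if uniforms[u] < beta * J[u]:
            new[u] += 1
    return new


def coupled_run(P, J, sink, beta_low, beta_high, start_low, start_high, steps, seed=0):
    """Return True if the lower chain never exceeds the upper chain at any node."""
    rng = random.Random(seed)
    n = len(P)
    low, high = list(start_low), list(start_high)
    for _ in range(steps):
        targets = [rng.choices(range(n), weights=P[u])[0] for u in range(n)]
        uniforms = [rng.random() for _ in range(n)]
        low = advance(low, targets, uniforms, J, sink, beta_low)
        high = advance(high, targets, uniforms, J, sink, beta_high)
        if any(a > b for a, b in zip(low, high)):
            return False
    return True


# Triangle with source 0 and sink 2; the upper chain has a larger rate and start.
P_tri = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
dominated = coupled_run(P_tri, [1.0, 0.0, -1.0], 2, 0.2, 0.6, [0, 0, 0], [1, 0, 0], 2000)
```

Domination holds on every sample path of this coupling because each term of the one-step update is non-decreasing in the current queue sizes and in the rate.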
Having satisfied Szpankowski’s first condition of monotonicity, we shall use two other general results to characterize the stability region of the multi-dimensional Markov chain associated with the Data Collection process. In particular, we will use Szpankowski’s “isolation lemma” (Lemma 2) and Loynes’ scheme [18] as adapted to our situation (Lemma 3).
Lemma 2 (Szpankowski [27]).
Given , an -dimensional Markov chain.
1. If it is defined on a countable state space, then the stability of for all implies the stability of the multi-dimensional Markov chain .
2. If for some , say , is unstable, then is also unstable.
Lemma 3 (Loynes [18]).
Given a pair of a strictly stationary and ergodic process, let . Then, the following holds:
1. If , then is stable.
2. If , then is unstable and (a.s.).
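The dichotomy in Lemma 3 can be seen numerically on a single queue. The sketch below assumes a Lindley-type recursion in which the queue first serves at most one packet and then admits an arrival, with independent Bernoulli arrivals and services; this recursion and the names `simulate_queue`, `arrival_prob` and `service_prob` are illustrative assumptions, not the exact form used in the lemma.

```python
import random


def simulate_queue(arrival_prob, service_prob, steps, seed=0):
    """Return (final queue size, time-averaged queue size) of a single queue."""
    rng = random.Random(seed)
    queue, total = 0, 0
    for _ in range(steps):
        arrival = 1 if rng.random() < arrival_prob else 0
        service = 1 if rng.random() < service_prob else 0
        # Serve first (never below zero), then admit the arrival.
        queue = max(queue - service, 0) + arrival
        total += queue
    return queue, total / steps


# Negative drift (0.3 < 0.6): the queue stays short.
stable_final, stable_mean = simulate_queue(0.3, 0.6, 20000)
# Positive drift (0.6 > 0.3): the queue grows roughly linearly in time.
unstable_final, unstable_mean = simulate_queue(0.6, 0.3, 20000)
```

With negative expected drift the time-averaged queue stays small, while with positive drift the final queue size is of the order of the drift times the number of steps.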
Using these tools and Szpankowski’s general method we will now prove the existence of a non-trivial stability region for the multi-dimensional Markov chain corresponding to a Data Collection process defined on an undirected graph .
Proof of Theorem 1.
We first prove sufficiency, i.e., the existence of a non-trivial such that the multi-dimensional Markov chain is ergodic for all , and then necessity, i.e., that for all the chain is non-ergodic.
Sufficiency.
Given a partition of queues we define a modification of -dimensional chain represented as where all nodes in have the same behavior as in but the nodes in are not allowed to have empty queues. Let us now first set (non-persistent users) and (persistent users). For any , we know the one step basic queue evolution equation under the Data Collection process for any is as follows.
[TABLE]
So, at each node we have an arrival from with probability in since the queue of is always non-empty and the departure is the usual .
Now, since we know for all , so the sum of the outgoing probabilities from is greater than the sum of the incoming probabilities, i.e., Therefore, there must be a vertex for which . So, from Eq. (7) for this we note that the expected drift is
[TABLE]
which is negative for an appropriately small but non-zero value of , let’s call it .
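The base case of the induction can be made concrete by computing the expected one-step drift at each node when every non-sink node is persistent. The sketch below assumes the drift at a node equals its generation probability plus the forwarding probabilities from its non-sink neighbors minus its own total forwarding probability; this expression is read off the surrounding argument and should be treated as an assumption, as should the names `persistent_drift`, `J` and `beta`.

```python
def persistent_drift(P, J, sink, beta):
    """Expected one-slot change of each non-sink queue when all queues are non-empty."""
    n = len(P)
    drift = {}
    for u in range(n):
        if u == sink:
            continue
        # Every non-sink neighbor always has a packet, so it sends to u w.p. P[v][u].
        inflow = sum(P[v][u] for v in range(n) if v != sink and v != u)
        # u itself always sends one packet to some neighbor.
        outflow = sum(P[u][v] for v in range(n) if v != u)
        generation = beta * max(J[u], 0.0)
        drift[u] = generation + inflow - outflow
    return drift


# Triangle with source 0 and sink 2 at a small rate.
P_tri = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
drift = persistent_drift(P_tri, [1.0, 0.0, -1.0], sink=2, beta=0.25)
negative_nodes = [u for u, d in drift.items() if d < 0]
```

Because the sink never forwards packets back, at least one node receives strictly less than it sends, which is why a small enough rate leaves its drift negative.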
Now, to apply Loynes’ scheme to vertex we need to ensure that the sequence is strictly stationary where is the number of incoming packets to at time and is the number of outgoing packets from . Since all nodes , so as well as its neighbors always have a packet in the queue, so, both and are sequences of independent Bernoulli random variables and hence are stationary and ergodic. So, we can apply Loynes’ scheme (Lemma 3) to claim that the one-dimensional process is stable, and, hence, is stable.
Now, we assume there is a non-empty set of non-persistent users and a such that is stable and has a stationary distribution. To apply Loynes’ scheme to a vertex, we need to ensure that the sequence is strictly stationary. Since there is always a packet in the queue at , is a sequence of independent Bernoulli random variables which take value 1 with probability and 0 otherwise. We decompose as the sum of 0-1 random variables , where if receives a packet from at time . Then
[TABLE]
Since all have a packet in their queue at all , each is the sum of Bernoulli random variables and hence taken from a strongly stationary sequence. We start the chain from an initial state picked according to this stationary distribution, which ensures that the process stays in the stationary state for all . In particular, this implies that for any , the number of incoming packets from at time is a sequence of random variables that is strongly stationary. Therefore is a strongly stationary sequence and we can apply Loynes’ scheme. The expected drift at time at any for any is given by
[TABLE]
Since the graph is connected and so there is at least one pair such that and , therefore we know that This means that there is a such that For this the first two terms in Eq. (10) add up to a value which is negative. Further, from Lemma 1 we note that the third term is continuous and increasing in and tends to 0 as . Hence, it is possible to find a value which lies in such that the expected drift is negative. So, from Loynes’ scheme (Lemma 3) this implies that is stable for . Moreover, by Lemma 2, the stability of all the one-dimensional Markov chains associated with the vertices in implies the stability of the overall multi-dimensional chain. Consequently, the same holds for . Therefore by induction there is a such that for , is stable.
Necessity.
Corresponding to the sequence by which the stability region is expanded to include all the vertices of there is a sequence such that Let be the vertex for which . Assume for the sake of simplicity of presentation that Hence we can choose any such that For this we know that is stable. If we start this chain from its stationary distribution then the number of packets that are transmitted from to form a strongly stationary sequence. Since is persistent in this setting the packets leaving it are also strongly stationary. Hence Loynes’ scheme (Lemma 3) can be applied. By the choice of we know that the expected drift at is strictly positive and so is unstable and hence by Lemma 2, is unstable.
In order to show that is also unstable for this choice of we will show there is a coupling of and with an appropriately chosen initial condition such that the two models behave identically. We know that on the set of sample paths (of positive probability) on which the queue at remains strictly positive the two coupled models behave identically, because a difference only arises if the queue at becomes 0 at time , in which case is automatically set to 1 since is persistent and remains 0. Now, we know that is unstable, so when we start according to its stationary distribution and we set the queue at to 1, there is positive probability that this queue never reaches 0. So, on those sample paths behaves like , i.e., it is unstable. Therefore with these initial conditions is not substable since with positive probability , for all finite . Hence, is unstable and, by Lemma 2, is unstable for our choice of and, by the monotonicity of the process (see Lemma 1), it is unstable for all choices of . ∎
Having established the existence of a non-trivial critical data rate for the Markov chain of the Data Collection process below which the chain is ergodic and above which it is non-ergodic, we will now characterize this critical rate.
4 Characterizing the critical rate
In Section 3, we proved the ergodicity of the Markov chain associated with the Data Collection process and showed that its stationary distribution exists. In this section we show that, at steady state, the Data Collection process is described by a special class of linear equations, which we call “one-sink” Laplacian systems. Using this equivalence we derive a lower bound on the critical rate. We also discuss some common topologies in the context of this result and show some tight examples. Lastly, we present an upper bound on the critical rate.
4.1 Equivalence to one-sink Laplacian systems
The basic one step queue evolution equation under the Data Collection process for any node is as follows.
[TABLE]
where the second and third terms on the right-hand side of the above equation represent the transmissions sent to and received from the neighbors respectively and is the number of packets generated at , which is 1 with probability if , for the sink , and for all other nodes , where . Now, taking expectations on both sides of Eq. (11), letting be the queue occupancy probability of node , and observing that , where is the relative rate vector, we have
[TABLE]
From Theorem 1, we know that for an appropriately chosen value of the Data Collection process has a steady state. Moreover, at steady state is a constant, so if we let be the queue occupancy probability of node at the stationarity, then we have the steady-state equation for the given node as
[TABLE]
We can also represent the steady-state equations of all nodes in matrix form as follows. For this, let us first order the nodes such that the th node represents the sink. Let be an element column vector representing the steady-state queue occupancy probability of nodes . We drop the subscript where the rate is understood from the context. So, we have . This is defined assuming that the sink collects all the data it receives and maintains no queue. Let be another element column vector such that if , and 0 elsewhere, and be the usual identity matrix. So, given the transition matrix for the random walk defined by on graph , the steady-state queue equations at the nodes can be written in matrix form as
[TABLE]
As we know transition matrix where is the diagonal matrix of generalized degrees and is the adjacency matrix, so matrix is also a Laplacian as we can rewrite it as . So, the above equation (Eq. (14)) can be rewritten as
[TABLE]
where is a row vector such that for all where is the steady-state queue occupancy probability and is the generalized degree of node . Eq. (15) is similar to Laplacian systems of the form with a constraint that only one element in is negative. We call such systems “one-sink” Laplacian systems. In our subsequent work [8] we discuss this connection in detail.
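The steady-state equations can be solved numerically on a small graph. The sketch below assumes the balance condition described above: at each non-sink node, the occupancy probability times the node's total forwarding probability equals its generation probability plus the occupancy-weighted forwarding probabilities of its non-sink neighbors, with the sink's occupancy fixed at zero. The names `steady_state_occupancy`, `J` and `beta` are hypothetical. The final line additionally assumes that the critical rate is the rate at which the largest occupancy probability reaches one, which is how the linearity argument in Section 4.2 appears to be used.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def steady_state_occupancy(P, J, sink, beta):
    """Return the occupancy probabilities of all nodes (0 at the sink)."""
    n = len(P)
    others = [u for u in range(n) if u != sink]
    # Row u: eta_u * (total forwarding prob of u) - sum_v eta_v * P[v][u] = beta * J[u].
    A = [[(sum(P[u]) if u == v else 0.0) - P[v][u] for v in others] for u in others]
    eta_others = solve(A, [beta * J[u] for u in others])
    eta = [0.0] * n
    for u, value in zip(others, eta_others):
        eta[u] = value
    return eta


# Triangle with source 0 and sink 2.
P_tri = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
J = [1.0, 0.0, -1.0]
eta = steady_state_occupancy(P_tri, J, sink=2, beta=0.3)
# Occupancy is linear in the rate, so scale the unit-rate solution (assumption above).
critical_rate_estimate = 1.0 / max(steady_state_occupancy(P_tri, J, 2, 1.0))
```

As a sanity check, the flow into the sink, obtained by summing each neighbor's occupancy times its forwarding probability to the sink, equals the total generation rate.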
4.2 A lower bound
Having established the steady-state equation for the Data Collection process, we now use it to characterise the critical data rate. In particular, we prove a lower bound on this rate.
Proof of Theorem 2.
For a given graph , with source set and transition matrix for random walk defined by on graph , recall that the steady-state queue equations at nodes can be written in vector form as
[TABLE]
Now, in order to bound the maximum stable data rate at which the source nodes generate data in terms of the underlying graph parameters, we will consider the eigendecomposition of the left-hand side of Eq. (16). For this, we will deviate from the usual inner product on the vector space i.e., and define another inner product on which is given by where is the stationary distribution of the random walk defined by on graph satisfying . From Lemma 12.2 [16], it is known that the inner product space has an orthonormal basis of real-valued eigenfunctions corresponding to real eigenvalues . Using this lemma and writing the vector in terms of the eigenvectors, we have This gives us that , where is the eigenvalue of transition matrix . Moreover, from Lemma 12.1 of [16], we also know that the absolute value of any eigenvalue of a transition matrix can be at most , so, . So, we have
[TABLE]
Note that form an orthonormal basis, so . Hence we have
[TABLE]
The eigenfunction corresponding to the eigenvalue 1 can be taken to be a constant vector 1, so , where . Also, . So, using these results in Eq. (19) we have
[TABLE]
where, is the expected queue occupancy probability of nodes under stationary distribution . Now, taking square of norm of Eq. (18) and using Eq. (20), we have
[TABLE]
Using Eq. (21) in the square of norm of Eq. (16), we have
[TABLE]
Moreover, as , so we have
[TABLE]
where
Now to get a bound on , we consider two nodes whose queue occupancy probability we know precisely: (1) the sink, , which has (as it maintains no queue and sinks data packets as soon as it receives them), and (2) a node with maximum queue occupancy probability for a given , let it be . Now, let where is the critical data rate and . From Eq. (14) we know is linear in and , so and hence, we have .
We note that the contribution of and with as the expected queue occupancy probability of nodes under the stationary distribution is as follows.
[TABLE]
where the last inequality holds as achieves optimum at . So, first using Eq. (23) and Eq. (24) in Eq. (22) and then we know as , , so we have
[TABLE]
Now, we know , and where, and are the generalized minimum and maximum degrees of graph respectively. So using the appropriate bounds on in Eq. (25) we have
[TABLE]
where is the second smallest eigenvalue of the transition matrix of random walk defined by weight function and is the generalized degree of the sink node. ∎
In Table 1, we present the lower bound on the critical data rate for the stochastic Data Collection process. We also present the exact values of the data rate, which are easy to calculate using elementary algebra for these topologies. In all these cases, we assume that all edges have unit weight, i.e., the random walk defined by is the simple random walk, and that there is only one source node, i.e., such that .
If we consider the complete graph topology it is easy to see that the exact rate is . As the spectral gap of the simple random walk on the complete graph of nodes is , we note that for this case our lower bound is tight i.e., both the exact value and the lower bound have order . Similarly, for the star graph with sink at outer edge, our lower bound is tight and is of order . Hence it is clear that our lower bound cannot admit any asymptotic improvement in general. On the other hand, consider the cycle topology, which shows that for specific cases a better lower bound may be possible. We note that our spectral gap-based lower bound is a lower than the exact value for this case. Similarly, for other topologies like the wheel graph, the complete binary tree and the -times star of star graph (-regular tree defined on levels) a better lower bound is possible.
4.3 An upper bound
We also prove an upper bound on the critical data rate for a special case where . In order to present this bound, we need to define some terms. For any vertex , we define its measure as, . Similarly, for any we define the measure . We also define the edge boundary as , so, . We have the following upper bound result.
Proposition 1.
Consider a graph with nodes, out of which there is one sink and a set of source nodes, running a Data Collection process having critical data rate as defined by Theorem 1. To achieve stable queues, must satisfy
[TABLE]
where is the transition matrix of random walk defined by , is a constant and is at most , the edge expansion of graph .
Proof of Proposition 1.
Given any vertex , recall its measure is defined as, , and for any we have . Similarly, for edge boundary as , we have . Now, let us define constants and where is the edge expansion of graph .
We know, for any given set , where the maximum data flow that can move out of this set is the flow across the boundary , so
[TABLE]
Now, for set , we have . So, from Eq. (28) . Hence, the upper bound on the critical data rate is given by
[TABLE]
∎
Note that our derived upper and lower bounds on the critical data rate relate directly to the two sides of Cheeger’s inequality [4].
5 Geometric rate of convergence
Next, we characterize the rate of convergence of Markov chain for the stable regime i.e., . In particular, we first prove a general result about the total variation distance between the probability distributions of two Markov chains and their rate of convergence. Then, as a special case of this result we show that the convergence of Markov chain is geometric i.e., starting from any initial state, the distance from stationarity reduces exponentially. Note that we drop the superscript from the Markov chain representation as a stable data rate value for proving the convergence rate is assumed.
Proof of Theorem 3.
We first note that our Markov chain is stochastically ordered (cf. [19]). In general this means that, given two random processes and supported on , we say is stochastically dominated by if for every increasing function . For our Data Collection chain we state the stochastic orderedness property as follows.
Claim 1.
Given two instances of the Data Collection process and such that , is stochastically dominated by . In particular this means that for all .
The proof of this claim follows by constructing a coupling between the two chains such that each of them performs exactly the same transmission actions. If a queue in one of the chains is empty, then its transmission action is a dummy action. It is easy to see that stochastic ordering follows naturally for the Data Collection chain.
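The coupling argument can be illustrated with a small simulation. The sketch below is not the paper's construction; it assumes a simplified model in which every non-sink node with a non-empty queue sends one packet per slot to a uniformly chosen neighbor, a source generates a packet with probability `beta`, and the sink absorbs everything. The graph (a 6-cycle), the initial loads, and the function names `coupled_step` and `dominance_holds` are hypothetical choices made for illustration.

```python
import random


def coupled_step(x, y, neighbors, sources, sink, beta, rng):
    """Advance two queue vectors by one slot using the same random choices."""
    new_x, new_y = list(x), list(y)
    for u in range(len(x)):
        if u == sink:
            continue  # the sink keeps no queue and never transmits
        dest = rng.choice(neighbors[u])  # one shared destination per node
        for old, new in ((x, new_x), (y, new_y)):
            if old[u] > 0:  # an empty queue performs a dummy action
                new[u] -= 1
                if dest != sink:
                    new[dest] += 1
    for s in sources:
        if rng.random() < beta:  # shared packet generation in both chains
            new_x[s] += 1
            new_y[s] += 1
    return new_x, new_y


def dominance_holds(steps=2000, seed=1):
    """Return True if x <= y componentwise after every coupled step."""
    rng = random.Random(seed)
    n, sink, sources, beta = 6, 0, [3], 0.2  # assumed toy parameters
    neighbors = [[(u - 1) % n, (u + 1) % n] for u in range(n)]  # 6-cycle
    x = [0] * n               # chain started empty
    y = [0, 2, 0, 5, 1, 0]    # chain started with extra packets
    for _ in range(steps):
        x, y = coupled_step(x, y, neighbors, sources, sink, beta, rng)
        if any(a > b for a, b in zip(x, y)):
            return False
    return True
```

Under this coupling the ordering is preserved in every slot because a queue's length after its own transmission is a monotone function of its length before, and a non-empty queue in the smaller chain implies a non-empty queue in the larger one.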
To use this claim, for our irreducible and aperiodic Markov chain described by the Data Collection process defined on having transition matrix and a stationary distribution , let us define two other irreducible and aperiodic Markov chains and , each with state space . Initially, suppose the data is generated in the two chains in a coupled way such that one of them dominates the other i.e., either for all or vice-versa.
Now, consider the coupling on defined over random sequences where is the set of one-step destinations from node , such that both the chains and are populated in a coupled way. Such Markov chains are said to be stochastically ordered chains in queueing theory and have the property that the Markov chain which dominates the other chain will always maintain dominance over it.
Now, under this coupling we allow the two chains to run in a way that any data generation or data transmission decision made by any queue in one chain is followed by the corresponding queue in the other chain as well. However, to distinguish the newly generated packets in two chains from the existing ones, we assign colors to the data packets: the existing packets in chain are colored red and in chain are colored blue, and the newly generated packets in both the chains are colored green. Moreover, in both the chains green (newly generated) packets get a preference in the transmission. Now, let represent the number of green packets in the queue of a given node in and be the steady-state queue occupancy probability of Markov chain . Since the number of green packets in both the chains starts from zero and the chains are stochastically ordered, green packet queue occupancy is always bounded by that of the chain with stationary distribution i.e., . The same holds true for the other chain as well.
Now, to ensure both chains get coupled all the red and blue (old) packets in and respectively need to be sunk. We consider chain and the same will hold for as well. We know by our preference in transmission, the probability that red packets move out of queue in one time step in is equal to the probability that there are no green packets in the given queue i.e., . Also, we have , where . Now, let and be the total number of red and blue data packets in chains and respectively at the beginning which are assumed to be finite. Also, let and be the time taken by the respective number of packets to get sunk. We have the following lemma that bounds this time.
Lemma 4.
Given a Data Collection process on graph with as the total number of data packets present in the queues of all nodes initially, then the time taken by all packets to reach the sink, let it be is bounded as
[TABLE]
where is the worst-case hitting time of random walk on and is the maximum queue occupancy probability at stationarity.
Proof of Lemma 4.
To prove this lemma we follow the delay analysis by Leighton et al. [14]. So, for our given Data Collection process on graph with as the total number of data packets present in the queues of all nodes initially, each data packet has its own trajectory or trace of random walk which indicates its path to reach the sink. Moreover, to each of these packets we assign distinct ranks out of range which will be determined later and the packet with the lowest rank always gets preference in the transmission. Among all possible sets of ranks assigned to the packets we choose a particular trace of random walk and find a delay sequence for it.
A delay sequence of length as defined by Leighton et al. involves backtracking the path of data packets where is determined by the analysis. In particular, given a data packet which arrived at the sink we follow it backwards till the edge it got delayed last time, suppose that edge is . Let be the length of the path from the sink to edge and suppose got delayed by packet . Then, we follow backwards till the edge where it got delayed by some packet. This is repeated till we get packet delayed packet over edge . So, the path from to the sink forms a delay sequence. Moreover, the intermediate paths of length have the property that where is the maximum number of edges that can be traversed by a trace of random walk.
Now, from our earlier argument we know that the probability that any of the (old) packets moves out of queue in one time step is at least , where represents the maximum queue occupancy probability at stationarity. This means any one step in this stochastic process takes on average time. So, the expected time taken by any random walk to hit the sink is where is the worst-case hitting time of random walk. So, by Markov’s inequality . Now, consider the probability of a random walk not hitting the sink in times i.e., we consider time and divide it into slots of each. By the Markov property of random walks, we know that the random walks in each of these slots are independent. So, we have the following result.
[TABLE]
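The slot argument above can be written out as a short worked inequality. The symbols here are placeholders introduced for illustration and are not the paper's notation: $T$ is the time for one packet's walk to reach the sink, $\tau$ is any upper bound on $\mathbb{E}[T]$ that holds from every starting node, and the slot length $2\tau$ is an assumed choice.

```latex
\Pr[T \ge 2\tau] \;\le\; \frac{\mathbb{E}[T]}{2\tau} \;\le\; \frac{1}{2},
\qquad
\Pr[T \ge 2k\tau] \;\le\; \left(\frac{1}{2}\right)^{k} = 2^{-k}
\quad \text{for every integer } k \ge 1 .
```

The first bound is Markov's inequality. The second follows by applying the first bound in each of $k$ consecutive slots of length $2\tau$: by the Markov property, the bound holds in each slot regardless of where the walk stands at the start of that slot.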
So, now there are two delays associated with any data packet: one is the self-delay of and the second one is due to the presence of other data packets in the queue. So, the number of different delay sequences of length is at most . This is because there are at most possibilities of choosing the intermediate path lengths such that as, despite the self-delay, the number of steps is still upper bounded by , and then there are possibilities to choose packet . Similarly, for all other delay packets there are possibilities. The last factor comes from choosing a set of ranks from range . Moreover, the probability of choosing a delay sequence such that the ranks are distinct is . So,
[TABLE]
If we set and , then
[TABLE]
Finally, combining Eq. (32) and Eq. (34) we get the desired result. ∎
Now, since both the chains and operate in parallel, the expected time for the two chains to couple i.e., all red and blue packets get sunk is the maximum of the time taken by each to get their respective packets sunk. So, using Lemma 4 for both the chains we have the expected time for , to couple, let it be as
[TABLE]
Note that this expected coupling time result is similar to the delay result of Leighton et al. [15][14], depicting the pipelining behaviour of the Data Collection process.
Now, to bound the distance between the two chains and we use the following result from Levin et al. [16].
Lemma 5 (Theorem 5.2, Levin et al. [16]).
Let be a coupling with initial states such that and and coupling time defined as , then,
[TABLE]
Let be the initial states of and chain then using Lemma 5 and the expected coupling time from Eq. (35) for , we have
[TABLE]
Now, assume the stable data rate at which we are running these stochastic processes is where is the critical data rate and . Also, from linearity of (see Eq. (14)) we know as and hence, we have . Using this in Eq. (36) we get the desired result. ∎
To use Theorem 3 to prove the geometric ergodicity result (Corollary 1) we pick according to the stationary distribution of the Data Collection process Markov chain.
Proof of Corollary 1.
Let us consider two instances of Data Collection process and such that the former starts from some finite state and the latter starts from stationarity i.e., initially all queues in are occupied by some finite number of packets and that of are filled according to the stationary distribution . Then, from Theorem 3 we have
[TABLE]
where is the worst-case hitting time of random walk on graph, and are the total number of data packets in state and at stationarity respectively and is the relative distance from the critical data rate. Now, if we compare Eq. (37) with Definition 1 (Eq. (2)) we obtain the geometric ergodicity property for the Markov chain .
Now for random variable , let be its expectation i.e., the expected number of data packets in at stationarity which by Little’s law [17] is equal to the product of the data generation rate and the expected latency of a data packet to reach the sink at stationarity i.e., (from linearity of and ) where is the critical data rate and . Now, let . So by the definition of we have two regimes: where the term is dominant and where the term is dominant.
For the simple case of , using Eq. (37) we have
[TABLE]
Similarly for we have
[TABLE]
Setting the RHS of Eq. (39) to and solving for we get that
[TABLE]
Combining (38) and (40) gives us the result.
We observe that if we set to (all zeros), i.e., all queues are initially empty, then is 1 so only Eq. (40) applies and we determine the mixing time by setting the RHS to for a given value of . ∎
6 The connection to algorithms and some future directions
The fact that the Data Collection Process mixes fast to its stationary distribution when started from the all-empty setting can be exploited to solve systems of equations such as Eq. (15) simply by allowing the process to get close enough to stationarity and then estimating the by keeping track of the number of time slots for which each queue is occupied. This opens up the possibilities of distributed algorithms for effective resistance and other problems, some of which we have explored in [8]. Even if we consider graph problems on very large graphs, Laplacian systems of equations become tractable via this method since random walks can be simulated very fast in modern computing systems for graphs with nodes in the millions (see, e.g., [25]).
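The estimation idea described above can be sketched as a short simulation. This is a minimal illustration and not the algorithm of [8]: it assumes unit edge weights (each non-empty queue sends one packet per slot to a uniformly chosen neighbor), Bernoulli generation with rate `beta` at each source, and a sink that absorbs every packet it receives. The function name `estimate_occupancy` and the `burn_in` parameter are hypothetical.

```python
import random


def estimate_occupancy(neighbors, sources, sink, beta, steps, burn_in, seed=0):
    """Estimate each node's queue occupancy probability by counting the
    fraction of slots (after a burn-in period) in which its queue is non-empty."""
    rng = random.Random(seed)
    n = len(neighbors)
    queue = [0] * n      # start from the all-empty state
    occupied = [0] * n   # number of counted slots with a non-empty queue
    for t in range(burn_in + steps):
        nxt = list(queue)
        for u in range(n):
            if u != sink and queue[u] > 0:
                dest = rng.choice(neighbors[u])  # simple random walk step
                nxt[u] -= 1
                if dest != sink:                 # the sink absorbs packets
                    nxt[dest] += 1
        for s in sources:
            if rng.random() < beta:              # Bernoulli packet generation
                nxt[s] += 1
        queue = nxt
        if t >= burn_in:
            for u in range(n):
                if queue[u] > 0:
                    occupied[u] += 1
    return [count / steps for count in occupied]


if __name__ == "__main__":
    # Three-node path 0 - 1 - 2 with the sink at node 0 and the source at node 2.
    path = [[1], [0, 2], [1]]
    print(estimate_occupancy(path, sources=[2], sink=0, beta=0.2,
                             steps=100000, burn_in=5000, seed=7))
```

For this three-node path, balancing the expected inflow and outflow at each non-sink node gives an occupancy probability of twice the data rate at both nodes under the stated assumptions, so the two estimates should be close to 0.4 when `beta` is 0.2.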
The key shortcoming of our work is that the Data Collection Process in the subcritical region models only one-sink Laplacian systems of equations. A model that captures the full generality of Laplacian systems of equations will open a more general class of problems that can be attacked algorithmically using this method.
- [1] M. K. An and H. Cho. Efficient data collection in interference-aware wireless sensor networks. Journal of Networks, 2015.
- [2] L. Becchetti, V. Bonifaci, and E. Natale. Pooling or sampling: Collective dynamics for electrical flow estimation. In Proc. of the 17th Intl. Conf. on Autonomous Agents and Multi Agent Systems, AAMAS ’18, pages 1576–1584, Richland, SC, 2018. International Foundation for Autonomous Agents and Multiagent Systems.
- [3] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Gossip algorithms: Design, analysis and applications. In Proc. of the 24th Annual Joint Conf. of the IEEE Computer and Comm. Societies, INFOCOM ’05, pages 1653–1664 vol. 3. IEEE, 2005.
- [4] J. Cheeger. A lower bound for the smallest eigenvalue of the Laplacian. In Proc. of the Princeton conference in honor of Professor S. Bochner, pages 195–199, 1969.
- [5] P. Christiano, J. A. Kelner, A. Mądry, D. A. Spielman, and S-H. Teng. Electrical flows, Laplacian systems, and faster approximation of maximum flow in undirected graphs. In Proc. of the 43rd annual ACM Symp. on Theory of computing, STOC ’11, pages 273–282. ACM, 2011.
- [6] A. M. Frieze, N. Goyal, L. Rademacher, and S. Vempala. Expanders via random spanning trees. volume 43, pages 497–513. SIAM, 2014.
- [7] L. Georgiadis and W. Szpankowski. Stability of token passing rings. Queueing systems, 11(1-2):7–33, 1992.
- [8] I. A. Gillani and A. Bagchi. A distributed Laplacian solver and its applications to electrical flow and random spanning tree computation. arXiv:1905.04989 [cs.DC], 2019.
