Lyapunov Criterion for Stochastic Systems and Its Applications in Distributed Computation
Yuzhen Qin, Ming Cao, Brian D. O. Anderson

Y. Qin and M. Cao are with the Institute of Engineering and Technology, Faculty of Science and Engineering, University of Groningen, Groningen, the Netherlands ({y.z.qin, m.cao}@rug.nl). B. D. O. Anderson is with the School of Automation, Hangzhou Dianzi University, Hangzhou, 310018, China, and with Data61-CSIRO and the Research School of Engineering, Australian National University, Canberra, ACT 2601, Australia ([email protected]). The work of Cao was supported in part by the European Research Council (ERC-CoG-771687) and the Netherlands Organization for Scientific Research (NWO-vidi-14134). The work of B. D. O. Anderson was supported by the Australian Research Council (ARC) under grants DP-130103610 and DP-160104500, and by Data61-CSIRO.
Abstract
This paper presents new sufficient conditions for convergence and asymptotic or exponential stability of a stochastic discrete-time system, under which the constructed Lyapunov function always decreases in expectation along the system’s solutions after a finite number of steps, but without necessarily strict decrease at every step, in contrast to the classical stochastic Lyapunov theory. As the first application of this new Lyapunov criterion, we look at the product of any random sequence of stochastic matrices, including those with zero diagonal entries, and obtain sufficient conditions to ensure the product almost surely converges to a matrix with identical rows; we also show that the rate of convergence can be exponential under additional conditions. As the second application, we study a distributed network algorithm for solving linear algebraic equations. We relax existing conditions on the network structures, while still guaranteeing the equations are solved asymptotically.
I Introduction
Stability analysis of stochastic dynamical systems has long been an active research field. Early works have shown that stochastic Lyapunov functions play an important role, and to use them for discrete-time systems, a standard procedure is to show that they decrease in expectation at every time step [1, 2, 3, 4]. Properties of supermartingales and LaSalle’s arguments are critical in the related proofs. However, most stochastic stability results are built upon a crucial assumption, namely that the stochastic dynamical system under study is Markovian (see, e.g., [1, 2, 3, 5]), and very few of them report bounds on the convergence speed.
More recently, with the fast development of network algorithms, more and more distributed computational processes are carried out in networks of coupled computational units. Such processes are usually modeled by stochastic discrete-time dynamical systems, since they are inevitably influenced by random changes of network structures [6, 7, 8, 9], communication delays and noise [10, 11, 12], and asynchronous updating events [13, 14]. There is therefore a great need to further develop Lyapunov theory for stochastic dynamical systems, in particular in the setting of network algorithms for distributed computation. This is the aim of this paper.
Motivated by the concept of finite-step Lyapunov functions for deterministic systems [15, 16, 17], we propose to define a finite-step stochastic Lyapunov function, which decreases in expectation not necessarily at every step, but after a finite number of steps. The associated Lyapunov criterion not only enlarges the range of choices of candidate Lyapunov functions but also applies to systems that need not be Markovian. An additional advantage of this criterion is that it enables us to construct conditions that guarantee exponential convergence and to estimate convergence rates.
We then apply the finite-step stochastic Lyapunov function to study two distributed computation problems arising in some popular network algorithmic settings. In distributed optimization [18, 19] and other distributed coordination algorithms [20, 21, 22, 7], one frequently needs to prove convergence of inhomogeneous Markov chains, or equivalently, convergence of backward products of random sequences of stochastic matrices $\{W_k\}$. Most existing results assume that every $W_k$ in the sequence has all positive diagonal entries, see, e.g., [23, 24, 25]. This assumption simplifies the convergence analysis significantly; moreover, without it, the existing results do not always hold. For example, from [22, 7] one knows that the product of $\{W_k\}$ converges to a rank-one matrix almost surely if exactly one of the eigenvalues of the expectation of $W_k$ has modulus one, which can be violated if $W_k$ has zero diagonal entries. Note also that most existing results are confined to special random sequences, e.g., independent and identically distributed sequences [22], stationary ergodic sequences [7], or independent sequences [26, 27]. Using the new Lyapunov criterion, we work on more general classes of random sequences of stochastic matrices without assuming nonzero diagonal entries. We show that if there exists a fixed length such that the product of any successive subsequence of matrices of this length has the scrambling property (a standard concept, defined in Section III) with positive probability, then the infinite product converges to a rank-one matrix almost surely. We also prove that the convergence is exponentially fast if this probability is lower bounded by some positive number, and the greater the lower bound, the faster the convergence. For some particular random sequences, we further relax this “scrambling” condition.
If the random sequence is driven by a stationary process, almost sure convergence is ensured as long as the product of any successive subsequence of some fixed finite length is stochastic, indecomposable, and aperiodic (SIA) with positive probability. An exponential convergence rate follows without further assumptions if the random process that governs the evolution of the sequence is stationary ergodic.
As the second application of finite-step stochastic Lyapunov functions, we investigate a distributed algorithm for solving linear algebraic equations of the form $Ax=b$. The equations are solved in parallel by a group of agents, each of which knows only a subset of the rows of the matrix $[A\;\,b]$. Each agent recursively updates its estimate of the solution using the current estimates of its neighbors. Several algorithms under different sufficient conditions have recently been proposed [28, 29, 30]; in particular, in [30] the sequence of neighbor relationship graphs $\{\mathbb{G}_k\}$ is required to be repeatedly jointly strongly connected. We show that a much weaker condition suffices to solve the problem almost surely: the algorithm in [30] works if there exists a fixed length such that any subsequence of $\{\mathbb{G}_k\}$ of this length is jointly strongly connected with positive probability.
The remainder of this paper is organized as follows. Section II defines finite-step stochastic Lyapunov functions. Products of random sequences of stochastic matrices are studied in Section III, and Section IV applies these results to issues arising in asynchronous implementations. Section V studies a distributed approach for solving linear equations. Brief concluding remarks appear in Section VI.
Notation: Throughout this paper, $\mathbb{N}$ denotes the set of non-negative integers, $\mathbb{N}_{+}$ the set of positive integers, and $\mathbb{R}^n$ the real $n$-dimensional vector space. Moreover, we let $\mathbf{1}$ be the vector consisting of all ones. Given a vector $x$, $x_i$ denotes the $i$th element of $x$. Let $\|\cdot\|_p$, $p\ge1$, be any $p$-norm. A continuous function $h:[0,\infty)\to[0,\infty)$ is said to belong to class $\mathcal{K}$ if it is strictly increasing and $h(0)=0$. For any two events $\mathcal{A}$ and $\mathcal{B}$, the conditional probability $\Pr[\mathcal{A}\mid\mathcal{B}]$ denotes the probability of $\mathcal{A}$ given $\mathcal{B}$.
II Finite-Step Stochastic Lyapunov Functions
Consider a stochastic discrete-time system described by
$$x_{k+1}=f(x_k,y_{k+1}),\quad k\in\mathbb{N}, \tag{1}$$
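where $x_k\in\mathbb{R}^n$, and $\{y_k:k\in\mathbb{N}\}$ is an $\mathbb{S}$-valued stochastic process on a probability space $(\Omega,\mathcal{F},\Pr)$. Here $\Omega$ is the sample space; $\mathcal{F}$ is a set of events that forms a $\sigma$-field; $\Pr$ is a function that assigns probabilities to events; $y_k$ is a measurable function mapping $\Omega$ into the state space $\mathbb{S}$; and for any $\omega\in\Omega$, $\{y_k(\omega):k\in\mathbb{N}\}$ is a realization of the stochastic process $\{y_k\}$ at $\omega$. Let $\mathcal{F}_k:=\sigma(y_1,\dots,y_k)$ for $k\ge1$ and $\mathcal{F}_0:=\{\emptyset,\Omega\}$, so that evidently $\{\mathcal{F}_k\}$ is an increasing sequence of $\sigma$-fields. Following [31], we consider a constant initial condition $x_0\in\mathbb{R}^n$ with probability one. It can then be observed that the solution to (1), $\{x_k\}$, is an $\mathbb{R}^n$-valued stochastic process adapted to $\{\mathcal{F}_k\}$. The randomness of $y_k$ can be due to various reasons, e.g., stochastic disturbances or noise. Note that (1) becomes a stochastic switching system if $f(x_k,y_{k+1})=g_{\sigma(y_{k+1})}(x_k)$, where $\sigma$ maps $\mathbb{S}$ into the set $\mathcal{P}:=\{1,\dots,p\}$, and $\{g_i:\mathbb{R}^n\to\mathbb{R}^n,\ i\in\mathcal{P}\}$ is a given family of functions.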
A point $x^*$ is said to be an equilibrium of system (1) if $f(x^*,y)=x^*$ for any $y\in\mathbb{S}$. Without loss of generality, we assume that the origin $x=0$ is an equilibrium. Researchers have been interested in the limiting behavior of the solution $\{x_k\}$, i.e., when and to where $x_k$ converges as $k\to\infty$. Most notably, Kushner developed classic results on stochastic stability by employing stochastic Lyapunov functions [1, 2, 3]. We introduce some related definitions before recalling some of Kushner’s results. Following [32, Sec. 1.5.6] and [33], we first define convergence and exponential convergence of a sequence of random variables.
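Definition 1 (Convergence).
A random sequence $\{x_k\in\mathbb{R}^n\}$ in a sample space $\Omega$ converges to a random variable $x$ almost surely if $\Pr\big[\omega\in\Omega:\lim_{k\to\infty}\|x_k(\omega)-x(\omega)\|=0\big]=1$. The convergence is said to be exponentially fast with a rate no slower than $\gamma^{-1}$ for some $\gamma>1$ independent of $\omega$ if $\gamma^k\|x_k-x\|$ almost surely converges to $\bar c$ for some finite $\bar c\ge0$. Furthermore, let $D\subset\mathbb{R}^n$ be a set; a random sequence $\{x_k\}$ is said to converge to $D$ almost surely if $\Pr\big[\lim_{k\to\infty}\operatorname{dist}(x_k,D)=0\big]=1$, where $\operatorname{dist}(x,D):=\inf_{z\in D}\|x-z\|$.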
Here “almost surely” is interchangeable with “with probability one”, and we sometimes use the shorthand “a.s.”. We now introduce some stability concepts for stochastic discrete-time systems analogous to those in [5] and [34] for continuous-time systems.¹
¹ Note that 1) and 2) of Definition 2 follow the definitions in [5, Chap. 5], in which an arbitrary initial time rather than just 0 is considered. We define 3) along the same lines as 1) and 2). In Definition 3, 1) follows the definitions in [34], and we define 2) along the same lines as 1).
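Definition 2.
The origin of (1) is said to be:
1) stable in probability if $\lim_{x_0\to0}\Pr\big[\sup_{k\in\mathbb{N}}\|x_k\|>\varepsilon\big]=0$ for any $\varepsilon>0$;
2) asymptotically stable in probability if it is stable in probability and, moreover, $\lim_{x_0\to0}\Pr\big[\lim_{k\to\infty}\|x_k\|=0\big]=1$;
3) exponentially stable in probability if, for some $\gamma>1$ independent of $\omega$, $\lim_{x_0\to0}\Pr\big[\lim_{k\to\infty}\|\gamma^kx_k\|=0\big]=1$.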
Definition 3.
For a set $Q\subseteq\mathbb{R}^n$ containing the origin, the origin of (1) is said to be:
1) locally a.s. asymptotically stable in $Q$ (globally a.s. asymptotically stable, respectively) if, starting from $x_0\in Q$ ($x_0\in\mathbb{R}^n$, respectively), all the sample paths stay in $Q$ ($\mathbb{R}^n$, respectively) for all $k\ge0$ and converge to the origin almost surely;
2) locally a.s. exponentially stable in $Q$ (globally a.s. exponentially stable, respectively) if it is locally (globally, respectively) a.s. asymptotically stable and the convergence is exponentially fast.
We now recall some of Kushner’s results on convergence and stability, which rely on stochastic Lyapunov functions.
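Lemma 1 (Asymptotic Convergence and Stability).
For the stochastic discrete-time system (1), let $\{x_k\}$ be a Markov process. Let $V:\mathbb{R}^n\to\mathbb{R}$ be a continuous, positive definite, and radially unbounded function. Define the set $Q_\lambda:=\{x:V(x)<\lambda\}$ for some $\lambda>0$, and assume that
$$\mathbb{E}\big[V(x_{k+1})\mid x_k\big]-V(x_k)\le-\varphi(x_k),\quad k\in\mathbb{N}, \tag{2}$$
where $\varphi:\mathbb{R}^n\to\mathbb{R}$ is continuous and satisfies $\varphi(x)\ge0$ for any $x\in Q_\lambda$. Then the following statements apply:
i) for any initial condition $x_0\in Q_\lambda$, $x_k$ converges to $D_1:=\{x\in Q_\lambda:\varphi(x)=0\}$ with probability at least $1-V(x_0)/\lambda$ [3];
ii) if moreover $\varphi(x)$ is positive definite on $Q_\lambda$, and $h_1(\|s\|)\le V(s)\le h_2(\|s\|)$ for two class $\mathcal{K}$ functions $h_1$ and $h_2$, then $x=0$ is asymptotically stable in probability [3], [35, Theorem 7.3].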
Lemma 2 (Exponential Convergence and Stability).
For the stochastic discrete-time system (1), let $\{x_k\}$ be a Markov process. Let $V:\mathbb{R}^n\to\mathbb{R}$ be a continuous nonnegative function. Assume that
$$\mathbb{E}\big[V(x_{k+1})\mid x_k\big]-V(x_k)\le-\alpha V(x_k),\quad 0<\alpha<1. \tag{3}$$
Then the following statements apply:
i) for any given $x_0$, $V(x_k)$ almost surely converges to 0 exponentially fast with a rate no slower than $1-\alpha$ [2, Th. 2, Chap. 8], [35];
ii) if moreover $V$ satisfies $c_1\|x\|^a\le V(x)\le c_2\|x\|^a$ for some $c_1,c_2,a>0$, then $x=0$ is globally a.s. exponentially stable [35, Theorem 7.4].
To use these two lemmas to prove asymptotic (or exponential) stability for a stochastic system, the critical step is to find a stochastic Lyapunov function such that (2) (respectively, (3)) holds. However, it is not always obvious how to construct such a stochastic Lyapunov function. We use the following toy example to illustrate this point.
Example 1. Consider a randomly switching system described by $x_{k+1}=A_{y_{k+1}}x_k$, where $y_k$ is the switching signal taking values in a finite set $\mathcal{P}$, and
[TABLE]
The stochastic process $\{y_k\}$ is described by a Markov chain with a given initial distribution. The transition probabilities are described by a transition matrix
[TABLE]
whose $(i,j)$th element is the transition probability $\Pr[y_{k+1}=j\mid y_k=i]$. Since $\{y_k\}$ is not independent and identically distributed, the process $\{x_k\}$ is not Markovian. Nevertheless, we might conjecture that the origin is globally a.s. exponentially stable. To try to prove this, we might choose a stochastic Lyapunov function candidate $V$, but the existing results in Lemma 2 cannot be used since $\{x_k\}$ is not Markovian. Moreover, direct calculation shows that $V(x_k)$ does not necessarily decrease strictly in expectation at every step, which implies that (3) is not necessarily satisfied. Thus $V$ is not an appropriate stochastic Lyapunov function to which Lemma 2 can be applied. As it turns out, however, the same $V$ can be used as a Lyapunov function to establish exponential stability via the alternative criterion set out subsequently.
It can be difficult, if not impossible, to construct such a stochastic Lyapunov function, especially when the state of the system is not Markovian. It is therefore of great interest to generalize Lemmas 1 and 2 so that the range of choices of candidate Lyapunov functions is enlarged. For deterministic systems, Aeyels et al. introduced a new Lyapunov criterion to study asymptotic stability of continuous-time systems [15]; a similar criterion has also been obtained for discrete-time systems, and the Lyapunov functions satisfying it are called finite-step Lyapunov functions [16, 17]. A common feature of these works is that the Lyapunov function is required to decrease along the system’s solutions after a finite number of steps, but not necessarily at every step. We now use this idea to construct finite-step stochastic Lyapunov functions, a task that is much more challenging than in the deterministic case because of the uncertainty present in stochastic systems, and that requires entirely different analysis tools. We will exploit supermartingales and their convergence property, as well as the Borel-Cantelli lemma; these are introduced in the following two lemmas.
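Lemma 3 ([36, Sec. 5.2.9]).
Let the sequence $\{X_k\}$ be a nonnegative supermartingale with respect to $\{\mathcal{F}_k\}$, i.e., suppose: (i) $X_k\ge0$ is $\mathcal{F}_k$-measurable; (ii) $\mathbb{E}[X_k]<\infty$ for all $k$; (iii) $\mathbb{E}[X_{k+1}\mid\mathcal{F}_k]\le X_k$. Then there exists some random variable $X$ such that $X_k\xrightarrow{a.s.}X$ as $k\to\infty$, and $\mathbb{E}[X]\le\mathbb{E}[X_0]$.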
Lemma 4 (Borel-Cantelli Lemma, [2, p. 192]).
Let $\{X_k\}$ be a nonnegative random sequence. If $\sum_{k=0}^{\infty}\mathbb{E}[X_k]<\infty$, then $X_k\xrightarrow{a.s.}0$.
We are now ready to present our first main result on stochastic convergence and stability.
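Theorem 1.
For the stochastic discrete-time system (1), let $V:\mathbb{R}^n\to\mathbb{R}$ be a continuous, nonnegative, and radially unbounded function. Define the set $Q_\lambda:=\{x:V(x)<\lambda\}$ for some $\lambda>0$, and assume that
a) $\mathbb{E}[V(x_{k+1})\mid\mathcal{F}_k]-V(x_k)\le0$ for any $k$ such that $x_k\in Q_\lambda$;
b) there is an integer $T\ge1$, independent of $\omega$, such that for any $k\in\mathbb{N}$, $\mathbb{E}[V(x_{k+T})\mid\mathcal{F}_k]-V(x_k)\le-\varphi(x_k)$, where $\varphi:\mathbb{R}^n\to\mathbb{R}$ is continuous and satisfies $\varphi(x)\ge0$ for any $x\in Q_\lambda$.
Then the following statements apply:
i) for any initial condition $x_0\in Q_\lambda$, $x_k$ converges to $D_1:=\{x\in Q_\lambda:\varphi(x)=0\}$ with probability at least $1-V(x_0)/\lambda$;
ii) if moreover $\varphi(x)$ is positive definite on $Q_\lambda$, and $h_1(\|s\|)\le V(s)\le h_2(\|s\|)$ for two class $\mathcal{K}$ functions $h_1$ and $h_2$, then $x=0$ is asymptotically stable in probability.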
Proof.
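Before proving i) and ii), we first show that, starting from $x_0\in Q_\lambda$, the sample paths stay in $Q_\lambda$ with probability at least $1-V(x_0)/\lambda$ if Assumption a) is satisfied. This has been proven in [2, p. 196] by showing that
$$\Pr\Big[\sup_{k\in\mathbb{N}}V(x_k)\ge\lambda\Big]\le\frac{V(x_0)}{\lambda}. \tag{4}$$
Let $\tilde\Omega\subseteq\Omega$ be the set of sample paths such that, for any $\omega\in\tilde\Omega$, $x_k(\omega)\in Q_\lambda$ for all $k\in\mathbb{N}$. Let $\kappa$ be the smallest $k$ (if it exists) such that $x_k\notin Q_\lambda$. Note that this integer does not exist when $x_k$ stays in $Q_\lambda$ for all $k$, i.e., when $\omega\in\tilde\Omega$.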
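We first prove i) by showing that the sample paths staying in $Q_\lambda$ converge to $D_1$ with probability one, i.e., $\Pr[x_k\to D_1\mid\tilde\Omega]=1$. Towards this end, define a new function $\tilde\varphi$ such that $\tilde\varphi(x)=\varphi(x)$ for $x\in Q_\lambda$, and $\tilde\varphi(x)=0$ for $x\notin Q_\lambda$. Define another random process $\{\tilde z_k\}$. If $\kappa$ exists, when , let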
[TABLE]
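where satisfies ; when , let for any . If $\kappa$ does not exist, we let $\tilde z_k=x_k$ for all $k\in\mathbb{N}$. Then it is immediately clear that $\mathbb{E}[V(\tilde z_{k+T})\mid\mathcal{F}_k]-V(\tilde z_k)\le-\tilde\varphi(\tilde z_k)$. Taking the expectation on both sides of this inequality, we obtain
$$\mathbb{E}\big[V(\tilde z_{k+T})\big]-\mathbb{E}\big[V(\tilde z_k)\big]\le-\mathbb{E}\big[\tilde\varphi(\tilde z_k)\big]. \tag{5}$$
For any $k\in\mathbb{N}$, there is a pair $(m,j)$, with $m\in\mathbb{N}$ and $j\in\{0,1,\dots,T-1\}$, such that $k=mT+j$. It follows from (5) that, for each $j$,
$$\begin{aligned}\mathbb{E}\big[V(\tilde z_{T+j})\big]-\mathbb{E}\big[V(\tilde z_{j})\big]&\le-\mathbb{E}\big[\tilde\varphi(\tilde z_{j})\big],\\ \mathbb{E}\big[V(\tilde z_{2T+j})\big]-\mathbb{E}\big[V(\tilde z_{T+j})\big]&\le-\mathbb{E}\big[\tilde\varphi(\tilde z_{T+j})\big],\\ &\;\;\vdots\\ \mathbb{E}\big[V(\tilde z_{(m+1)T+j})\big]-\mathbb{E}\big[V(\tilde z_{mT+j})\big]&\le-\mathbb{E}\big[\tilde\varphi(\tilde z_{mT+j})\big].\end{aligned}$$
Summing the left- and right-hand sides of these inequalities, respectively, over all $j\in\{0,\dots,T-1\}$, we have
$$\sum_{j=0}^{T-1}\Big(\mathbb{E}\big[V(\tilde z_{(m+1)T+j})\big]-\mathbb{E}\big[V(\tilde z_{j})\big]\Big)\le-\sum_{i=0}^{(m+1)T-1}\mathbb{E}\big[\tilde\varphi(\tilde z_i)\big]. \tag{6}$$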
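As $V$ is nonnegative and $\mathbb{E}[V(\tilde z_j)]$, $j=0,\dots,T-1$, are finite, the left-hand side of (6) is bounded below by $-\sum_{j=0}^{T-1}\mathbb{E}[V(\tilde z_j)]>-\infty$ even as $m\to\infty$, which implies that $\sum_{i=0}^{\infty}\mathbb{E}[\tilde\varphi(\tilde z_i)]<\infty$. Since $\tilde\varphi\ge0$, Lemma 4 gives $\tilde\varphi(\tilde z_k)\xrightarrow{a.s.}0$ as $k\to\infty$. For $\omega\in\tilde\Omega$, one can observe that $\tilde z_k=x_k$ and $\tilde\varphi(\tilde z_k)=\varphi(x_k)$ according to the definitions of $\tilde z_k$ and $\tilde\varphi$, respectively. Therefore, $\varphi(x_k)\to0$ for almost every $\omega\in\tilde\Omega$, and subsequently
$$\Pr\Big[\lim_{k\to\infty}\varphi(x_k)=0\;\Big|\;\tilde\Omega\Big]=1.$$
From the continuity of $\varphi$, it follows that $\Pr[x_k\to D_1\mid\tilde\Omega]=1$. The proof of i) is complete since (4) means that the sample paths stay in $Q_\lambda$ with probability at least $1-V(x_0)/\lambda$.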
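Next, we prove ii) in two steps. We first prove that the origin is stable in probability. The inequalities $h_1(\|x\|)\le V(x)\le h_2(\|x\|)$ imply that $V(x)=0$ if and only if $x=0$. Moreover, since Assumption a) also holds on $Q_\mu\subseteq Q_\lambda$ for any $\mu\in(0,\lambda]$, it follows from $h_1(\|x_k\|)\le V(x_k)$ and the inequality (4), with $\lambda$ replaced by $\mu$, that for any initial condition $x_0\in Q_\mu$,
$$\Pr\Big[\sup_{k\in\mathbb{N}}h_1(\|x_k\|)\ge\mu\Big]\le\Pr\Big[\sup_{k\in\mathbb{N}}V(x_k)\ge\mu\Big]\le\frac{V(x_0)}{\mu}\le\frac{h_2(\|x_0\|)}{\mu}$$
for any $\mu\in(0,\lambda]$. Since $h_1$ is a class $\mathcal{K}$ function and thus invertible on its range, it can be observed that $\Pr\big[\sup_{k}\|x_k\|\ge h_1^{-1}(\mu)\big]\le h_2(\|x_0\|)/\mu$. Then for any $\varepsilon>0$, choosing $\mu=\min\{h_1(\varepsilon),\lambda\}$ gives $\lim_{x_0\to0}\Pr\big[\sup_{k}\|x_k\|\ge\varepsilon\big]=0$, which means that the origin is stable in probability.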
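Second, we show that the probability that $x_k\to0$ tends to 1 as $x_0\to0$. One knows that $D_1=\{0\}$ since $\varphi$ is positive definite on $Q_\lambda$. From i), $x_k$ converges to $D_1$ with probability at least $1-V(x_0)/\lambda$. Since $V(x_0)\le h_2(\|x_0\|)\to0$ as $x_0\to0$, there holds that $\lim_{x_0\to0}\Pr\big[\lim_{k\to\infty}\|x_k\|=0\big]=1$. The proof is complete. ∎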
In particular, if $Q_\lambda$ is positively invariant, i.e., starting from $x_0\in Q_\lambda$ all sample paths stay in $Q_\lambda$ for all $k\ge0$, the following corollary follows straightforwardly from Theorem 1.
Corollary 1.
If $Q_\lambda$ is positively invariant w.r.t. system (1) and assumptions a) and b) of Theorem 1 are satisfied, then the following statements apply:
i) for any initial condition $x_0\in Q_\lambda$, $x_k$ converges to $D_1:=\{x\in Q_\lambda:\varphi(x)=0\}$ with probability one;
ii) if moreover $\varphi(x)$ is positive definite on $Q_\lambda$, and $h_1(\|s\|)\le V(s)\le h_2(\|s\|)$ for two class $\mathcal{K}$ functions $h_1$ and $h_2$, then $x=0$ is locally a.s. asymptotically stable in $Q_\lambda$. Furthermore, if $Q_\lambda=\mathbb{R}^n$, then $x=0$ is globally a.s. asymptotically stable.
The next theorem provides a new criterion for exponential convergence and stability of stochastic systems, relaxing the conditions required by Lemma 2.
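Theorem 2.
Suppose assumptions a) and b) of Theorem 1 are satisfied with the inequality of b) strengthened to
$$\mathbb{E}\big[V(x_{k+T})\mid\mathcal{F}_k\big]-V(x_k)\le-\alpha V(x_k),\quad 0<\alpha<1. \tag{7}$$
Then the following statements apply:
i) for any given $x_0\in Q_\lambda$, $V(x_k)$ converges to 0 exponentially at a rate no slower than $(1-\alpha)^{1/T}$, and $x_k$ converges to $D_2:=\{x\in Q_\lambda:V(x)=0\}$, with probability at least $1-V(x_0)/\lambda$;
ii) if moreover $V$ satisfies $c_1\|x\|^a\le V(x)\le c_2\|x\|^a$ for some $c_1,c_2,a>0$, then $x=0$ is exponentially stable in probability.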
Proof.
We first prove i). From the proof of Theorem 1, we know that the sample paths stay in $Q_\lambda$ with probability at least $1-V(x_0)/\lambda$ for any initial condition $x_0\in Q_\lambda$ if assumption a) is satisfied. We next show that, for any sample path that always stays in $Q_\lambda$, $V(x_k)$ converges to 0 exponentially fast. Towards this end, we define a random process $\{\tilde z_k\}$. Let $\kappa$ be the first time at which $x_k$ leaves $Q_\lambda$, as defined in the proof of Theorem 1. If $\kappa$ exists, when , let
[TABLE]
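where satisfies , when , let for any ; if $\kappa$ does not exist, we let $\tilde z_k=x_k$ for all $k\in\mathbb{N}$.
If the inequality (7) is satisfied, one has $\mathbb{E}[V(\tilde z_{k+T})\mid\mathcal{F}_k]\le(1-\alpha)V(\tilde z_k)$. Using this inequality, we next show that $V(\tilde z_k)$ converges to 0 exponentially. To this end, for each $j\in\{0,1,\dots,T-1\}$, consider the subsequence $\{V(\tilde z_{mT+j}):m\in\mathbb{N}\}$. Let $\mathcal{G}^j_m:=\mathcal{F}_{mT+j}$; one knows that $V(\tilde z_{mT+j})$ is determined once $\mathcal{G}^j_m$ is known. It then follows from the inequality (7) that, for any $m$, $\mathbb{E}[V(\tilde z_{(m+1)T+j})\mid\mathcal{G}^j_m]\le(1-\alpha)V(\tilde z_{mT+j})$. We observe from this inequality that
$$\mathbb{E}\Big[(1-\alpha)^{-(m+1)}V\big(\tilde z_{(m+1)T+j}\big)\;\Big|\;\mathcal{G}^j_m\Big]\le(1-\alpha)^{-m}V\big(\tilde z_{mT+j}\big).$$
This means that $\{(1-\alpha)^{-m}V(\tilde z_{mT+j})\}_{m\in\mathbb{N}}$ is a nonnegative supermartingale with respect to $\{\mathcal{G}^j_m\}_{m\in\mathbb{N}}$, and thus, by Lemma 3, there is a finite random number $\bar v_j$ such that $(1-\alpha)^{-m}V(\tilde z_{mT+j})\xrightarrow{a.s.}\bar v_j$ for any $j$. Let $\gamma:=(1-\alpha)^{-1/T}$; then by definition of $\gamma$ we have $\gamma^{mT+j}=\gamma^{j}(1-\alpha)^{-m}$. Straightforwardly, $\gamma^{mT+j}V(\tilde z_{mT+j})\xrightarrow{a.s.}\gamma^{j}\bar v_j$ as $m\to\infty$ for each $j$. Let $\bar v:=\max_{0\le j\le T-1}\gamma^{j}\bar v_j$; then it almost surely holds that $\limsup_{k\to\infty}\gamma^{k}V(\tilde z_k)\le\bar v<\infty$. From Definition 1, one concludes that $V(\tilde z_k)$ almost surely converges to 0 exponentially at a rate no slower than $\gamma^{-1}=(1-\alpha)^{1/T}$. From the definition of $\tilde z_k$, we know that $\tilde z_k=x_k$ for all $k$ whenever $\omega\in\tilde\Omega$, where $\tilde\Omega$ is the set of sample paths that remain in $Q_\lambda$, defined in the proof of Theorem 1. Consequently, it holds that
$$\Pr\Big[\limsup_{k\to\infty}\gamma^{k}V(x_k)<\infty\;\Big|\;\tilde\Omega\Big]=1,\qquad\gamma=(1-\alpha)^{-1/T}. \tag{8}$$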
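The proof of i) is complete since the sample paths stay in $Q_\lambda$ with probability at least $1-V(x_0)/\lambda$.
Next, we prove ii). If the inequalities $c_1\|x\|^a\le V(x)\le c_2\|x\|^a$ are satisfied, then $V(x)=0$ if and only if $x=0$. Moreover, with $\gamma=(1-\alpha)^{-1/T}$, it follows from (8) that, for all the sample paths that stay in $Q_\lambda$, there holds $\limsup_{k\to\infty}\gamma^{k}\|x_k\|^{a}<\infty$, since $\|x_k\|^a\le V(x_k)/c_1$. Hence, $\lim_{k\to\infty}\hat\gamma^{k}\|x_k\|=0$ for any $\hat\gamma\in(1,\gamma^{1/a})$, and one can check that this holds with probability at least $1-V(x_0)/\lambda\ge1-c_2\|x_0\|^a/\lambda$. If $x_0\to0$, we know that $1-c_2\|x_0\|^a/\lambda\to1$, which completes the proof. ∎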
If $Q_\lambda$ is positively invariant, the following corollary follows straightforwardly.
Corollary 2.
If $Q_\lambda$ is positively invariant w.r.t. system (1) and assumptions a) and b) of Theorem 1 are satisfied with the inequality of b) strengthened to (7), then the following statements apply:
i) for any given $x_0\in Q_\lambda$, $V(x_k)$ converges to 0 exponentially at a rate no slower than $(1-\alpha)^{1/T}$ with probability one;
ii) if moreover $V$ satisfies $c_1\|x\|^a\le V(x)\le c_2\|x\|^a$ for some $c_1,c_2,a>0$, then $x=0$ is locally a.s. exponentially stable in $Q_\lambda$. Furthermore, if $Q_\lambda=\mathbb{R}^n$, then $x=0$ is globally a.s. exponentially stable.
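The following corollary, which can be proven along the same lines as Theorems 1 and 2, bears some similarity to LaSalle’s theorem for deterministic systems. It is worth mentioning that the function $V$ here does not have to be radially unbounded.
Corollary 3.
Let $D\subset\mathbb{R}^n$ be a compact set that is positively invariant w.r.t. system (1). Let $V:\mathbb{R}^n\to\mathbb{R}$ be a continuous nonnegative function, and $Q_\lambda:=\{x\in D:V(x)<\lambda\}$ for some $\lambda>0$. Assume that $\mathbb{E}[V(x_{k+1})\mid\mathcal{F}_k]-V(x_k)\le0$ for all $k$ such that $x_k\in Q_\lambda$. Then
i) if there is an integer $T\ge1$, independent of $\omega$, such that for any $k\in\mathbb{N}$, $\mathbb{E}[V(x_{k+T})\mid\mathcal{F}_k]-V(x_k)\le-\varphi(x_k)$, where $\varphi:\mathbb{R}^n\to\mathbb{R}$ is continuous and satisfies $\varphi(x)\ge0$ for any $x\in Q_\lambda$, then for any initial condition $x_0\in Q_\lambda$, $x_k$ converges to $D_1:=\{x\in Q_\lambda:\varphi(x)=0\}$ with probability at least $1-V(x_0)/\lambda$;
ii) if the inequality in i) is strengthened to $\mathbb{E}[V(x_{k+T})\mid\mathcal{F}_k]-V(x_k)\le-\alpha V(x_k)$ for some $0<\alpha<1$, then for any given $x_0\in Q_\lambda$, $V(x_k)$ converges to 0 exponentially at a rate no slower than $(1-\alpha)^{1/T}$, and $x_k$ converges to $D_2:=\{x\in Q_\lambda:V(x)=0\}$, with probability at least $1-V(x_0)/\lambda$;
iii) if $Q_\lambda$ is positively invariant w.r.t. system (1), then all the convergence in both i) and ii) takes place almost surely.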
Example 1 (continued). Let us return to Example 1 and again choose $V$ as the stochastic Lyapunov function candidate. It is easy to see that $\{V(x_k)\}$ is a nonnegative supermartingale. To show stochastic convergence, let , and one can calculate the conditional expectations
[TABLE]
When , there analogously hold that
[TABLE]
From these three inequalities, one can observe that, starting from any initial condition $x_0$, $V(x_k)$ decreases exponentially every two steps until it reaches 0. By Corollary 2, the origin is globally a.s. exponentially stable, consistent with our conjecture.
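To make the finite-step idea concrete, the following Python sketch checks it on a hypothetical two-mode system $x_{k+1}=A_{y_{k+1}}x_k$ with $V(x)=\|x\|_\infty$. The matrices in `MODES` are our own choice and are not those of Example 1; for simplicity the modes are assumed i.i.d. with probability 1/2 each, so the sketch illustrates only the finite-step decrease, not the non-Markovian aspect. Conditional expectations are computed exactly by enumerating mode sequences. At $x=(0,1)$ the one-step expectation equals $V(x)$, so (3) cannot hold, while a short calculation shows the two-step expectation never exceeds $0.625\,V(x)$, so (7) holds with $T=2$ and $\alpha=0.375$, and Theorem 2 gives the rate bound $(1-\alpha)^{1/2}\approx0.79$.

```python
from itertools import product

# Hypothetical two-mode system x_{k+1} = A_{y_{k+1}} x_k (illustration only;
# these are NOT the matrices of Example 1).
MODES = [
    ((0.0, 1.0), (0.0, 0.0)),  # mode 0: move x_2 into x_1, zero out x_2
    ((0.5, 0.0), (0.0, 1.0)),  # mode 1: halve x_1, keep x_2
]


def step(matrix, x):
    """Return matrix @ x for a 2x2 matrix and a 2-vector."""
    (a, b), (c, d) = matrix
    return (a * x[0] + b * x[1], c * x[0] + d * x[1])


def V(x):
    """Candidate Lyapunov function: the infinity norm."""
    return max(abs(v) for v in x)


def expected_V(x, steps):
    """Exact E[V(x_{k+steps}) | x_k = x], assuming each mode is drawn
    independently with probability 1/2 at every step."""
    seqs = list(product(range(len(MODES)), repeat=steps))
    total = 0.0
    for seq in seqs:
        z = x
        for mode in seq:
            z = step(MODES[mode], z)
        total += V(z)
    return total / len(seqs)


def rate_bound(alpha, T):
    """Rate (1 - alpha)**(1/T) from Theorem 2 i)."""
    return (1.0 - alpha) ** (1.0 / T)


# Grid of nonzero test states with dyadic coordinates in [-1, 1].
GRID = [(i / 4, j / 4) for i in range(-4, 5) for j in range(-4, 5)
        if (i, j) != (0, 0)]

if __name__ == "__main__":
    x = (0.0, 1.0)
    print(expected_V(x, 1), V(x))  # one step: equal, so (3) fails here
    print(expected_V(x, 2))        # two steps: strictly smaller
    worst = max(expected_V(p, 2) / V(p) for p in GRID)
    print(worst, rate_bound(1.0 - worst, 2))
```

The exhaustive enumeration is exact for this tiny example; for larger mode sets or horizons one would switch to Monte Carlo estimates of the same conditional expectations.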
Remark 1.
Kushner and other researchers have used more restrictive conditions than ours to construct Lyapunov functions for analyzing asymptotic or exponential stability of random processes [2, 3, 4]: $V(x_k)$ is required to decrease strictly in expectation at every step until it reaches a limit value. In our results, this requirement is relaxed. In addition, Kushner’s results rely on the assumption that the underlying random process is Markovian, whereas we work with more general random processes.
In the following sections, we will show how the new Lyapunov criteria can be applied to distributed computation.
III Products of Random Sequences of Stochastic Matrices
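In this section, we study the convergence of products of stochastic matrices, using the results on finite-step stochastic Lyapunov functions obtained above. Let $\mathbb{S}$ be the state space and $\mathcal{M}$ the set of $n\times n$ stochastic matrices. Consider a random sequence $\{W(\omega_k):k\in\mathbb{N}_+\}$ on the probability space $(\Omega,\mathcal{F},\Pr)$, where $\Omega$ is the collection of all infinite sequences $\omega=(\omega_1,\omega_2,\dots)$ with $\omega_k\in\mathbb{S}$, and we define $W:\mathbb{S}\to\mathcal{M}$. For notational simplicity, we denote $W(\omega_k)$ by $W_k$. For the backward product of stochastic matrices
$$W_{k+m,k}:=W_{k+m}W_{k+m-1}\cdots W_{k+1}, \tag{9}$$
where $k\in\mathbb{N}$ and $m\in\mathbb{N}_+$, we are interested in establishing conditions on $\{W_k\}$ under which $\lim_{m\to\infty}W_{m,0}=L$ almost surely for a random matrix $L=\mathbf{1}\xi^\top$, where $\xi\in\mathbb{R}^n$ satisfies $\xi^\top\mathbf{1}=1$.
Before proceeding, let us introduce some concepts in probability. Let $\mathcal{F}_k:=\sigma(\omega_1,\dots,\omega_k)$, so that evidently $\{\mathcal{F}_k\}$, $k\in\mathbb{N}_+$, is an increasing sequence of $\sigma$-fields. Let $\theta$ be the shift operator, i.e., $\theta(\omega_1,\omega_2,\dots)=(\omega_2,\omega_3,\dots)$. A random sequence of stochastic matrices $\{W_k\}$ is said to be stationary if the shift operator is measure-preserving. In other words, the sequences $\{W_{k_1},\dots,W_{k_r}\}$ and $\{W_{k_1+t},\dots,W_{k_r+t}\}$ have the same joint distribution for all $k_1,\dots,k_r$ and $t\in\mathbb{N}$. Moreover, a sequence is said to be stationary ergodic if it is stationary and every invariant set is trivial, i.e., $\Pr[\mathcal{A}]\in\{0,1\}$ for every invariant set $\mathcal{A}\in\mathcal{F}$. Here, by an invariant set $\mathcal{A}$, we mean $\theta^{-1}\mathcal{A}=\mathcal{A}$.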
III-A Convergence Results
We first introduce three classes of stochastic matrices, denoted by $\mathcal{M}_1$, $\mathcal{M}_2$, and $\mathcal{M}_3$, respectively. We say $A\in\mathcal{M}_1$ if $A$ is indecomposable and aperiodic (such stochastic matrices are also referred to as SIA for short); $A\in\mathcal{M}_2$ if $A$ is scrambling, i.e., no two rows of $A$ are orthogonal; and $A\in\mathcal{M}_3$ if $A$ is Markov, i.e., there exists a column of $A$ whose entries are all positive [37, Ch. 4].
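The scrambling and Markov classes can be tested directly from the zero pattern of a matrix. The following Python sketch (standard library only; the function and matrix names are hypothetical helpers, not from the paper) checks stochasticity, the scrambling property, and the Markov property. SIA membership depends on powers of the matrix and is not checked here. Each example matrix has at least one zero diagonal entry, which is the case the paper aims to cover.

```python
def is_stochastic(A, tol=1e-12):
    """True if A is square with nonnegative entries and unit row sums."""
    n = len(A)
    return all(len(row) == n and all(a >= 0 for a in row)
               and abs(sum(row) - 1.0) <= tol for row in A)


def is_scrambling(A):
    """True if no two rows of A are orthogonal, i.e. every pair of rows
    has a common column in which both entries are positive."""
    n = len(A)
    return all(any(A[i][l] > 0 and A[j][l] > 0 for l in range(n))
               for i in range(n) for j in range(i + 1, n))


def is_markov(A):
    """True if some column of A has all entries positive."""
    n = len(A)
    return any(all(A[i][l] > 0 for i in range(n)) for l in range(n))


# Example matrices, each with at least one zero diagonal entry.
MARKOV = [[0.0, 1.0], [0.0, 1.0]]  # second column is all positive
SCRAMBLING = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
NEITHER = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]

if __name__ == "__main__":
    for name, A in [("MARKOV", MARKOV), ("SCRAMBLING", SCRAMBLING),
                    ("NEITHER", NEITHER)]:
        print(name, is_stochastic(A), is_scrambling(A), is_markov(A))
```

`SCRAMBLING` shows that the inclusion of Markov matrices in the scrambling class is strict: every pair of its rows overlaps, yet no single column is positive throughout.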
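Coefficients of ergodicity serve as a fundamental tool in analyzing the convergence of products of stochastic matrices. In this paper, we employ a standard one. For a stochastic matrix $A=[a_{ij}]\in\mathbb{R}^{n\times n}$, the coefficient of ergodicity is defined by
$$\tau(A):=\frac12\max_{i,j}\sum_{l=1}^{n}|a_{il}-a_{jl}|=1-\min_{i,j}\sum_{l=1}^{n}\min(a_{il},a_{jl}). \tag{10}$$
It is known that this coefficient of ergodicity satisfies $0\le\tau(A)\le1$, and it is proper, since $\tau(A)=0$ if and only if all the rows of $A$ are identical. Importantly, it holds that
$$\tau(A)<1 \tag{11}$$
if and only if $A$ is scrambling (see [37, p. 82]). For any two stochastic matrices $A_1$ and $A_2$, the following property will be critical for the proof in Appendix A:
$$\tau(A_1A_2)\le\tau(A_1)\,\tau(A_2). \tag{12}$$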
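Properties (11) and (12) are easy to check numerically. The Python sketch below assumes the standard form $\tau(A)=\frac12\max_{i,j}\sum_l|a_{il}-a_{jl}|$ of the coefficient of ergodicity. The helper `random_stochastic` is a hypothetical test generator that zeroes some entries at random so that non-scrambling matrices also occur among the samples.

```python
import random


def matmul(A, B):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def tau(A):
    """Coefficient of ergodicity: half the largest l1 distance between
    two rows of the stochastic matrix A (0 iff all rows are equal)."""
    n = len(A)
    return 0.5 * max(sum(abs(a - b) for a, b in zip(A[i], A[j]))
                     for i in range(n) for j in range(n))


def random_stochastic(n, rng, zero_prob=0.3):
    """Random row-stochastic matrix with some entries forced to zero."""
    rows = []
    for _ in range(n):
        w = [0.0 if rng.random() < zero_prob else rng.random()
             for _ in range(n)]
        if sum(w) == 0.0:
            w[rng.randrange(n)] = 1.0  # keep the row stochastic
        s = sum(w)
        rows.append([v / s for v in w])
    return rows


SCRAMBLING = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
NOT_SCRAMBLING = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]

if __name__ == "__main__":
    print(tau(SCRAMBLING), tau(NOT_SCRAMBLING))  # 0.5 1.0
    rng = random.Random(1)
    pairs = [(random_stochastic(4, rng), random_stochastic(4, rng))
             for _ in range(1000)]
    print(all(tau(matmul(A, B)) <= tau(A) * tau(B) + 1e-12
              for A, B in pairs))
```

The small tolerance only absorbs floating-point rounding; (12) itself is an exact inequality.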
To proceed, we make the following assumption on the sequence $\{W_k\}$.
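Assumption 1.
Suppose the sequence of stochastic matrices $\{W_k\}$ is driven by a random process satisfying the following conditions.
a) There exists an integer $T\ge1$ such that
$$\Pr\big[W_{k+T,k}\text{ is scrambling}\mid\mathcal{F}_k\big]>0 \tag{13}$$
holds for any $k\in\mathbb{N}$, and
[TABLE]
b) There is a positive number $\delta$ such that $[W_k]_{ij}\ge\delta$ whenever $[W_k]_{ij}>0$.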
We are now ready to present our main result on the convergence of products of stochastic matrices.
Theorem 3.
Under Assumption 1, the product $W_{m,0}$ of the random sequence of stochastic matrices $\{W_k\}$ converges to a random matrix $L=\mathbf{1}\xi^\top$ almost surely as $m\to\infty$.
To prove Theorem 3, consider the stochastic discrete-time dynamical system described by
$$x_{k+1}=W_{k+1}x_k \tag{15}$$
for all $k\in\mathbb{N}$, where $x_k\in\mathbb{R}^n$, the initial state $x_0$ is a constant with probability one, $\omega_{k+1}$ is regarded as a randomly switching signal, and $\{W_k\}$ is the random process of stochastic matrices we are interested in. One knows that $x_k$ is adapted to $\mathcal{F}_k$. Thus, to investigate the limiting behavior of the product (9), it suffices to study the limiting behavior of the system dynamics (15). We say that the state of system (15) reaches agreement if $\lim_{k\to\infty}x_k=c\mathbf{1}$ for some $c\in\mathbb{R}$. Agreement of system (15) for any initial state then implies that $W_{m,0}$ converges to a rank-one matrix as $m\to\infty$ [26].
To investigate the agreement problem, we define $\overline{x}:=\max_{i}x_i$ and $\underline{x}:=\min_{i}x_i$ for $x\in\mathbb{R}^n$, and
$$\Delta(x):=\overline{x}-\underline{x}. \tag{16}$$
For any $k$, $\Delta(x_k)$ is adapted to $\mathcal{F}_k$ since $x_k$ is. Agreement is said to be reached asymptotically almost surely if $\Delta(x_k)\xrightarrow{a.s.}0$ as $k\to\infty$, and it is said to be reached exponentially almost surely with convergence rate no slower than $\gamma^{-1}$ if there exists $\gamma>1$ such that $\gamma^{k}\Delta(x_k)\xrightarrow{a.s.}\bar c$ for some finite $\bar c\ge0$. The random variable $\Delta(x_k)$ has some important properties given by the following proposition.
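Proposition 1.
Let $x^+=Ax$, where $A$ is a stochastic matrix. Then $\Delta(x^+)\le\Delta(x)$, and $\Delta(x^+)<\Delta(x)$ for any $x$ with $\Delta(x)>0$ if and only if $A$ is scrambling (i.e., no two rows of $A$ are orthogonal).
Proof.
It is shown in [37] that $\Delta(Ax)\le\tau(A)\Delta(x)$, with $\tau(\cdot)$ defined in (10). Therefore, sufficiency follows straightforwardly from (11). We prove necessity by contradiction. Suppose $A$ is not scrambling; then there exist at least two rows, indexed by $p$ and $q$, that are orthogonal. Define the two sets $\mathcal{K}_1:=\{l:a_{pl}>0\}$ and $\mathcal{K}_2:=\{l:a_{ql}>0\}$. The orthogonality of rows $p$ and $q$ implies $\mathcal{K}_1\cap\mathcal{K}_2=\emptyset$. Let $x_l=1$ for all $l\in\mathcal{K}_1$, $x_l=0$ for all $l\in\mathcal{K}_2$, and let $x_l$ be an arbitrary positive number less than 1 for all $l\notin\mathcal{K}_1\cup\mathcal{K}_2$ if this set is not empty. Then the states at the next time step become
$$x^+_p=\sum_{l\in\mathcal{K}_1}a_{pl}x_l=1,\qquad x^+_q=\sum_{l\in\mathcal{K}_2}a_{ql}x_l=0,$$
and $0\le x^+_i\le1$ for all $i$. This results in $\Delta(x^+)=1=\Delta(x)$, although $\Delta(x)>0$, which is a contradiction. Hence $A$ must be scrambling for $\Delta(x^+)<\Delta(x)$ to hold, which completes the proof. ∎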
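Proposition 1 can be checked on small examples. The Python sketch below, again assuming $\tau(A)=\frac12\max_{i,j}\sum_l|a_{il}-a_{jl}|$, builds the vector used in the necessity part of the proof for a non-scrambling matrix and confirms that $\Delta(x)=\max_i x_i-\min_i x_i$ does not decrease, while for a scrambling matrix $\Delta(Ax)\le\tau(A)\Delta(x)$. The function names and the `filler` value are illustrative choices.

```python
def apply(A, x):
    """Return A @ x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def spread(x):
    """Delta(x) = max_i x_i - min_i x_i."""
    return max(x) - min(x)


def tau(A):
    """Coefficient of ergodicity: half the largest l1 row distance."""
    n = len(A)
    return 0.5 * max(sum(abs(a - b) for a, b in zip(A[i], A[j]))
                     for i in range(n) for j in range(n))


def orthogonal_pair(A):
    """Indices (p, q) of two orthogonal rows, or None if A is scrambling."""
    n = len(A)
    for p in range(n):
        for q in range(p + 1, n):
            if not any(A[p][l] > 0 and A[q][l] > 0 for l in range(n)):
                return p, q
    return None


def witness(A, filler=0.5):
    """Vector from the necessity proof: 1 on the support of row p,
    0 on the support of row q, `filler` in (0, 1) elsewhere."""
    pair = orthogonal_pair(A)
    if pair is None:
        return None
    p, q = pair
    x = [filler] * len(A)
    for l in range(len(A)):
        if A[p][l] > 0:
            x[l] = 1.0
        elif A[q][l] > 0:
            x[l] = 0.0
    return x


NOT_SCRAMBLING = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
SCRAMBLING = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]

if __name__ == "__main__":
    x = witness(NOT_SCRAMBLING)  # [0.5, 1.0, 0.0]
    print(spread(x), spread(apply(NOT_SCRAMBLING, x)))  # no decrease
    y = [3.0, -1.0, 2.0]
    print(spread(apply(SCRAMBLING, y)), tau(SCRAMBLING) * spread(y))
```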
In order to prove Theorem 3, the following intermediate result is useful.
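Proposition 2.
For any scrambling matrix $A$, the coefficient of ergodicity defined in (10) satisfies
$$\tau(A)\le1-\delta$$
if all the positive elements of $A$ are lower bounded by $\delta>0$.
Proof:
Consider any two rows of $A$, indexed by $i$ and $j$. Define two sets, $\mathcal{K}_1:=\{l:a_{il}\ge a_{jl}\}$ and $\mathcal{K}_2:=\{l:a_{il}<a_{jl}\}$. From the scrambling hypothesis, one knows that there is a column $l^*$ with $a_{il^*}>0$ and $a_{jl^*}>0$, so that $\min(a_{il^*},a_{jl^*})\ge\delta$. Since both rows sum to 1, it holds that
$$\sum_{l=1}^{n}|a_{il}-a_{jl}|=2\sum_{l\in\mathcal{K}_1}(a_{il}-a_{jl})=2\Big(1-\sum_{l\in\mathcal{K}_1}a_{jl}-\sum_{l\in\mathcal{K}_2}a_{il}\Big)=2\Big(1-\sum_{l=1}^{n}\min(a_{il},a_{jl})\Big)\le2(1-\delta).$$
Then from the definition of $\tau$, it is easy to see that
$$\tau(A)=\frac12\max_{i,j}\sum_{l=1}^{n}|a_{il}-a_{jl}|\le1-\delta,$$
which completes the proof. ∎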
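A numerical check of Proposition 2, assuming $\tau(A)=\frac12\max_{i,j}\sum_l|a_{il}-a_{jl}|$: for each scrambling example, $\tau(A)$ is compared with $1-\delta$, where $\delta$ is the smallest positive entry. The bound is attained by the first two examples, so it is tight. The helper `random_positive_stochastic` is a hypothetical generator of entrywise positive (hence scrambling) matrices.

```python
import random


def tau(A):
    """Coefficient of ergodicity: half the largest l1 row distance."""
    n = len(A)
    return 0.5 * max(sum(abs(a - b) for a, b in zip(A[i], A[j]))
                     for i in range(n) for j in range(n))


def is_scrambling(A):
    """True if every pair of rows shares a positive column."""
    n = len(A)
    return all(any(A[i][l] > 0 and A[j][l] > 0 for l in range(n))
               for i in range(n) for j in range(i + 1, n))


def min_positive(A):
    """Smallest positive entry of A, i.e. the delta of Proposition 2."""
    return min(a for row in A for a in row if a > 0)


def proposition2_bound(A):
    """Return (tau(A), 1 - delta) for a scrambling stochastic matrix A."""
    if not is_scrambling(A):
        raise ValueError("Proposition 2 assumes a scrambling matrix")
    return tau(A), 1.0 - min_positive(A)


def random_positive_stochastic(n, rng):
    """Random stochastic matrix with all entries positive."""
    rows = [[rng.uniform(0.01, 1.0) for _ in range(n)] for _ in range(n)]
    return [[v / sum(r) for v in r] for r in rows]


EXAMPLES = [
    [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]],  # tau = 1 - delta = 0.5
    [[0.2, 0.8, 0.0], [0.0, 0.3, 0.7], [0.6, 0.0, 0.4]],  # tau = 1 - delta = 0.8
    [[0.0, 1.0], [0.0, 1.0]],                              # tau = 0, delta = 1
]

if __name__ == "__main__":
    for A in EXAMPLES:
        t, bound = proposition2_bound(A)
        print(round(t, 12), round(bound, 12))
    rng = random.Random(3)
    print(all(t <= b + 1e-12 for t, b in
              (proposition2_bound(random_positive_stochastic(4, rng))
               for _ in range(500))))
```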
We are now in a position to prove Theorem 3 by showing that $\Delta(x_k)\xrightarrow{a.s.}0$ as $k\to\infty$, where Theorem 1 and Corollary 1 will be used.
Proof:
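Let $V(x):=\Delta(x)$ be a finite-step stochastic Lyapunov function candidate for the system dynamics (15). It is easy to see that $V(x)=0$ if and only if $x=c\mathbf{1}$ for some $c\in\mathbb{R}$. Since all $W_k$ are stochastic matrices, we observe from Proposition 1 that $V(x_{k+1})\le V(x_k)$, which implies that $\{V(x_k)\}$ is a supermartingale with respect to $\{\mathcal{F}_k\}$. From Lemma 3, we know that $V(x_k)\xrightarrow{a.s.}v$ for some random $v\ge0$, because $V(x_k)\ge0$ and $\mathbb{E}[V(x_k)]\le V(x_0)<\infty$. From Assumption 1, we know that there is a $T$ such that the product $W_{k+T,k}$ is scrambling with positive probability for any $k$. Let $\mathcal{W}_k$ be the set of all possible values of $W_{k+T,k}$ at time $k$, and $N_k$ the cardinality of $\mathcal{W}_k$. Let $s_k$ be the number of scrambling matrices in $\mathcal{W}_k$. We denote each of these scrambling matrices by $S_i$, $i=1,\dots,s_k$, and each of the $N_k-s_k$ non-scrambling matrices by $\bar S_i$, $i=1,\dots,N_k-s_k$. The probabilities of all the possible $W_{k+T,k}$ sum to 1, i.e.,
$$\sum_{i=1}^{s_k}\Pr\big[W_{k+T,k}=S_i\mid\mathcal{F}_k\big]+\sum_{i=1}^{N_k-s_k}\Pr\big[W_{k+T,k}=\bar S_i\mid\mathcal{F}_k\big]=1. \tag{17}$$
Then the conditional expectation of $V$ after $T$ steps, for any $k$, satisfies
$$\mathbb{E}\big[V(x_{k+T})\mid\mathcal{F}_k\big]\le\sum_{i=1}^{s_k}\Pr\big[W_{k+T,k}=S_i\mid\mathcal{F}_k\big]\tau(S_i)V(x_k)+\sum_{i=1}^{N_k-s_k}\Pr\big[W_{k+T,k}=\bar S_i\mid\mathcal{F}_k\big]\tau(\bar S_i)V(x_k), \tag{18}$$
where $\tau(\cdot)$ is given by (10). Since $\tau(\bar S_i)\le1$, one can calculate that
$$\mathbb{E}\big[V(x_{k+T})\mid\mathcal{F}_k\big]-V(x_k)\le-\sum_{i=1}^{s_k}\Pr\big[W_{k+T,k}=S_i\mid\mathcal{F}_k\big]\big(1-\tau(S_i)\big)V(x_k),$$
where Proposition 1 and equation (17) have been used. From Assumption 1 b), we know that the positive elements of $W_k$ are lower bounded by $\delta$, and thus the positive elements of $S_i$ in (18) are lower bounded by $\delta^T$. Thus, according to Proposition 2, $\tau(S_i)\le1-\delta^T$, and it follows that
$$\mathbb{E}\big[V(x_{k+T})\mid\mathcal{F}_k\big]-V(x_k)\le-\delta^{T}\Pr\big[W_{k+T,k}\text{ is scrambling}\mid\mathcal{F}_k\big]V(x_k).$$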
By iterating, one can easily show that
[TABLE]
It then follows that even when , since . According to the condition (14), we know . By contradiction, it is easy to infer that . Since we have already shown that for some random , one can conclude that . For any given , define the compact set . For any random sequence , it follows from the system dynamics (15) that
[TABLE]
and thus will remain within . From Corollary 3, we know that asymptotically converges to , or equivalently, almost surely as since is continuous. In other words, for any , for some , which proves Theorem 3. ∎
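The following simulation sketch shows why a finite-step criterion is needed. It assumes the Lyapunov candidate is the spread V(x) = max_i x_i − min_i x_i. Each W_k is sampled independently from two hypothetical zero-diagonal matrices:

- a cyclic permutation, under which V does not decrease at all;
- a scrambling averaging matrix, under which V at least halves.

V never increases and stays flat on permutation steps. It still decays to zero, because scrambling steps occur with positive probability in every window.

```python
import random

def matvec(W, x):
    """Compute W x for a matrix stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def V(x):
    """Assumed Lyapunov candidate: the spread max_i x_i - min_i x_i."""
    return max(x) - min(x)

# Both matrices have zero diagonals (no agent uses its own state).
A1 = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]  # cyclic permutation: V unchanged
A2 = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]  # scrambling: V at least halves

def simulate(steps, seed=0):
    rng = random.Random(seed)
    x = [1.0, 0.0, -2.0]
    history = [V(x)]
    for _ in range(steps):
        W = A1 if rng.random() < 0.5 else A2  # i.i.d. choice, probability 1/2 each
        x = matvec(W, x)
        history.append(V(x))
    return x, history

x_final, hist = simulate(200)
# V never increases, but it can stay flat for several consecutive steps (A1 steps).
flat_steps = sum(1 for a, b in zip(hist, hist[1:]) if b == a)
print(hist[0], hist[-1], flat_steps)
```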
For a random sequence of stochastic matrices, Theorem 3 provides a rather mild condition for the backward product (9) determined by the random sequence to converge to a rank-one matrix: over any time interval of length , i.e., for any , the product is scrambling with positive probability. The following corollary is immediate since any Markov matrix is scrambling.
**Corollary 4.**
For a random sequence , the product (9) converges to a random matrix almost surely if there exists an integer such that becomes a Markov matrix for any with positive probability and .
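The claim that any Markov matrix is scrambling can be checked mechanically. The sketch below assumes Seneta's terminology, in which a Markov matrix is a stochastic matrix with at least one entrywise positive column. Such a column is a common positive column for every pair of rows. The converse fails, as the zero-diagonal example at the end shows. The function names and the random test matrices are illustrative.

```python
import random
from itertools import combinations

def is_markov(A):
    """Markov matrix in Seneta's sense (assumed here): some column is entrywise positive."""
    n = len(A)
    return any(all(A[i][s] > 0 for i in range(n)) for s in range(n))

def is_scrambling(A):
    """Every pair of rows shares a column in which both entries are positive."""
    n = len(A)
    return all(any(A[i][s] > 0 and A[j][s] > 0 for s in range(n))
               for i, j in combinations(range(n), 2))

def random_stochastic(n, rng, density=0.4):
    """Random row-stochastic matrix with a random zero pattern (illustrative generator)."""
    rows = []
    for _ in range(n):
        row = [rng.random() if rng.random() < density else 0.0 for _ in range(n)]
        if sum(row) == 0.0:
            row[rng.randrange(n)] = 1.0  # guarantee at least one positive entry
        total = sum(row)
        rows.append([v / total for v in row])
    return rows

rng = random.Random(1)
samples = [random_stochastic(4, rng) for _ in range(2000)]
markov = [A for A in samples if is_markov(A)]
print(len(markov), all(is_scrambling(A) for A in markov))

# The converse fails: this zero-diagonal matrix is scrambling but has no positive column.
A2 = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
print(is_scrambling(A2), is_markov(A2))  # True False
```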
Next we assume that the sequence is driven by an underlying stationary process. Then the condition in Theorem 3 can be further relaxed.
**Assumption 2.**
Suppose the random sequence of stochastic matrices is driven by a stationary process satisfying the following conditions.
- a) There exists an integer such that
[TABLE]
holds for any .
- b) There is a positive number such that whenever .
In other words, Assumption 2 states that any corresponding matrix product of length is SIA with positive probability, and that the positive elements of all are uniformly bounded away from zero.
**Theorem 4.**
Under Assumption 2, the product of the random sequence of stochastic matrices converges to a random matrix almost surely.
If two stochastic matrices and have zero elements in the same positions, we say these two matrices are of the same type, denoted by . Trivially, one has . It is known that for any SIA matrix , there exists an integer such that is scrambling; this extends readily to the inhomogeneous case, i.e., any product of stochastic matrices of the same type as is scrambling if all the matrices are element-wise lower bounded.
Proof:
Since is driven by a stationary process, we know that has the same joint distribution as for any . For the given in Assumption 2, there exists an SIA matrix such that $\Pr[W(t+kh+h,t+kh+1)=A]>0$. Thus it follows that $\Pr[W(t+kh+2h,t+kh+1)=A]>0$ for any . Thus
[TABLE]
When , which happens with positive probability, we have
[TABLE]
By recursion, one can conclude that all the products , occur as the same SIA type with positive probability. Since all these products are of the same type, one can choose such that is scrambling. This in turn implies that , and the stationarity of the process ensures that (14) holds. All the conditions in Assumption 1 are therefore satisfied, and Theorem 4 follows from Theorem 3. ∎
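The notions of type and SIA used in this proof can be made concrete. In the sketch below, the type of a matrix is its zero pattern, so two matrices are of the same type exactly when `type_of(A) == type_of(B)`. The type of a product depends only on the types of its factors.

The SIA test relies on the characterization that a stochastic matrix is SIA if and only if some power of it is scrambling. There are only finitely many zero patterns, so the sequence of types of successive powers is eventually periodic. Scanning until a type repeats is therefore enough. The helper names and example matrices are our own.

```python
from itertools import combinations

def type_of(A):
    """Zero pattern ("type") of a nonnegative matrix, as a tuple of boolean rows."""
    return tuple(tuple(v > 0 for v in row) for row in A)

def bool_mul(S, T):
    """Type of a product X Y computed from the types S of X and T of Y."""
    n = len(S)
    return tuple(tuple(any(S[i][k] and T[k][j] for k in range(n)) for j in range(n))
                 for i in range(n))

def scrambling_type(S):
    """Scrambling depends only on the type."""
    n = len(S)
    return all(any(S[i][s] and S[j][s] for s in range(n))
               for i, j in combinations(range(n), 2))

def is_sia(A):
    """SIA test via: A is SIA iff some power of A is scrambling.

    The type of A^(k+1) depends only on the type of A^k, so once a type repeats
    no new type (and hence no new chance of scrambling) can appear.
    """
    S = type_of(A)
    P, seen = S, set()
    while P not in seen:
        if scrambling_type(P):
            return True
        seen.add(P)
        P = bool_mul(P, S)
    return False

A1 = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]  # periodic permutation
A2 = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]  # already scrambling
C = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]   # cycles of length 3 and 2
print(is_sia(A1), is_sia(A2), scrambling_type(type_of(C)), is_sia(C))  # False True False True
```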
**Remark 2.**
Theorems 3 and 4 establish sufficient conditions for the product of a random sequence of stochastic matrices to converge to a rank-one matrix. A further question is how these results can be applied to control distributed computation processes. To answer this question, consider a finite set of stochastic matrices , from which each in the random sequence is sampled. Following [38], is called a consensus set if every product , converges to a rank-one matrix. However, it has also been shown that deciding whether is a consensus set is NP-hard [38, 39]. For a non-consensus set , it is often not obvious how to find a deterministic sequence whose product converges, especially when has a large number of elements and has zero diagonal entries. Nevertheless, convergence can be ensured almost surely by introducing randomness into the sequence, provided that at least one convergent deterministic sequence exists.
III-B Estimation of Convergence Rate
In Section III-A, we have shown that the product determined by a random process converges asymptotically to a rank-one matrix a.s. as . However, the convergence rate of such a randomized product is not yet clear. Estimating how fast the product converges is challenging, especially when each may have zero diagonal entries. In this subsection, we address this problem by employing finite-step stochastic Lyapunov functions. We now present the main result on the convergence rate.
**Theorem 5.**
In addition to Assumption 1, if there exists a number , , such that
[TABLE]
then the almost sure convergence of the product of to a random matrix is exponential, and the rate is no slower than .
Proof:
Choosing as a finite-step stochastic Lyapunov function candidate, from (18) we have
[TABLE]
Furthermore, it is easy to see that
[TABLE]
Substituting this bound into the first inequality of this proof yields
[TABLE]
It follows from Corollary 3 that , with a convergence rate no slower than . In other words, agreement is reached exponentially fast almost surely, which proves Theorem 5. ∎
Theorem 5 establishes an almost sure exponential convergence rate for the product of . If any subsequence yields a scrambling product with a probability that is bounded below by some positive number, then the convergence rate is exponential. Interestingly, the greater this lower bound, the faster the convergence. If the random sequence is driven by a stationary ergodic process, exponential convergence follows without any conditions beyond Assumption 2, as stated in the following corollary; an alternative proof is given in Appendix A.
**Corollary 5.**
If the random process governing the evolution of the sequence is stationary ergodic, the product converges to a random rank-one matrix at an exponential rate almost surely if the conditions of Assumption 2 are satisfied.
III-C Connection to Markov Chains
In this subsection, we show that Theorems 4 and 5 generalize some well-known results for Markov chains in [40, 37]. A fundamental result on inhomogeneous Markov chains is as follows.
**Lemma 5 ([37, Th. 4.10], [40]).**
If the product , formed from a sequence , satisfies for any , and whenever , then converges to a rank-one matrix.
Let be the number of distinct types of scrambling matrices of order . It is known that the product is scrambling for any . In this case, we may take the probability of each product being scrambling as , and, as an immediate consequence of Theorem 5, we know that converges to a rank-one matrix at an exponential rate no slower than . This rate is consistent with the estimate in [37, Th. 4.10]. The same applies to the homogeneous case where for any with being scrambling. Moreover, it is known that the condition can be relaxed to requiring only that be SIA, which still ensures convergence; this is an immediate consequence of Theorem 4.
In the next section, we discuss how these results can be applied to asynchronous computation.
IV Asynchronous Agreement over Possibly Periodic Networks
In this section, we take each component in from (15) as the state of agent in an -agent system. Define the distributed coordination algorithm
[TABLE]
where the averaging weights , , and denote the time instants at which updating actions happen. The initial state is assumed to be given. We assume throughout that , where and are positive numbers. We say the states of system (22) reach agreement if , as defined in Section III. Let ; clearly, is a stochastic matrix. The algorithm (22) can be rewritten as . The matrix can be associated with a directed, weighted graph , where is the vertex set and is the edge set for which if . The graph is called rooted if there exists at least one vertex, called a root, from which every other vertex can be reached. It is known that agents reach agreement for all if is SIA ([40, 37]). However, the case where is not SIA has not been studied before, although it often arises in real systems, such as social networks. Since we are interested in the agreement problem when is possibly periodic, we now define periodic stochastic matrices.
**Definition 4.**
A stochastic matrix is said to be periodic with period if is the greatest common divisor of all the such that for a sufficiently large integer .
Definition 4 is a generalization of the definition of an irreducible periodic matrix [37, Def. 1.6]. In this definition, a periodic stochastic matrix is not necessarily irreducible. With a slight abuse of terminology, we say the graph is periodic if the associated matrix is periodic.
In the context of distributed computation, it is usually assumed that each computational unit in the network has access to its own latest state when implementing the iterative update rules [19, 21]. A class of situations that has received considerably less attention in the literature arises when some individuals cannot obtain their own states, which can result from memory loss. Similar phenomena have been observed in studies of opinion evolution in social networks: self-contemptuous people change their opinions solely in response to the opinions of others. Computational units or individuals that cannot access their own states may cause computational failure or persistent disagreement of opinions. For example, a periodic matrix , which must have all diagonal entries equal to zero (no individual has access to its own state), generally drives system (22) into oscillation. This is because for a periodic , never converges to a matrix with identical rows as . Instead, the positions of the positive entries of change periodically with , so that also changes periodically. This motivates us to investigate the case where is possibly periodic.
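The oscillation described above is easy to reproduce. The sketch uses a hypothetical three-agent matrix with zero diagonal. Its graph is bipartite: agent 0 listens to agents 1 and 2, and both of them listen to agent 0. The matrix is therefore periodic with period 2. Under the synchronous update, written here as x(k+1) = A x(k), the state alternates forever and its spread never shrinks.

```python
def matvec(W, x):
    """Compute W x for a matrix stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

# Zero-diagonal periodic matrix (period 2): {0} and {1, 2} only listen to each other.
A = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]

x = [1.0, 0.0, 0.0]
trajectory = [x]
for _ in range(6):
    x = matvec(A, x)
    trajectory.append(x)
for state in trajectory:
    print(state)  # alternates between [1.0, 0.0, 0.0] and [0.0, 1.0, 1.0]
```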
In this section, we show that agreement can be reached even when is periodic, simply by introducing asynchronous updating events among the coupled agents. In fact, perfect synchrony is hard to realize in practice, since it is difficult for all agents to access a common clock according to which they coordinate their updating actions; asynchrony is more likely. Researchers have studied how agreement can be preserved in the presence of asynchrony, see e.g., [41, 42]. Unlike these works, we approach the problem from a different angle: here agreement occurs precisely because of asynchrony. The special case where is irreducible and periodic was treated in our earlier work [43]; in this section we consider the more general case where can be reducible.
To proceed, we define a framework of randomly asynchronous updating events. It is reasonable to allow that, on occasion, more than one, but not all, agents update simultaneously. Assume that each agent is equipped with a clock, which need not be synchronized with other clocks. The state of each agent remains unchanged except when an activation event is triggered by its own clock. Denote the set of event times of the th agent by . At the event times, agent updates its state according to the asynchronous updating rule
[TABLE]
where . We assume that the clocks which determine the updating events for the agents are driven by an underlying random process. The following assumption is important for the analysis.
**Assumption 3.**
For any agent , the intervals between two event times, denoted by , are such that
- (i) are upper bounded with probability 1 for all and all ;
- (ii) is a random sequence, with , , , being mutually independent.
Assumption 3 ensures that, after being activated at , an agent is activated again within finite time, for all , which implies that all agents update their states infinitely many times in the long run. In fact, Assumption 3 can be satisfied if the agents are activated by mutually independent Poisson clocks or at rates determined by mutually independent Bernoulli processes ([44, Ch. 6], [32, Ch. 2]).
Let denote all event times of all the agents, where the event times have been relabeled so that and . This idea has been used in [45] and [21] to study asynchronous iterative algorithms. It may happen that there exist some such that and for some , i.e., more than one agent is activated at the same event time. Although this occurs with probability zero for some random processes, such as independent Poisson clocks, it does not affect our analysis or results. For simplicity, we rewrite the set of event times as . Then the system with asynchronous updating can be treated as one with discrete-time dynamics in which the agents are permitted to update only at certain event times , according to the updating rule (23) at each time . Since each can be the event time of any subset of agents, we can associate any set of event times with the updating sequence of agents with . Under Assumption 3, this updating sequence can be arbitrarily ordered, and each possible sequence occurs with positive probability, although the particular value of this probability is not of concern.
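The relabeling of event times can be sketched as follows. Each agent gets an independent clock whose inter-event gaps are drawn uniformly from {1, ..., max_gap}. This is a hypothetical choice that satisfies the bounded-gap and independence requirements of Assumption 3. The events of all agents are then merged into one increasing sequence, recording which agents fire at each time. Integer-valued clocks are chosen deliberately, so that the simultaneous activations allowed by the analysis actually occur.

```python
import random

def event_times(n_agents, horizon, max_gap, seed=0):
    """Independent per-agent clocks with i.i.d. gaps in {1, ..., max_gap} (illustrative)."""
    rng = random.Random(seed)
    times = {}
    for i in range(n_agents):
        t, own = 0, []
        while True:
            t += rng.randint(1, max_gap)
            if t > horizon:
                break
            own.append(t)
        times[i] = own
    return times

def merged_sequence(times):
    """Relabel all event times t_0 < t_1 < ... and record which agents fire at each."""
    firing = {}
    for agent, own in times.items():
        for t in own:
            firing.setdefault(t, set()).add(agent)
    return [(t, sorted(firing[t])) for t in sorted(firing)]

seq = merged_sequence(event_times(n_agents=3, horizon=20, max_gap=3))
for t, agents in seq:
    print(t, agents)  # several agents may fire at the same event time
```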
Suppose that at time , agents are activated, labeled by ; we then define the following matrices
[TABLE]
where is the th column of the identity matrix and denotes the th row of . We call the asynchronous updating matrix at time . Then the asynchronous updating rule (23) becomes
[TABLE]
where is a random sequence of asynchronous updating matrices, which are stochastic, and is a given initial state. We say that asynchronous agreement is reached if converges to a scalar multiple of the all-ones vector when the agents update asynchronously. It suffices to study the convergence of the product to a rank-one matrix. We now show that asynchronous agreement is reached almost surely even when the graph is periodic, and we obtain a necessary and sufficient condition on the graph under which agreement is reached almost surely.
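The sketch below illustrates asynchronous agreement on a periodic three-agent example with zero diagonal. It assumes the following reading of the construction from e_i and the rows of A:

- each activated agent i takes row i of A;
- every other agent takes row i of the identity, i.e., holds its state.

The activation sets are random nonempty subsets, an illustrative model. When all agents are active, the updating matrix is A itself, which oscillates synchronously. Random asynchronous updates nonetheless drive the states to agreement.

```python
import random

def async_matrix(A, active):
    """Asynchronous updating matrix (assumed form): activated agents use row i of A,
    all other agents keep row i of the identity, i.e. hold their current state."""
    n = len(A)
    return [list(A[i]) if i in active else [1.0 if j == i else 0.0 for j in range(n)]
            for i in range(n)]

def matvec(W, x):
    """Compute W x for a matrix stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

# Periodic zero-diagonal matrix: agent 0 listens to 1 and 2, which both listen to 0.
A = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]

rng = random.Random(3)
x = [1.0, 0.0, 0.0]
for _ in range(200):
    # Random nonempty subset of agents fires at each event time (illustrative clock model).
    active = {i for i in range(3) if rng.random() < 0.5} or {rng.randrange(3)}
    x = matvec(async_matrix(A, active), x)
print(x, max(x) - min(x))
```

In this example, the activation set {1, 2} alone copies agent 0's state to both neighbors and produces exact agreement in one step. Other orders converge more gradually.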
**Theorem 6.**
If the agents coupled by a network update asynchronously under Assumption 3, they reach agreement almost surely if and only if the network is rooted, i.e., the matrix is indecomposable.
To prove this theorem, we need some additional concepts and results. Note that the associated graph is rooted if and only if is indecomposable. Denote the set of all the roots of by . We can partition the vertices of into hierarchical subsets as follows. For any , there exists at least one directed spanning tree rooted at , see e.g., Fig. 1 (a). We select any of these directed spanning trees, denoted by . There exists a directed path from to any other vertex , see e.g., Fig. 1 (b). Let be the length of the directed path from to ; then there exists an integer such that for all . Define
[TABLE]
and . From this definition, one can partition the vertices of into hierarchical subsets, i.e., , according to the vertices’ distances to the root . Let be the number of vertices in the subset , (see the example in Fig. 1 (b)). Note that given a spanning tree, its corresponding hierarchical subsets ’s are uniquely determined.
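The hierarchical subsets can be computed by a breadth-first search. The sketch assumes a convention chosen for illustration: there is an edge from j to i whenever a_ij > 0, i.e., agent i uses the state of agent j. The BFS tree serves as the selected spanning tree, and layer ℓ collects the vertices at tree distance ℓ from the root. The same search also finds the roots, as the vertices that reach every other vertex. The helper names and the example matrix are ours.

```python
from collections import deque

def out_neighbors(A):
    """Assumed convention: edge j -> i whenever a_ij > 0 (agent i listens to agent j)."""
    n = len(A)
    return {j: [i for i in range(n) if A[i][j] > 0 and i != j] for j in range(n)}

def bfs_layers(A, root):
    """Distance layers H_0 = {root}, H_1, ... of a BFS spanning tree rooted at `root`.

    Returns None if some vertex is unreachable, i.e. `root` is not a root of the graph.
    """
    nbrs = out_neighbors(A)
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in nbrs[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) < len(A):
        return None
    depth = max(dist.values())
    return [sorted(v for v, d in dist.items() if d == level) for level in range(depth + 1)]

def roots(A):
    """All vertices from which every other vertex can be reached."""
    return [r for r in range(len(A)) if bfs_layers(A, r) is not None]

# Rooted but reducible, zero diagonal: agents 0 and 1 listen to each other,
# agent 2 listens to agent 0, and agent 3 listens to agent 2.
A = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
print(roots(A))          # [0, 1]
print(bfs_layers(A, 0))  # [[0], [1, 2], [3]]
```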
**Definition 5.**
An updating vertex sequence of length is said to be hierarchical if it can be partitioned into some successive subsequences, denoted by with , such that for all , where ’s are the hierarchical subsets of some spanning tree in .
**Proposition 3.**
If agents coupled by update in a hierarchical sequence for all , the product of the corresponding asynchronous updating matrices, , is a Markov matrix.
To prove this proposition, we define an operator for any stochastic matrix and any subset
[TABLE]
and we write as for brevity. It is easy to check that for any two stochastic matrices and for any subset , it holds that
[TABLE]
Proof:
It suffices to show that all share at least one common neighbor in the graph , i.e.,
[TABLE]
We rewrite the product of asynchronous updating matrices as
[TABLE]
For any distinct , we know that from the definition of asynchronous updating matrices. Then for any , it holds that
[TABLE]
where the property (26) has been used. From Definition 5, one knows that there exists at least one vertex that can reach in and subsequently in , which implies
[TABLE]
It then follows
[TABLE]
Similarly, it holds that
[TABLE]
Proceeding recursively, we obtain
[TABLE]
where is a root of . In fact, it holds that , and then we know
[TABLE]
Substituting (29) into (28) leads to
[TABLE]
for all . Since , we know
[TABLE]
Hence (27) follows, which completes the proof. ∎
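Below is a small numerical check in the spirit of Proposition 3. The ordering is our own construction and is not claimed to coincide with Definition 5. Agents are activated one at a time in order of increasing distance from a chosen root, and the root updates last.

The edge convention is assumed: an edge from j to i whenever a_ij > 0. Under it, each updated agent inherits positive weight on the root's original state through its parent in the previous layer. When the root finally updates, it averages rows that all have positive weight on the root's column. The root's column of the product is therefore positive, and the product is a Markov matrix. The synchronous powers of the periodic example never are.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def async_matrix(A, active):
    """Assumed form: rows of activated agents come from A, all other rows from I."""
    n = len(A)
    return [list(A[i]) if i in active else [float(i == j) for j in range(n)]
            for i in range(n)]

def distance_layers(A, root):
    """BFS layers from `root`, assuming an edge j -> i whenever a_ij > 0."""
    n, layers, seen = len(A), [[root]], {root}
    while True:
        nxt = sorted({i for j in layers[-1] for i in range(n) if A[i][j] > 0 and i not in seen})
        if not nxt:
            return layers
        seen.update(nxt)
        layers.append(nxt)

def layered_product(A, root):
    """Hypothetical ordering: layers 1, 2, ... update one agent at a time, root last.
    Returns the order and W = A(k_m) ... A(k_1)."""
    order = [i for layer in distance_layers(A, root)[1:] for i in layer] + [root]
    n = len(A)
    W = [[float(i == j) for j in range(n)] for i in range(n)]
    for agent in order:
        W = matmul(async_matrix(A, {agent}), W)  # later updates multiply on the left
    return order, W

def is_markov(W):
    """Some column of W is entrywise positive."""
    n = len(W)
    return any(all(W[i][s] > 0 for i in range(n)) for s in range(n))

A_cycle = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]  # irreducible, periodic
A_tree = [[0.0, 1.0, 0.0, 0.0],   # reducible but rooted, zero diagonal
          [1.0, 0.0, 0.0, 0.0],
          [1.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0]]
for M in (A_cycle, A_tree):
    order, W = layered_product(M, root=0)
    print(order, is_markov(W))
# [2, 1, 0] True
# [1, 2, 3, 0] True
```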
Since the hierarchical sequences will appear with positive probability in any sequence of length , one can easily prove the following proposition by letting .
**Proposition 4.**
There exists an integer such that the product , where is given in (25), is a Markov matrix with positive probability for any .
Proof of Theorem 6:
We prove necessity by contradiction. Suppose the matrix is decomposable. Then there are at least two groups of vertices that are isolated from each other, in the sense that neither group receives information from the other, and agreement is never reached between these two groups if they have different initial states. For sufficiency, let ; in view of Corollary 4, sufficiency follows directly from Proposition 4, which completes the proof. ∎
**Remark 3.**
Note that the hierarchical sequence is a particular type of updating order that results in a Markov matrix as the product of the corresponding updating matrices. We identified another type of updating order in our earlier work for the case where is irreducible and periodic [43]. It is of great interest for future work to look for other updating mechanisms that give rise to Markov or scrambling matrices and thus guarantee asynchronous agreement.
In the next section, we look into another application in solving linear algebraic equations.
V To Solve Linear Algebraic Equations
There has been considerable interest in solving a system of linear algebraic equations of the form in a distributed way [46, 47, 28, 29]. In this section, we address the problem under the assumption that this system of equations has at least one solution. The set of equations is decomposed into smaller sets that are distributed to a network of processors, referred to as agents, to be solved in parallel. Agents can receive information from their neighbors, and the neighbor relationships are described by a time-varying -vertex directed graph with self-arcs. When each agent knows only the pair of real-valued matrices , the problem of interest is to devise local algorithms such that all agents iteratively compute the same solution to the linear equation , where and . A distributed algorithm for this problem is introduced in [30], where the iterative updating rule for each agent is described by
[TABLE]
where , is the number of neighbors of agent at time , is the collection of ’s neighbors, is the orthogonal projection on the kernel of , and the initial value is any solution to the equations of .
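Here is a minimal sketch of the projection-based update, following our reading of rule (30):

x_i ← x_i − (1/m_i) P_i (m_i x_i − Σ_{j ∈ N_i} x_j),

where N_i contains agent i itself (self-arcs) and m_i = |N_i|. Each agent holds a single hypothetical equation a_i·x = b_i, so P_i = I − a_i a_iᵀ/‖a_i‖². The neighbor graph is redrawn at random at every step, so it is complete, and hence strongly connected, with positive probability. Because P_i maps into the kernel of a_i, each agent's iterate keeps satisfying its own equation while the agents agree on a common solution.

```python
import random

def projector(a):
    """Orthogonal projector onto ker(a^T) = {x : a.x = 0} for a single-row block a."""
    nrm2 = sum(v * v for v in a)
    n = len(a)
    return [[float(i == j) - a[i] * a[j] / nrm2 for j in range(n)] for i in range(n)]

def matvec(M, x):
    """Compute M x for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

# Hypothetical example: agent i knows one equation a_i . x = b_i; the stacked
# system has the unique solution x* = (1, 2).
a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]
P = [projector(ai) for ai in a]

# x_i(0): any solution of agent i's own equation (here the minimum-norm one).
x = [[bi * v / sum(w * w for w in ai) for v in ai] for ai, bi in zip(a, b)]

rng = random.Random(0)
n = len(a)
for t in range(3000):
    # Random directed graph with self-arcs: j is a neighbor of i with probability 1/2.
    nbrs = [[i] + [j for j in range(n) if j != i and rng.random() < 0.5] for i in range(n)]
    new_x = []
    for i in range(n):
        m_i = len(nbrs[i])
        # x_i <- x_i - (1/m_i) P_i (m_i x_i - sum_{j in N_i} x_j)
        diff = [m_i * x[i][k] - sum(x[j][k] for j in nbrs[i]) for k in range(2)]
        step = matvec(P[i], diff)
        new_x.append([x[i][k] - step[k] / m_i for k in range(2)])
    x = new_x

for xi in x:
    print([round(v, 6) for v in xi])
```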
The results in [30] show that all converge to the same solution exponentially fast if the sequence of graphs is repeatedly jointly strongly connected. This condition is restrictive since it requires that, for some integer , the composition of the sequence of graphs, , be strongly connected for every . By the composition of a directed graph with vertex set with another directed graph with the same vertex set , denoted by , we mean the directed graph with vertex set and edge set defined in such a way that is an arc of the composition if and only if there is a vertex such that is an edge in and is an edge in . The repeated joint strong connectivity condition is difficult to satisfy when the network changes randomly. Now assume that the evolution of the sequence of graphs is driven by a random process. In this case, the results of Theorem 1 and Corollary 1 can be applied to relax the condition in [30], yielding the following more general result.
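The composition operation can be sketched on edge sets. The orientation below, an arc (i, k) of the first graph followed by an arc (k, j) of the second, is our reading of the definition above. The example checks, on a directed ring with a self-arc at every vertex, the fact cited later in the proof of Theorem 7: composing n − 1 strongly connected graphs with self-arcs yields the complete graph. Here the count n − 1 is our assumption for that threshold. Fewer copies of this ring do not suffice.

```python
from itertools import product

def compose(G2, G1):
    """Composition G2 o G1 on arc sets (orientation assumed): (i, j) is an arc iff
    some k has (i, k) in G1 and (k, j) in G2."""
    return {(i, j) for (i, k) in G1 for (k2, j) in G2 if k == k2}

def reachable(start, arcs):
    """Vertices reachable from `start` along the given arcs."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for p, q in arcs:
            if p == u and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def strongly_connected(G, n):
    """Vertex 0 reaches every vertex and every vertex reaches vertex 0."""
    reverse = {(q, p) for (p, q) in G}
    return len(reachable(0, G)) == n and len(reachable(0, reverse)) == n

n = 4
# Directed ring with a self-arc at every vertex: strongly connected, far from complete.
ring = {(i, i) for i in range(n)} | {(i, (i + 1) % n) for i in range(n)}
composition = ring
for _ in range(n - 2):  # n - 1 copies of the ring in total
    composition = compose(ring, composition)
complete = set(product(range(n), repeat=2))
print(strongly_connected(ring, n), composition == complete)  # True True
```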
**Theorem 7.**
Suppose each agent updates its state according to the rule (30). All states converge to the same solution to almost surely if the following two conditions are satisfied:
- a)
there exists an integer such that the composition of any sequence of randomly changing graphs is strongly connected with positive probability for any ;
- b)
there holds
To prove the theorem, we define an error system. Let be any solution to , so for any . Then, we define
[TABLE]
which, as is done in [30], can be simplified into
[TABLE]
Let , be the adjacency matrix of the graph , be the diagonal matrix whose th diagonal entry is , and . Clearly, is a stochastic matrix, and is a stochastic process. We now write equation (31) in compact form
[TABLE]
where denotes the Kronecker product, , , and is a random process. We will show this error system is globally a.s. asymptotically stable. Define the transition matrix of this error system by
[TABLE]
In order to study the stability of the error system (32), we define a mixed-matrix norm for an block matrix whose th entry is a matrix , and
[TABLE]
where is the matrix in whose th entry is . Here and denote the induced 2 norm and infinity norm, respectively. It is easy to show that is a norm. Since for , it follows straightforwardly that . It has been proven in [30] that is non-expansive for any . In other words, it holds that . Moreover, the transition matrix is a contraction, i.e., , if there exists a “route” over the sequence for any that satisfies ; here by a route over a given sequence of graphs , we mean a sequence of vertices such that is an edge in for all . Now we are ready to prove Theorem 7.
Proof:
Let be a finite-step stochastic Lyapunov function candidate. Let , where , be an increasing sequence of -fields. We first show that is a supermartingale with respect to by observing
[TABLE]
where . The last inequality follows from the fact that , since all the possible are non-expansive. Consider the sequence of randomly changing graphs , where . Let , and partition this sequence into successive subsequences , ,, . Let denote the composition of the graphs in the th subsequence, i.e., . Since all the subsequences have length , each can be further partitioned into successive sub-subsequences of length . From condition a) of Theorem 7, the composition of the graphs in any sub-subsequence is strongly connected with positive probability. The event that the composition of the graphs in each of the sub-subsequences in is strongly connected also has positive probability. This holds for all . It is known that the composition of any or more strongly connected graphs, each vertex of which has a self-arc, is a complete graph [20]. It follows that, with positive probability, the graphs are all complete. Therefore, for any pair , there exists a route from to over the graph for any . It is easy to check that there exists a route over the graphs , where can be any reordering of . Similarly, for any there must exist a route of length , , over . Thus there is a route over the graph sequence so that . This implies that is a contraction with positive probability. Since all are non-expansive, there is a number such that . This bound also holds for all . Thus, almost surely,
[TABLE]
As in the proof of Theorem 3, condition b) of Theorem 7 ensures that . It follows that as , since $V(e_0)-\mathbb{E}[V(e_{nq}) \mid \mathcal{F}_k]<\infty$ for any . Define the set for any initial corresponding to . For any random sequence , it follows from the system dynamics (32) that
[TABLE]
and thus will stay within the set with probability . From Theorem 1 and Corollary 1, it follows that asymptotically converges to almost surely. Moreover, since is a norm of , it can be concluded from Corollary 1 that the error system (32) is globally a.s. asymptotically stable. The proof is complete. ∎
It is worth mentioning that the error system is globally a.s. exponentially stable if the probability that the composition of any sequence of randomly changing graphs, , for any , is strongly connected is bounded below by some positive number. This can be proven with the help of Theorem 2 and Corollary 2.
VI Concluding Remarks
We have established the tool of finite-step stochastic Lyapunov functions, with which one can study the convergence and stability of a stochastic system together with its convergence rate. As applications, we investigated the convergence of products of a random sequence of stochastic matrices, the asynchronous agreement problem, and the distributed algorithm for solving linear algebraic equations, relaxing the conditions in existing results on the latter two problems. One of our future research directions is to apply finite-step stochastic Lyapunov functions to the study of stochastic distributed optimization.
VII Acknowledgement
We thank Prof. Tobias Müller from Bernoulli Institute, University of Groningen, for constructive discussions.
Appendix A An Alternative Proof of Corollary 5
For ergodic stationary sequences, the following property is key to establishing the convergence rate.
**Lemma 6 (Birkhoff’s Ergodic Theorem, see [36, Th. 7.2.1]).**
For an ergodic sequence , of random variables, it holds that
[TABLE]
For the product given in (9), we say converges to a rank-one matrix a.s. as if as , where is defined in (10). According to Definition 1, if there exist such that
[TABLE]
then the convergence is said to be exponential with a rate no slower than . We are now ready to present the proof of Corollary 5.
Proof of Corollary 5.
Let be the same as in Assumption 2. There is an integer such that is scrambling with positive probability. Let . For a sufficiently large , can be written as
[TABLE]
where is the largest integer such that , , are the matrix products defined by (9), and is the remaining part, which is clearly a stochastic matrix. To study the limiting behavior of , we compute its coefficient of ergodicity
[TABLE]
where property (12) has been used. The last inequality follows from the property of coefficients of ergodicity that for any stochastic matrix .
[TABLE]
Since the sequence is ergodic, it is easy to see that the sequence of products , , over non-overlapping intervals of length , is also ergodic. It follows in turn that $\{\log\tau(W(kT+T,kT))\}$ is ergodic. From Lemma 6, one can further obtain
[TABLE]
The last inequality follows from Jensen’s inequality (see [36, Th. 1.5.1]) since is concave. According to Assumption 1, one knows that is scrambling with positive probability, and thus it follows that $0<\mathbb{E}[\tau(W(T,0))]<1$. Taking a positive number satisfying $\lambda<-\log\mathbb{E}[\tau(W(T,0))]$, one obtains
[TABLE]
Adding to both sides of (35) yields
[TABLE]
It follows straightforwardly that
[TABLE]
Let , which clearly satisfies . From Definition 1, one can conclude that the product almost surely converges to a rank-one stochastic matrix exponentially, at a rate no slower than , which completes the proof. ∎
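The mechanism behind this proof can be checked numerically. The sketch below makes three assumptions:

- the coefficient of ergodicity is τ(A) = ½ max_{i,j} Σ_s |a_is − a_js|;
- the blocks of length T are i.i.d., hence stationary ergodic;
- each matrix in a block is drawn from two hypothetical zero-diagonal matrices.

The sketch compares the time average of log τ over non-overlapping blocks with the log of the empirical mean of τ. By Birkhoff's theorem, the time average tends to E[log τ(W(T,0))]. The sample inequality between the two quantities is the empirical counterpart of the Jensen step. By submultiplicativity of τ, exp(time average / T) is a per-step rate bound.

```python
import math
import random
from itertools import combinations

def tau(A):
    """Coefficient of ergodicity, assumed: 1/2 * max_{i,j} sum_s |a_is - a_js|."""
    n = len(A)
    return max(0.5 * sum(abs(A[i][s] - A[j][s]) for s in range(n))
               for i, j in combinations(range(n), 2))

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A1 = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]  # periodic, tau = 1
A2 = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]  # scrambling, tau = 0.5

rng = random.Random(7)
T, K = 2, 5000  # block length and number of non-overlapping blocks
logs, taus = [], []
for _ in range(K):
    block = [A1 if rng.random() < 0.5 else A2 for _ in range(T)]  # i.i.d., hence ergodic
    W = block[0]
    for M in block[1:]:
        W = matmul(M, W)  # backward product within the block
    taus.append(tau(W))
    logs.append(math.log(taus[-1]))

time_avg_log = sum(logs) / K          # Birkhoff: tends to E[log tau(W(T, 0))]
log_avg = math.log(sum(taus) / K)     # log of the empirical mean of tau(W(T, 0))
per_step_rate = math.exp(time_avg_log / T)
print(time_avg_log, log_avg, per_step_rate)
```

For these two matrices and T = 2, the block values of τ are 1, 0.5, and 0.25, with probabilities 1/4, 1/2, and 1/4. Hence E[log τ] = −ln 2, and the per-step rate estimate is close to 1/√2 ≈ 0.707.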
References
- [1] H. J. Kushner, Stochastic Stability and Control. New York, NY, USA: Academic Press, 1967.
- [2] ——, Introduction to Stochastic Control. New York: Holt, Rinehart and Winston, Inc., 1971.
- [3] ——, “On the stability of stochastic dynamical systems,” Proceedings of the National Academy of Sciences, vol. 53, no. 1, pp. 8–12, 1965.
- [4] F. J. Beutler, “On two discrete-time system stability concepts and supermartingales,” Journal of Mathematical Analysis and Applications, vol. 44, no. 2, pp. 464–471, 1973.
- [5] R. Khasminskii, Stochastic Stability of Differential Equations. Springer Science & Business Media, 2011.
- [6] M. Porfiri and D. J. Stilwell, “Consensus seeking over random weighted directed graphs,” IEEE Trans. Autom. Control, vol. 52, no. 9, pp. 1767–1773, 2007.
- [7] A. Tahbaz-Salehi and A. Jadbabaie, “Consensus over ergodic stationary graph processes,” IEEE Trans. Autom. Control, vol. 55, no. 1, pp. 225–230, 2010.
- [8] S. Lee, A. Nedić, and M. Raginsky, “Stochastic dual averaging for decentralized online optimization on time-varying communication graphs,” IEEE Trans. Autom. Control, vol. 62, no. 12, pp. 6407–6414, 2017.
