Lyapunov Criterion for Stochastic Systems and Its Applications in Distributed Computation

Yuzhen Qin; Ming Cao; Brian D. O. Anderson

arXiv:1902.04332·cs.SY·June 5, 2019


TL;DR

This paper introduces a new Lyapunov criterion for stochastic systems that guarantees convergence and stability, with applications in analyzing random matrix products and distributed algorithms for solving linear equations.

Contribution

It proposes a novel Lyapunov condition allowing finite-step expected decrease without strict per-step decrease, extending classical stochastic stability theory.

Findings

01. Conditions for almost sure convergence of random matrix products.

02. Exponential convergence rate under additional assumptions.

03. Relaxed network structure requirements for distributed linear algebra algorithms.




Lyapunov Criterion for Stochastic Systems and Its Applications in Distributed Computation

Yuzhen Qin, Ming Cao, and Brian D. O. Anderson

Y. Qin and M. Cao are with the Institute of Engineering and Technology, Faculty of Science and Engineering, University of Groningen, Groningen, the Netherlands ({y.z.qin, m.cao}@rug.nl). B. D. O. Anderson is with the School of Automation, Hangzhou Dianzi University, Hangzhou, 310018, China, and Data61-CSIRO and Research School of Engineering, Australian National University, Canberra, ACT 2601, Australia ([email protected]). The work of M. Cao was supported in part by the European Research Council (ERC-CoG-771687) and the Netherlands Organization for Scientific Research (NWO-vidi-14134). The work of B. D. O. Anderson was supported by the Australian Research Council (ARC) under grants DP-130103610 and DP-160104500, and by Data61-CSIRO.

Abstract

This paper presents new sufficient conditions for convergence and asymptotic or exponential stability of a stochastic discrete-time system, under which the constructed Lyapunov function always decreases in expectation along the system’s solutions after a finite number of steps, but without necessarily strict decrease at every step, in contrast to the classical stochastic Lyapunov theory. As the first application of this new Lyapunov criterion, we look at the product of any random sequence of stochastic matrices, including those with zero diagonal entries, and obtain sufficient conditions to ensure the product almost surely converges to a matrix with identical rows; we also show that the rate of convergence can be exponential under additional conditions. As the second application, we study a distributed network algorithm for solving linear algebraic equations. We relax existing conditions on the network structures, while still guaranteeing the equations are solved asymptotically.

I Introduction

Stability analysis for stochastic dynamical systems has always been an active research field. Early works have shown that stochastic Lyapunov functions play an important role, and to use them for discrete-time systems, a standard procedure is to show that they decrease in expectation at every time step [1, 2, 3, 4]. Properties of supermartingales and LaSalle’s arguments are critical to establish the related proofs. However, most of the stochastic stability results are built upon a crucial assumption, which requires that the stochastic dynamical system under study is Markovian (see e.g., [1, 2, 3, 5]), and very few of them have reported bounds for the convergence speed.

More recently, with the fast development of network algorithms, more and more distributed computational processes are carried out in networks of coupled computational units. Such processes are usually modeled as stochastic discrete-time dynamical systems, since they are inevitably influenced by random changes of network structures [6, 7, 8, 9], communication delay and noise [10, 11, 12], and asynchronous updating events [13, 14]. There is therefore a great need to further develop Lyapunov theory for stochastic dynamical systems, in particular in the setting of network algorithms for distributed computation. This is exactly the aim of this paper.

We aim to further develop the Lyapunov criterion for stochastic discrete-time systems. Motivated by the concept of finite-step Lyapunov functions for deterministic systems [15, 16, 17], we propose to define a finite-step stochastic Lyapunov function, which decreases in expectation, not necessarily at every step, but after a finite number of steps. The associated new Lyapunov criterion not only enlarges the range of choices of candidate Lyapunov functions but also implies that the systems that it can be used to analyze do not need to be Markovian. An additional advantage of this new criterion is that it enables us to construct conditions that guarantee exponential convergence and to estimate convergence rates.

We then apply the finite-step stochastic Lyapunov function to study two distributed computation problems arising in some popular network algorithmic settings. In distributed optimization [18, 19] and other distributed coordination algorithms [20, 21, 22, 7], one frequently encounters the need to prove convergence of inhomogeneous Markov chains, or equivalently the convergence of backward products of random sequences of stochastic matrices $\{W(k)\}$. Most of the existing results assume that every $W(k)$ in the sequence has all positive diagonal entries; see, e.g., [23, 24, 25]. This assumption simplifies the analysis of convergence significantly; moreover, without this assumption, the existing results do not always hold. For example, from [22, 7] one knows that the product of $W(k)$ converges to a rank-one matrix almost surely if exactly one of the eigenvalues of the expectation of $W(k)$ has modulus one, which can be violated if $W(k)$ has zero diagonal elements. Note also that most of the existing results are confined to special random sequences, e.g., independent and identically distributed sequences [22], stationary ergodic sequences [7], or independent sequences [26, 27]. Using the new Lyapunov criterion in this paper, we work on more general classes of random sequences of stochastic matrices without the assumption of non-zero diagonal entries. We show that if there exists a fixed length such that the product of any successive subsequence of matrices of this length has the scrambling property (a standard concept, defined subsequently) with positive probability, then convergence of the infinite product to a rank-one matrix is guaranteed almost surely. We also prove that the convergence is exponentially fast if this probability is lower bounded by some positive number, and the greater the lower bound is, the faster the convergence becomes. For some particular random sequences, we further relax this “scrambling” condition. If the random sequence is driven by a stationary process, almost sure convergence is ensured as long as the product of any successive subsequence of finite length is, with positive probability, stochastic, indecomposable and aperiodic (SIA). The exponential convergence rate follows without other assumptions if the random process that governs the evolution of the sequence is a stationary ergodic process.

As the second application of the finite-step stochastic Lyapunov functions, we investigate a distributed algorithm for solving linear algebraic equations of the form $Ax=b$. The equations are solved in parallel by $n$ agents, each of whom knows only a subset of the rows of the matrix $[A,b]$. Each agent recursively updates its estimate of the solution using the current estimates from its neighbors. Recently several solutions under different sufficient conditions have been proposed [28, 29, 30], and in particular in [30], the sequence of the neighbor relationship graphs $\mathcal{G}(k)$ is required to be repeatedly jointly strongly connected. We show that a much weaker condition is sufficient to solve the problem almost surely: the algorithm in [30] works if there exists a fixed length such that any subsequence of $\{\mathcal{G}(k)\}$ of this length is jointly strongly connected with positive probability.

The remainder of this paper is organized as follows. In Section II, we define the finite-step stochastic Lyapunov functions. Products of random sequences of stochastic matrices are studied in Section III; in Section IV we look in particular into asynchronous implementation issues as an application of Section III. Finally, we study in Section V a distributed approach for solving linear equations. Brief concluding remarks appear in Section VI.

Notation: Throughout this paper, $\mathbb{N}_{0}$ denotes the set of non-negative integers, $\mathbb{N}$ the set of positive integers, and $\mathbb{R}^{q}$ the real $q$-dimensional vector space. Moreover, we let $\mathbf{1}$ be the vector consisting of all ones, and let $\mathbf{N}=\{1,2,\dots,n\}$. Given a vector $x\in\mathbb{R}^{n}$, $x^{i}$ denotes the $i$th element of $x$. Let $\|\cdot\|$ denote any $p$-norm with $p\geq 1$. A continuous function $h(x):[0,a)\to[0,\infty)$ is said to belong to class $\mathcal{K}$ if it is strictly increasing and $h(0)=0$. For any two events $A,B$, the conditional probability $\Pr[A|B]$ denotes the probability of $A$ given $B$.

II Finite-Step Stochastic Lyapunov Functions

Consider a stochastic discrete-time system described by

$$x_{k+1}=f(x_{k},y_{k+1}),\quad k\in\mathbb{N}_{0},\tag{1}$$

where $x_{k}\in\mathbb{R}^{n}$ , and $\{y_{k}:k\in\mathbb{N}\}$ is a $\mathbb{R}^{d}$ -valued stochastic process on a probability space $(\Omega,\mathcal{F},\Pr)$ . Here $\Omega=\{\omega\}$ is the sample space; $\mathcal{F}$ is a set of events which is a $\sigma$ -field; $\Pr:\mathcal{F}\to[0,1]$ is a function that assigns probabilities to events; $y_{k}$ is a measurable function mapping $\Omega$ into the state space $\Omega_{0}\subseteq\mathbb{R}^{d}$ , and for any $\omega\in\Omega$ , $\{y_{k}(\omega):k\in\mathbb{N}\}$ is a realization of the stochastic process $\{y_{k}\}$ at $\omega$ . Let $\mathcal{F}_{k}=\sigma(y_{1},\dots,y_{k})$ for $k\geq 1$ , $\mathcal{F}_{0}=\{\emptyset,\Omega\}$ , so that evidently $\{\mathcal{F}_{k}\},k=1,2,\dots,$ is an increasing sequence of $\sigma$ -fields. Following [31], we consider a constant initial condition $x_{0}\in\mathbb{R}^{n}$ with probability one. It then can be observed that the solution to (1), $\{x_{k}\}$ , is a $\mathbb{R}^{n}$ -valued stochastic process adapted to $\mathcal{F}_{k}$ . The randomness of $y_{k}$ can be due to various reasons, e.g., stochastic disturbances or noise. Note that (1) becomes a stochastic switching system if $f(x,y)=g_{y}(x)$ , where $y$ maps $\Omega$ into the set $\Omega_{0}:=\{1,\dots,p\}$ , and $\{g_{p}(x):\mathbb{R}^{n}\to\mathbb{R}^{n},p\in\Omega_{0}\}$ is a given family of functions.
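To make the recursion $x_{k+1}=f(x_{k},y_{k+1})$ and its switching special case $f(x,y)=g_{y}(x)$ concrete, here is a minimal Python sketch. The helper `simulate`, the two maps in `G`, and the mode sequence are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from typing import Callable, Iterable, List, Sequence


def simulate(
    f: Callable[[Sequence[float], int], Sequence[float]],
    x0: Sequence[float],
    ys: Iterable[int],
) -> List[List[float]]:
    """Return [x_0, x_1, ..., x_N] for x_{k+1} = f(x_k, y_{k+1}).

    `ys` is one realization (y_1, ..., y_N) of the driving process.
    """
    trajectory = [list(x0)]
    for y in ys:
        trajectory.append(list(f(trajectory[-1], y)))
    return trajectory


# Hypothetical family {g_1, g_2}; the switching system is f(x, y) = g_y(x).
G = {
    1: lambda x: [0.5 * x[0], x[1]],
    2: lambda x: [x[0], 0.5 * x[1]],
}


def f_switch(x: Sequence[float], y: int) -> Sequence[float]:
    return G[y](x)


# One sample path driven by the realization y_1 = 1, y_2 = 2, y_3 = 1.
trajectory = simulate(f_switch, [1.0, 1.0], [1, 2, 1])
```

Because the initial condition is a constant and each $x_{k}$ is computed from $y_{1},\dots,y_{k}$ only, the sketch also shows why $\{x_{k}\}$ is adapted to $\mathcal{F}_{k}$.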

A point $x^{*}$ is said to be an equilibrium of system (1) if $f(x^{*},y)=x^{*}$ for any $y\in\Omega_{0}$. Without loss of generality, we assume that the origin $x=0$ is an equilibrium. Researchers have been interested in studying the limiting behavior of the solution $\{x_{k}\}$, i.e., when and to where $x_{k}$ converges as $k\to\infty$. Most notably, Kushner developed classic results on stochastic stability by employing stochastic Lyapunov functions [1, 2, 3]. We introduce some related definitions before recalling some of Kushner’s results. Following [32, Sec. 1.5.6] and [33], we first define convergence and exponential convergence of a sequence of random variables.

Definition 1 (Convergence).

A random sequence $\{x_{k}\in\mathbb{R}^{n}\}$ in a sample space $\Omega$ converges to a random variable $x$ almost surely if $\Pr\left[\omega\in\Omega:\lim_{k\to\infty}\|x_{k}(\omega)-x\|=0\right]=1.$ The convergence is said to be exponentially fast with a rate no slower than $\gamma^{-1}$ for some $\gamma>1$ independent of $\omega$ if $\gamma^{k}\|x_{k}-x\|$ almost surely converges to $y$ for some finite $y\geq 0$ . Furthermore, let $\mathcal{D}\subset\mathbb{R}^{n}$ be a set; a random sequence $\{x_{k}\}$ is said to converge to $\mathcal{D}$ almost surely if $\Pr\left[\omega\in\Omega:\lim_{k\to\infty}{\rm dist}(x_{k}(\omega),\mathcal{D})=0\right]=1,$ where ${\rm dist\;}(x,\mathcal{D}):=\inf_{y\in\mathcal{D}}\|x-y\|$ .

Here “almost surely” is interchangeable with “with probability one”, and we sometimes use the shorthand notation “a.s.”. We now introduce some stability concepts for stochastic discrete-time systems analogous to those in [5] and [34] for continuous-time systems (see Footnote 1).

Footnote 1: Items 1) and 2) of Definition 2 follow from the definitions in [5, Chap. 5], in which an arbitrary initial time $s$ rather than just $0$ is actually considered. We define 3) along the same lines as 1) and 2). In Definition 3, 1) follows from the definitions in [34], and we define 2) along the same lines as 1).

Definition 2.

The origin of (1) is said to be:

1) stable in probability if $\lim_{x_{0}\to 0}\Pr\left[\sup_{k\in\mathbb{N}}\|x_{k}\|>\varepsilon\right]=0$ for any $\varepsilon>0$;

2) asymptotically stable in probability if it is stable in probability and moreover $\lim_{x_{0}\to 0}\Pr\left[\lim_{k\to\infty}\|x_{k}\|=0\right]=1$;

3) exponentially stable in probability if for some $\gamma>1$ independent of $\omega$, $\lim_{x_{0}\to 0}\Pr\left[\lim_{k\to\infty}\|\gamma^{k}x_{k}\|=0\right]=1$.

Definition 3.

For a set $\mathcal{Q}\subseteq\mathbb{R}^{n}$ containing the origin, the origin of (1) is said to be:

1) locally a.s. asymptotically stable in $\mathcal{Q}$ (globally a.s. asymptotically stable, respectively) if starting from $x_{0}\in\mathcal{Q}$ ($x_{0}\in\mathbb{R}^{n}$, respectively) all the sample paths $x_{k}$ stay in $\mathcal{Q}$ ($\mathbb{R}^{n}$, respectively) for all $k\geq 0$ and converge to the origin almost surely;

2) locally a.s. exponentially stable in $\mathcal{Q}$ (globally a.s. exponentially stable, respectively) if it is locally (globally, respectively) a.s. asymptotically stable and the convergence is exponentially fast.

Now let us recall some of Kushner’s results on convergence and stability, in which stochastic Lyapunov functions are used.

Lemma 1 (Asymptotic Convergence and Stability).

For the stochastic discrete-time system (1), let $\{x_{k}\}$ be a Markov process. Let $V:\mathbb{R}^{n}\to\mathbb{R}$ be a continuous positive definite and radially unbounded function. Define the set $\mathcal{Q}_{\lambda}:=\{x:0\leq V(x)<\lambda\}$ for some $\lambda>0$ , and assume that

$$\mathbb{E}\left[V(x_{k+1})\mid x_{k}\right]-V(x_{k})\leq-\varphi(x_{k}),\quad\forall k,\tag{2}$$

where $\varphi:\mathbb{R}^{n}\to\mathbb{R}$ is continuous and satisfies $\varphi(x)\geq 0$ for any $x\in\mathcal{Q}_{\lambda}$ . Then the following statements apply:

i) for any initial condition $x_{0}\in\mathcal{Q}_{\lambda}$, $x_{k}$ converges to $\mathcal{D}_{1}:=\{x\in\mathcal{Q}_{\lambda}:\varphi(x)=0\}$ with probability at least $1-V(x_{0})/\lambda$ [3];

ii) if moreover $\varphi(x)$ is positive definite on $\mathcal{Q}_{\lambda}$, and $h_{1}\left(\|s\|\right)\leq V(s)\leq h_{2}\left(\|s\|\right)$ for two class $\mathcal{K}$ functions $h_{1}$ and $h_{2}$, then $x=0$ is asymptotically stable in probability [3], [35, Theorem 7.3].

Lemma 2 (Exponential Convergence and Stability).

For the stochastic discrete-time system (1), let $\{x_{k}\}$ be a Markov process. Let $V:\mathbb{R}^{n}\to\mathbb{R}$ be a continuous nonnegative function. Assume that

$$\mathbb{E}\left[V(x_{k+1})\mid x_{k}\right]-V(x_{k})\leq-\alpha V(x_{k}),\quad 0<\alpha<1.\tag{3}$$

Then the following statements apply:

i) for any given $x_{0}$, $V(x_{k})$ almost surely converges to $0$ exponentially fast with a rate no slower than $1-\alpha$ [2, Th. 2, Chap. 8], [35];

ii) if moreover $V$ satisfies $c_{1}\|x\|^{a}\leq V(x)\leq c_{2}\|x\|^{a}$ for some $c_{1},c_{2},a>0$, then $x=0$ is globally a.s. exponentially stable [35, Theorem 7.4].
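As an illustration of the one-step condition in Lemma 2, consider a hypothetical scalar system $x_{k+1}=y_{k+1}x_{k}$ in which the $y_{k}$ are independent and identically distributed, so that $\{x_{k}\}$ is Markovian. The gain values and probabilities below are assumptions made for this sketch, not taken from the paper. With $V(x)=|x|$, the conditional expectation is $\mathbb{E}[V(x_{k+1})\mid x_{k}]=\mathbb{E}|y|\,V(x_{k})$, so the condition holds with $\alpha=1-\mathbb{E}|y|$ whenever $\mathbb{E}|y|<1$, even though individual steps may increase $V$.

```python
from typing import Sequence


def one_step_alpha(gains: Sequence[float], probs: Sequence[float]) -> float:
    """Return alpha = 1 - E|y| for x_{k+1} = y_{k+1} x_k with V(x) = |x|.

    `gains` are the possible values of y and `probs` their probabilities
    (hypothetical numbers, i.i.d. across time steps).
    """
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to one")
    mean_abs_gain = sum(p * abs(g) for g, p in zip(gains, probs))
    return 1.0 - mean_abs_gain


def expected_v(v0: float, alpha: float, k: int) -> float:
    """E[V(x_k)] = (1 - alpha)^k V(x_0) for this i.i.d. scalar example."""
    return (1.0 - alpha) ** k * v0


# The gain 1.2 expands the state, yet V decreases in expectation at every step.
alpha = one_step_alpha(gains=[0.5, 1.2], probs=[0.5, 0.5])
v10 = expected_v(v0=1.0, alpha=alpha, k=10)
```

The sketch only evaluates the expectation; the almost sure statements in the lemma rest on the supermartingale arguments cited there.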

To use these two lemmas to prove asymptotic (or exponential) stability for a stochastic system, the critical step is to find a stochastic Lyapunov function such that (2) (respectively, (3)) holds. However, it is not always obvious how to construct such a stochastic Lyapunov function. We use the following toy example to illustrate this point.

Example 1. Consider a randomly switching system described by $x_{k}=A_{y_{k}}x_{k-1}$, where $y_{k}$ is the switching signal taking values in a finite set $\mathcal{P}:=\{1,2,3\}$, and

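$$A_{1}=\begin{bmatrix}0.2&0\\0&1\end{bmatrix},\quad A_{2}=\begin{bmatrix}1&0\\0&0.8\end{bmatrix},\quad A_{3}=\begin{bmatrix}1&0\\0&0.6\end{bmatrix}.$$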

The stochastic process $\{y_{k}\}$ is described by a Markov chain with initial distribution $v=\{v_{1},v_{2},v_{3}\}$ . The transition probabilities are described by a transition matrix

$$\pi=\begin{bmatrix}0&0.4&0.6\\1&0&0\\1&0&0\end{bmatrix},$$

whose $ij$th element is defined by $\pi_{ij}=\Pr[y_{k+1}=j|y_{k}=i]$. Since $\{y_{k}\}$ is not independent and identically distributed, the process $\{x_{k}\}$ is not Markovian. Nevertheless, we might conjecture that the origin is globally a.s. exponentially stable. In order to try to prove this, we might choose a stochastic Lyapunov function candidate $V(x)=\|x\|_{\infty}$, but the existing results introduced in Lemma 2 cannot be used since $\{x_{k}\}$ is not Markovian. Moreover, by calculation we observe that $\mathbb{E}\left[V(x_{k+1})\mid x_{k},y_{k}\right]\leq V(x_{k})$ for any $y_{k}$, with equality possible, which implies that (3) is not necessarily satisfied. Thus $V(x)$ is not an appropriate stochastic Lyapunov function to which Lemma 2 can be applied. As it turns out, however, the same $V(x)$ can be used as a Lyapunov function to establish exponential stability via the alternative criterion set out subsequently. $\Box$
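The claim in Example 1 can be checked numerically. The following Python sketch hard-codes the diagonal matrices $A_{1},A_{2},A_{3}$ and the transition matrix $\pi$ of the example, and evaluates $\mathbb{E}[V(x_{k+s})\mid x_{k},y_{k}]/V(x_{k})$ exactly by enumerating mode paths. The helper names, the state grid, and the choice to condition on the current mode are assumptions of this sketch; a grid check is an illustration, not a proof.

```python
import itertools

# Diagonal entries of A_1, A_2, A_3 from Example 1.
A = {1: (0.2, 1.0), 2: (1.0, 0.8), 3: (1.0, 0.6)}
# Nonzero transition probabilities pi_ij = Pr[y_{k+1} = j | y_k = i].
PI = {1: {2: 0.4, 3: 0.6}, 2: {1: 1.0}, 3: {1: 1.0}}


def V(x):
    """Candidate Lyapunov function V(x) = ||x||_inf."""
    return max(abs(x[0]), abs(x[1]))


def apply_mode(mode, x):
    """Return A_mode x for the diagonal matrices above."""
    d = A[mode]
    return (d[0] * x[0], d[1] * x[1])


def expected_V(x, mode, steps):
    """E[V(x_{k+steps}) | x_k = x, y_k = mode], by enumerating mode paths."""
    if steps == 0:
        return V(x)
    return sum(
        p * expected_V(apply_mode(nxt, x), nxt, steps - 1)
        for nxt, p in PI[mode].items()
    )


def worst_ratio(steps, grid=21):
    """Largest E[V(x_{k+steps})] / V(x_k) over a state grid and all modes."""
    pts = [i / (grid - 1) for i in range(grid)]
    worst = 0.0
    for x in itertools.product(pts, pts):
        if V(x) == 0.0:
            continue
        for mode in PI:
            worst = max(worst, expected_V(x, mode, steps) / V(x))
    return worst


one_step = worst_ratio(1)  # no strict decrease is guaranteed in one step
two_step = worst_ratio(2)  # strict decrease in expectation after two steps
```

On this grid the one-step ratio reaches one (for instance at $x=(1,0)$ with $y_{k}=1$), whereas the two-step ratio stays strictly below one, which is the behavior a finite-step criterion is designed to exploit.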

It can be difficult, if not impossible, to construct a stochastic Lyapunov function, especially when the state of the system is not Markovian. So it is of great interest to generalize the results in Lemmas 1 and 2 such that the range of choices of candidate Lyapunov functions can be enlarged. For deterministic systems, Aeyels et al. have introduced a new Lyapunov criterion to study asymptotic stability of continuous-time systems [15]; a similar criterion has also been obtained for discrete-time systems, and the Lyapunov functions satisfying this criterion are called finite-step Lyapunov functions [16, 17]. A common feature of these works is that the Lyapunov function is required to decrease along the system’s solutions after a finite number of steps, but not necessarily at every step. We now use this idea to construct stochastic finite-step Lyapunov functions, a task which is much more challenging than in the deterministic case due to the uncertainty present in stochastic systems. The tools for analysis are totally different from those used for deterministic systems. We will exploit supermartingales and their convergence property, as well as the Borel-Cantelli Lemma; these concepts are introduced in the following two lemmas.

Lemma 3 ([36, Sec. 5.2.9]).

Let the sequence $\{X_{k}\}$ be a nonnegative supermartingale with respect to $\mathcal{F}_{k}=\sigma(X_{1},\dots,X_{k})$, i.e., suppose: (i) $\mathbb{E}X_{k}<\infty$; (ii) $X_{k}\in\mathcal{F}_{k}$ for all $k$; (iii) $\mathbb{E}\left(X_{k+1}\mid\mathcal{F}_{k}\right)\leq X_{k}$. Then there exists some random variable $X$ such that $X_{k}\stackrel{\text{a.s.}}{\longrightarrow}X$ as $k\to\infty$, and $\mathbb{E}X\leq\mathbb{E}X_{0}$.

Lemma 4 (Borel-Cantelli Lemma, [2, P.192]).

Let $\{X_{k}\}$ be a nonnegative random sequence. If $\sum_{k=0}^{\infty}\mathbb{E}X_{k}<\infty$, then $X_{k}\stackrel{\text{a.s.}}{\longrightarrow}0$.

We are now ready to present our first main result on stochastic convergence and stability.

Theorem 1.

For the stochastic discrete-time system (1), let $V:\mathbb{R}^{n}\to\mathbb{R}$ be a continuous nonnegative and radially unbounded function. Define the set $\mathcal{Q}_{\lambda}:=\{x:V(x)<\lambda\}$ for some $\lambda>0$ , and assume that

a) $\mathbb{E}\left[V\left(x_{k+1}\right)|\mathcal{F}_{k}\right]-V\left(x_{k}\right)\leq 0$ for any $k$ such that $x_{k}\in\mathcal{Q}_{\lambda}$;

b) there is an integer $T\geq 1$, independent of $\omega$, such that for any $k$, $\mathbb{E}\left[V\left(x_{k+T}\right)|\mathcal{F}_{k}\right]-V\left(x_{k}\right)\leq-\varphi(x_{k})$, where $\varphi:\mathbb{R}^{n}\to\mathbb{R}$ is continuous and satisfies $\varphi(x)\geq 0$ for any $x\in\mathcal{Q}_{\lambda}$.

Then the following statements apply:

i) for any initial condition $x_{0}\in\mathcal{Q}_{\lambda}$, $x_{k}$ converges to $\mathcal{D}_{1}:=\{x\in\mathcal{Q}_{\lambda}:\varphi(x)=0\}$ with probability at least $1-V(x_{0})/\lambda$;

ii) if moreover $\varphi(x)$ is positive definite on $\mathcal{Q}_{\lambda}$, and $h_{1}\left(\|s\|\right)\leq V(s)\leq h_{2}\left(\|s\|\right)$ for two class $\mathcal{K}$ functions $h_{1}$ and $h_{2}$, then $x=0$ is asymptotically stable in probability.

Proof.

Before proving i) and ii), we first show that starting from $x_{0}\in\mathcal{Q}_{\lambda}$ the sample paths $x_{k}(\omega)$ stay in $\mathcal{Q}_{\lambda}$ with probability at least $1-V(x_{0})/\lambda$ if Assumption a) is satisfied. This has been proven in [2, p. 196] by showing that

$$\Pr\left[\sup_{k\in\mathbb{N}}V(x_{k})\geq\lambda\right]\leq\frac{V(x_{0})}{\lambda}.\tag{4}$$
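The maximal inequality $\Pr[\sup_{k}V(x_{k})\geq\lambda]\leq V(x_{0})/\lambda$ can be illustrated on a small hypothetical example (not from the paper): the scalar system $x_{k+1}=y_{k+1}x_{k}$ with independent gains $y\in\{0.5,1.5\}$ of equal probability and $V(x)=|x|$, for which Assumption a) holds with equality. The sketch enumerates all gain sequences over a finite horizon, so the computed probability is exact for that horizon and can only underestimate the probability over all $k$.

```python
from itertools import product


def exceed_probability(x0, lam, horizon, gains=(0.5, 1.5)):
    """Exact Pr[max_{1<=k<=horizon} |x_k| >= lam] for x_{k+1} = y_{k+1} x_k.

    The gains are equally likely and independent across steps
    (a hypothetical choice with E[y] = 1, so |x_k| is a martingale).
    """
    hits = 0
    for path in product(gains, repeat=horizon):
        x, peak = x0, 0.0
        for y in path:
            x *= y
            peak = max(peak, abs(x))
        if peak >= lam:
            hits += 1
    return hits / len(gains) ** horizon


x0, lam = 1.0, 4.0
prob = exceed_probability(x0, lam, horizon=12)
bound = abs(x0) / lam  # V(x_0) / lambda
```

Here `prob` stays below `bound`, in line with the inequality, although some sample paths do leave the set $\{x:V(x)<\lambda\}$.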

Let $\bar{\Omega}$ be a subset of the sample space $\Omega$ such that for any $\omega\in\bar{\Omega}$, $x_{k}(\omega)\in\mathcal{Q}_{\lambda}$ for all $k$. Let $J$ be the smallest $k\in\mathbb{N}$ (if it exists) such that $V(x_{k})\geq\lambda$. Note that this integer $J$ does not exist when $x_{k}(\omega)$ stays in $\mathcal{Q}_{\lambda}$ for all $k$, i.e., when $\omega\in\bar{\Omega}$.

We first prove i) by showing that the sample paths staying in $\mathcal{Q}_{\lambda}$ converge to $\mathcal{D}_{1}$ with probability one, i.e., $\Pr[x_{k}\to\mathcal{D}_{1}|\bar{\Omega}]=1$. Towards this end, define a new function $\tilde{\varphi}(x)$ such that $\tilde{\varphi}(x)=\varphi(x)$ for $x\in\mathcal{Q}_{\lambda}$, and $\tilde{\varphi}(x)=0$ for $x\notin\mathcal{Q}_{\lambda}$. Define another random process $\{\tilde{z}_{k}\}$ as follows. If $J$ exists, when $J>T$ let

$$\tilde{z}_{k}=\begin{cases}x_{k},&k<J-T,\\ \epsilon,&k\geq J-T,\end{cases}$$

where $\epsilon$ satisfies $V(\epsilon)=\tilde{\lambda}>\lambda$ ; when $J\leq T$ , let $\tilde{z}_{k}=\epsilon$ for any $k\in\mathbb{N}_{0}$ . If $J$ does not exist, we let $\tilde{z}_{k}=x_{k}$ for all $k\in\mathbb{N}_{0}$ . Then it is immediately clear that $\mathbb{E}\left[V\left(\tilde{z}_{k+T}\right)|\mathcal{F}_{k}\right]-V\left(\tilde{z}_{k}\right)\leq-\tilde{\varphi}(\tilde{z}_{k})\leq 0$ . By taking the expectation on both sides of this inequality, we obtain

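$$\mathbb{E}\left[V(\tilde{z}_{k+T})\right]-\mathbb{E}V(\tilde{z}_{k})\leq-\mathbb{E}\tilde{\varphi}(\tilde{z}_{k}),\quad k\in\mathbb{N}_{0}.\tag{5}$$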

For any $k\in\mathbb{N}$ , there is a pair $p,q\in\mathbb{N}_{0}$ such that $k=pT+q$ . It follows from (5) that

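$$\mathbb{E}\left[V(\tilde{z}_{pT+j})\right]-\mathbb{E}V(\tilde{z}_{(p-1)T+j})\leq-\mathbb{E}\tilde{\varphi}(\tilde{z}_{(p-1)T+j}),\quad j=1,\dots,q;$$

$$\mathbb{E}\left[V(\tilde{z}_{iT+m})\right]-\mathbb{E}V(\tilde{z}_{(i-1)T+m})\leq-\mathbb{E}\tilde{\varphi}(\tilde{z}_{(i-1)T+m}),\quad i=1,\dots,p-1,\; m=0,\dots,T-1.$$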

By summing up all the left and right sides of these inequalities respectively for all the $i,j$ and $m$ , we have

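$$\sum_{m=0}^{T-1}\Big(\mathbb{E}V(\tilde{z}_{(p-1)T+m})-\mathbb{E}V(\tilde{z}_{m})\Big)+\sum_{j=1}^{q}\Big(\mathbb{E}V(\tilde{z}_{pT+j})-\mathbb{E}V(\tilde{z}_{(p-1)T+j})\Big)\leq-\sum_{i=1}^{k-T}\mathbb{E}\tilde{\varphi}(\tilde{z}_{i}).\tag{6}$$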

As $V(x)$ is nonnegative for all $x$, from (5) it is easy to observe that the left side of (6) is greater than $-\infty$ even when $k\to\infty$ since $T$ and $q$ are finite numbers, which implies that $\sum_{i=0}^{\infty}\mathbb{E}\tilde{\varphi}(\tilde{z}_{i})<\infty$. By Lemma 4, one knows that $\tilde{\varphi}(\tilde{z}_{k})\stackrel{\text{a.s.}}{\longrightarrow}0$ as $k\to\infty$. For $\omega\in\bar{\Omega}$, one can observe that $\tilde{\varphi}(x_{k}(\omega))=\varphi(x_{k}(\omega))$ and $\tilde{z}_{k}(\omega)=x_{k}(\omega)$ according to the definitions of $\tilde{\varphi}$ and $\{\tilde{z}_{k}\}$, respectively. Therefore, $\tilde{\varphi}(\tilde{z}_{k}(\omega))=\varphi(x_{k}(\omega))$ for all $\omega\in\bar{\Omega}$, and subsequently

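$$\Pr\left[\varphi(x_{k})\to 0\mid\bar{\Omega}\right]=\Pr\left[\tilde{\varphi}(\tilde{z}_{k})\to 0\mid\bar{\Omega}\right]=1.$$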

From the continuity of $\varphi(x)$ it can be seen that $\Pr[x_{k}\to\mathcal{D}_{1}|\bar{\Omega}]=1$ . The proof of i) is complete since (4) means that the sample paths stay in $\mathcal{Q}_{\lambda}$ with probability at least $1-V(x_{0})/\lambda$ .

Next, we prove ii) in two steps. We first prove that the origin $x=0$ is stable in probability. The inequalities $h_{1}\left(\|s\|\right)\leq V(s)\leq h_{2}\left(\|s\|\right)$ imply that $V(x)=0$ if and only if $x=0$ . Moreover, it follows from $h_{1}\left(\|s\|\right)\leq V(s)$ and the inequality (4) that for any initial condition $x_{0}\in\mathcal{Q}_{\lambda}$ ,

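$$\Pr\left[\sup_{k\in\mathbb{N}}h_{1}(\|x_{k}\|)\geq\lambda_{1}\right]\leq\Pr\left[\sup_{k\in\mathbb{N}}V(x_{k})\geq\lambda_{1}\right]\leq\frac{V(x_{0})}{\lambda_{1}}$$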

for any $\lambda_{1}>0$. Since $h_{1}$ is a class $\mathcal{K}$ function and thus invertible, it can be observed that $\Pr\left[\sup_{k\in\mathbb{N}}\|x_{k}\|\geq h_{1}^{-1}(\lambda_{1})\right]\leq V(x_{0})/\lambda_{1}\leq h_{2}(\|x_{0}\|)/\lambda_{1}$. Choosing $\lambda_{1}=h_{1}(\varepsilon)$ for a given $\varepsilon>0$ and noting that $h_{2}(\|x_{0}\|)\to 0$ as $x_{0}\to 0$, there holds that $\lim_{x_{0}\to 0}\Pr\left[\sup_{k\in\mathbb{N}}\|x_{k}\|>\varepsilon\right]\leq\lim_{x_{0}\to 0}\Pr\left[\sup_{k\in\mathbb{N}}\|x_{k}\|\geq\varepsilon\right]=0$, which means that the origin is stable in probability.

Second, we show that the probability that $x_{k}\to 0$ tends to $1$ as $x_{0}\to 0$. One knows that $\mathcal{D}_{1}=\{0\}$ since $\varphi$ is positive definite in $\mathcal{Q}_{\lambda}$. From i) one knows that $x_{k}$ converges to $x=0$ with probability at least $1-V(x_{0})/\lambda$. Since $V(x_{0})\to 0$ as $x_{0}\to 0$, there holds that $\lim_{x_{0}\to 0}\Pr\left[\lim_{k\to\infty}\|x_{k}\|=0\right]=1$. The proof is complete. ∎

In particular, if $\mathcal{Q}_{\lambda}$ is positively invariant, i.e., starting from $x_{0}\in\mathcal{Q}_{\lambda}$ all sample paths $x_{k}$ stay in $\mathcal{Q}_{\lambda}$ for all $k\geq 0$, the following corollary follows straightforwardly from Theorem 1.

Corollary 1.

If $\mathcal{Q}_{\lambda}$ is positively invariant with respect to system (1) and the assumptions a) and b) in Theorem 1 are satisfied, then the following statements apply:

i) for any initial condition $x_{0}\in\mathcal{Q}_{\lambda}$, $x_{k}$ converges to $\mathcal{D}_{1}$ with probability one;

ii) if moreover $\varphi(x)$ is positive definite on $\mathcal{Q}_{\lambda}$, and $h_{1}\left(\|s\|\right)\leq V(s)\leq h_{2}\left(\|s\|\right)$ for two class $\mathcal{K}$ functions $h_{1}$ and $h_{2}$, then $x=0$ is locally a.s. asymptotically stable in $\mathcal{Q}_{\lambda}$. Furthermore, if $\mathcal{Q}_{\lambda}=\mathbb{R}^{n}$, then $x=0$ is globally a.s. asymptotically stable.

The next theorem provides a new criterion for exponential convergence and stability of stochastic systems, relaxing the conditions required by Lemma 2.

Theorem 2.

Suppose the assumptions a) and b) of Theorem 1 are satisfied with the inequality of b) strengthened to

[TABLE]

Then the following statements apply:

i) for any given $x_{0}\in\mathcal{Q}_{\lambda}$ , $V(x_{k})$ converges to $0$ exponentially at a rate no slower than $(1-\alpha)^{{1}/{T}}$ , and $x_{k}$ converges to $\mathcal{D}_{2}:=\{x\in\mathcal{Q}_{\lambda}:V(x)=0\}$ , with probability at least $1-V(x_{0})/\lambda$ ;

ii) if moreover $V$ satisfies $c_{1}\|x\|^{a}\leq V(x)\leq c_{2}\|x\|^{a}$ for some $c_{1},c_{2},a>0$ , then $x=0$ is exponentially stable in probability.

Proof.

We first prove i). From the proof of Theorem 1, we know that the sample paths $x_{k}$ stay in $\mathcal{Q}_{\lambda}$ with probability at least $1-V(x_{0})/\lambda$ for any initial condition $x_{0}\in\mathcal{Q}_{\lambda}$ if the assumption a) is satisfied. We next show that for any sample path that always stays in $\mathcal{Q}_{\lambda}$ , $V(x_{k})$ converges to $0$ exponentially fast. Towards this end, we define a random process $\{\hat{z}_{k}\}$ . Let $J$ be as defined in the proof of Theorem 1. If $J$ exists, when $J>T$ , let

[TABLE]

where $\varepsilon$ satisfies $V(\varepsilon)=0$ ; when $J\leq T$ , let $\hat{z}_{k}=\varepsilon$ for any $k\in\mathbb{N}_{0}$ . If $J$ does not exist, we let $\hat{z}_{k}=x_{k}$ for all $k\in\mathbb{N}_{0}$ .

If the inequality (7) is satisfied, one has $\mathbb{E}\left[V\left(\hat{z}_{k+T}\right)|\mathcal{F}_{k}\right]-V\left(\hat{z}_{k}\right)\leq-\alpha V(\hat{z}_{k})$ . Using this inequality, we next show that $V\left(\hat{z}_{k+T}\right)$ converges to $0$ exponentially. To this end, define a subsequence $Y^{(r)}_{m}:=V(\hat{z}_{mT+r}),m\in\mathbb{N}_{0}$ , for each $0\leq r\leq T-1$ . Let $\mathcal{G}_{m}^{(r)}:=\sigma(Y^{(r)}_{0},Y^{(r)}_{1},\dots,Y^{(r)}_{m})$ , and one knows that $\mathcal{G}_{m}^{(r)}$ is determined if we know $\mathcal{F}_{mT+r}$ . It then follows from the inequality (7) that for any $r$ , $\mathbb{E}[Y_{m+1}^{(r)}|\mathcal{G}_{m}^{(r)}]-Y_{m}^{(r)}\leq-\alpha Y_{m}^{(r)}$ . We observe from this inequality that

[TABLE]

This means that $(1-\alpha)^{-m}Y_{m}^{(r)}$ is a supermartingale, and thus there is a finite random number $\bar{Y}^{(r)}$ such that $(1-\alpha)^{-m}Y_{m}^{(r)}\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}\bar{Y}^{(r)}$ for any $r$ . Let $\gamma=\sqrt[T]{{1/(1-\alpha)}}$ , and then by definition of $Y^{(r)}_{m}$ we have $\gamma^{mT}V(\hat{z}_{mT+r})\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}\bar{Y}^{(r)}$ . Straightforwardly, $\gamma^{mT+r}V(\hat{z}_{mT+r})\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}\gamma^{r}\bar{Y}^{(r)}$ . Let $k=mT+r,\bar{Y}=\max_{r}\{\gamma^{r}\bar{Y}^{(r)}\}$ , then it almost surely holds that $\lim_{k\to\infty}\gamma^{k}V(\hat{z}_{k})\leq\bar{Y}$ . From Definition 1, one concludes that $V(\hat{z}_{k})$ almost surely converges to $0$ exponentially at a rate no slower than $\gamma^{-1}=(1-\alpha)^{1/T}$ . From the definition of $\hat{z}_{k}$ , we know that $V(\hat{z}_{k}(\omega))=V(x_{k}(\omega))$ for all $\omega\in\bar{\Omega}$ , with $\bar{\Omega}$ defined in the proof of Theorem 1. Consequently, it holds that

[TABLE]

The proof of i) is complete since the sample paths stay in $\mathcal{Q}_{\lambda}$ with probability at least $1-V(x_{0})/\lambda$ .

Next, we prove ii). If the inequalities $c_{1}\|x\|^{a}\leq V(x)\leq c_{2}\|x\|^{a}$ are satisfied, then we know that $V(x)=0$ if and only if $x=0$ . Moreover, it follows from (II) that for all the sample paths that stay in $\mathcal{Q}_{\lambda}$ it holds that $c_{1}\gamma^{k}\|x_{k}\|^{a}\leq\gamma^{k}V(x_{k})\leq\bar{Y}$ since $c_{1}\|x_{k}\|^{a}\leq V(x_{k})$ . Hence, $\|x_{k}(\omega)\|\leq\left({\bar{Y}}/{c_{1}}\right)^{1/a}\gamma^{-k/a}$ for any $\omega\in\bar{\Omega}$ , and one can check that this inequality holds with probability at least $1-V(x_{0})/\lambda$ . If $x_{0}\to 0$ , we know that $1-V(x_{0})/\lambda\to 1$ , which completes the proof. ∎
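To make the rate in Theorem 2 i) concrete, the following minimal Python sketch converts an expected contraction by the factor $1-\alpha$ every $T$ steps into the equivalent per-step rate $(1-\alpha)^{1/T}$. The function name `per_step_rate` and the sample values of $\alpha$ and $T$ are illustrative assumptions, not quantities taken from the paper.

```python
def per_step_rate(alpha: float, T: int) -> float:
    """Per-step rate equivalent to a contraction by (1 - alpha) every T steps."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if T < 1:
        raise ValueError("T must be a positive integer")
    return (1.0 - alpha) ** (1.0 / T)


# Hypothetical values: a 19% expected decrease of V over every T = 2 steps.
rate = per_step_rate(alpha=0.19, T=2)
# Applying the per-step rate T times recovers the T-step factor 1 - alpha.
recovered = rate ** 2
```

A larger window $T$ with the same $\alpha$ gives a per-step rate closer to one, which reflects the price paid for allowing the Lyapunov function to stall within the window.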

If $\mathcal{Q}_{\lambda}$ is positively invariant, the following corollary follows straightforwardly.

Corollary 2.

If $\mathcal{Q}_{\lambda}$ is positively invariant w.r.t. the system (1) and the assumptions a) and b) of Theorem 1 are satisfied with the inequality of b) strengthened to (7), then the following statements apply:

i) for any given $x_{0}\in\mathcal{Q}_{\lambda}$ , $V(x_{k})$ converges to $0$ exponentially at a rate no slower than $(1-\alpha)^{{1}/{T}}$ with probability one;

ii) if moreover $V$ satisfies $c_{1}\|x\|^{a}\leq V(x)\leq c_{2}\|x\|^{a}$ for some $c_{1},c_{2},a>0$ , then $x=0$ is locally a.s. exponentially stable in $\mathcal{Q}_{\lambda}$ . Furthermore, if $\mathcal{Q}_{\lambda}=\mathbb{R}^{n}$ , then $x=0$ is globally a.s. exponentially stable.

The following corollary, which can be proven along the same lines as Theorems 1 and 2, shares some similarities with LaSalle’s theorem for deterministic systems. It is worth mentioning that the function $V$ here does not have to be radially unbounded.

Corollary 3.

Let $\mathbb{D}\subset\mathbb{R}^{n}$ be a compact set that is positively invariant w.r.t. the system (1). Let $V:\mathbb{R}^{n}\to\mathbb{R}$ be a continuous nonnegative function, and $\bar{\mathcal{Q}}_{\lambda}:=\{x\in\mathbb{D}:V(x)<\lambda\}$ for some $\lambda>0$ . Assume that $\mathbb{E}\left[V\left(x_{k+1}\right)|\mathcal{F}_{k}\right]-V\left(x_{k}\right)\leq 0$ for all $k$ such that $x_{k}\in\bar{\mathcal{Q}}_{\lambda}$ . Then

i) if there is an integer $T\geq 1$ , independent of $\omega$ , such that for any $k\in\mathbb{N}_{0}$ , $\mathbb{E}\left[V\left(x_{k+T}\right)|\mathcal{F}_{k}\right]-V\left(x_{k}\right)\leq-\varphi(x_{k})$ , where $\varphi:\mathbb{R}^{n}\to\mathbb{R}$ is continuous and satisfies $\varphi(x)\geq 0$ for any $x\in\bar{\mathcal{Q}}_{\lambda}$ , then for any initial condition $x_{0}\in\bar{\mathcal{Q}}_{\lambda}$ , $x_{k}$ converges to $\bar{\mathcal{D}}_{1}:=\{x\in\bar{\mathcal{Q}}_{\lambda}:\varphi(x)=0\}$ with probability at least $1-V(x_{0})/\lambda$ ;

ii) if the inequality in i) is strengthened to $\mathbb{E}\left[V\left(x_{k+T}\right)|\mathcal{F}_{k}\right]-V\left(x_{k}\right)\leq-\alpha V(x_{k})$ for some $0<\alpha<1$ , then for any given $x_{0}\in\bar{\mathcal{Q}}_{\lambda}$ , $V(x_{k})$ converges to $0$ exponentially at a rate no slower than $(1-\alpha)^{{1}/{T}}$ , and $x_{k}$ converges to $\bar{\mathcal{D}}_{2}:=\{x\in\bar{\mathcal{Q}}_{\lambda}:V(x)=0\}$ , with probability at least $1-V(x_{0})/\lambda$ ;

iii) if $\bar{\mathcal{Q}}_{\lambda}$ is positively invariant w.r.t. the system (1), then all the convergence in both i) and ii) takes place almost surely.

Example 1 Cont. Now let us look back at Example 1 and still choose $V(x)=\left\|x\right\|_{\infty}$ as a stochastic Lyapunov function candidate. It is easy to see that $V(x)$ is a nonnegative supermartingale. To show the stochastic convergence, let $T=2$ and one can calculate the conditional expectations

[TABLE]

When $y_{k}=2,3$ , it analogously holds that

[TABLE]

From these three inequalities one can observe that starting from any initial condition $x_{0}$ , $\mathbb{E}V(x_{k})$ decreases at an exponential speed after every two steps before it reaches $0$ . By Corollary 2, one knows that the origin is globally a.s. exponentially stable, consistent with our conjecture. $\Box$

Remark 1.

Kushner and other researchers have used more restrictive conditions to construct Lyapunov functions than those appearing in our results to analyze asymptotic or exponential stability of random processes [2, 3, 4]. They require that $\mathbb{E}[V(x_{k})]$ decreases strictly at every step until $V(x_{k})$ reaches a limit value. In our results, this requirement is relaxed. In addition, Kushner’s results rely on the assumption that the underlying random process is Markovian, whereas we work with more general random processes.

In the following sections, we will show how the new Lyapunov criteria can be applied to distributed computation.

III Products of Random Sequences of Stochastic Matrices

In this section, we study the convergence of products of stochastic matrices, where the obtained results on finite-step Lyapunov functions are used for analysis. Let $\Omega_{0}:=\{1,2,\dots,m\}$ be the state space and $\mathcal{M}:=\{F_{1},F_{2},\dots,F_{m}\}$ be the set of $m$ stochastic matrices $F_{i}\in\mathbb{R}^{n\times n}$ . Consider a random sequence $\{W_{\omega}(k):k\in\mathbb{N}\}$ on the probability space $(\Omega,\mathcal{F},\Pr)$ , where $\Omega$ is the collection of all infinite sequences $\omega=(\omega_{1},\omega_{2},\dots)$ with $\omega_{k}\in\Omega_{0}$ , and we define $W_{\omega}(k):=F_{\omega_{k}}$ . For notational simplicity, we denote $W_{\omega}(k)$ by $W(k)$ . For the backward product of stochastic matrices

[TABLE]

where $k\in\mathbb{N},t\in\mathbb{N}_{0}$ , we are interested in establishing conditions on $\{W(k)\}$ , under which there holds that $\lim_{k\to\infty}W(k,0)=L$ for a random matrix $L=\mathbf{1}\xi^{\top}$ where $\xi\in\mathbb{R}^{n}$ satisfies $\xi^{\top}\mathbf{1}=1$ .
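The ordering of the factors in a backward product matters, since stochastic matrices do not commute in general. The sketch below assumes the convention $W(k,t)=W(k)W(k-1)\cdots W(t+1)$, which is how the product over an interval is written later in this section, and uses two hypothetical $2\times 2$ matrices `F1` and `F2` chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][s] * b[s][j] for s in range(n)) for j in range(n)]
            for i in range(n)]


def backward_product(seq: List[Matrix]) -> Matrix:
    """Return W(k) ... W(2) W(1) for seq = [W(1), ..., W(k)]."""
    n = len(seq[0])
    prod = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for w in seq:
        prod = matmul(w, prod)  # the newest factor multiplies from the left
    return prod


# Two illustrative stochastic matrices (assumed, not taken from the paper).
F1 = [[0.0, 1.0], [1.0, 0.0]]
F2 = [[0.0, 1.0], [0.5, 0.5]]

P = backward_product([F1, F2])          # equals F2 F1
row_sums = [sum(row) for row in P]      # a product of stochastic matrices is stochastic
```

Here `P` equals `F2 F1`, which differs from the forward product `F1 F2`.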

Before proceeding, let us introduce some concepts in probability. Let $\mathcal{F}_{k}=\sigma(W(1),\dots,W(k))$ , so that evidently $\{\mathcal{F}_{k}\}$ , $k=1,2,\dots,$ is an increasing sequence of $\sigma$ -fields. Let $\phi:\Omega\to\Omega$ be the shift operator, i.e., $\phi(\omega_{1},\omega_{2},\dots)=(\omega_{2},\omega_{3},\dots)$ . A random sequence of stochastic matrices $\{W(1),W(2),\dots,W(k),\dots\}$ is said to be stationary if the shift operator is measure-preserving. In other words, the sequences $\{W({k_{1}}),W({k_{2}}),\dots,W(k_{r})\}$ and $\{W({k_{1}+\tau}),W({k_{2}+\tau}),\dots,W({k_{r}+\tau})\}$ have the same joint distribution for all $k_{1},k_{2},\dots,k_{r}$ and $\tau\in\mathbb{N}$ . Moreover, a sequence is said to be stationary ergodic if it is stationary, and every invariant set $\mathcal{B}$ is trivial, i.e., for every $A\in\mathcal{B}$ , $\Pr[A]\in\{0,1\}$ . Here by an invariant set $\mathcal{B}$ , we mean $\phi^{-1}\mathcal{B}=\mathcal{B}$ .

III-A Convergence Results

We first introduce three classes of stochastic matrices, denoted by $\mathcal{M}_{1},\mathcal{M}_{2}$ , and $\mathcal{M}_{3}$ , respectively. We say $A\in\mathcal{M}_{1}$ if $A$ is indecomposable, and aperiodic (such stochastic matrices are also referred to as SIA for short); $A\in\mathcal{M}_{2}$ if $A$ is scrambling, i.e., no two rows of $A$ are orthogonal; and $A\in\mathcal{M}_{3}$ if $A$ is Markov, i.e., there exists a column of $A$ such that all entries in this column are positive [37, Ch. 4].
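The scrambling and Markov properties depend only on the zero pattern of a matrix, so they can be checked directly. The following is a minimal sketch of such checks; the function names and the three test matrices are illustrative assumptions.

```python
from itertools import combinations
from typing import List

Matrix = List[List[float]]


def is_scrambling(a: Matrix) -> bool:
    """True if no two rows are orthogonal: every pair shares a positive column."""
    n = len(a)
    return all(
        any(a[i][s] > 0 and a[j][s] > 0 for s in range(n))
        for i, j in combinations(range(n), 2)
    )


def is_markov(a: Matrix) -> bool:
    """True if some column has all of its entries positive."""
    n = len(a)
    return any(all(a[i][s] > 0 for i in range(n)) for s in range(n))


# Illustrative matrices (assumed for this example).
cycle = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]   # permutation
hollow = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]  # zero diagonal
rooted = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.5, 0.0, 0.5]]  # positive first column
```

In this example `hollow` is scrambling but not Markov, while `rooted` is both, consistent with every Markov matrix being scrambling.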

Coefficients of ergodicity serve as a fundamental tool in analyzing the convergence of products of stochastic matrices. In this paper, we employ a standard one. For a stochastic matrix $A\in\mathbb{R}^{n\times n}$ , the coefficient of ergodicity $\tau(A)$ is defined by

[TABLE]

It is known that this coefficient of ergodicity satisfies $0\leq\tau(A)\leq 1$ , and $\tau(A)$ is proper since $\tau(A)=0$ if and only if all the rows of $A$ are identical. Importantly, it holds that

$$\tau(A)<1\tag{11}$$

if and only if $A\in\mathcal{M}_{2}$ (see [37, p.82]). For any two stochastic matrices $A,B$ , the following property will be critical for the proof in Appendix A:

$$\tau(AB)\leq\tau(A)\tau(B).\tag{12}$$
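As a numerical illustration of these properties, the sketch below assumes that the coefficient of ergodicity is the standard one from [37], $\tau(A)=\frac{1}{2}\max_{i,j}\sum_{s}|a_{is}-a_{js}|$. Both this formula and the sample matrices are assumptions made for the example.

```python
from typing import List

Matrix = List[List[float]]


def tau(a: Matrix) -> float:
    """Assumed coefficient of ergodicity: half the largest l1 distance between rows."""
    n = len(a)
    return 0.5 * max(
        sum(abs(a[i][s] - a[j][s]) for s in range(n))
        for i in range(n) for j in range(n)
    )


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][s] * b[s][j] for s in range(n)) for j in range(n)]
            for i in range(n)]


identical_rows = [[0.25, 0.75], [0.25, 0.75]]  # tau = 0
permutation = [[0.0, 1.0], [1.0, 0.0]]         # orthogonal rows, tau = 1
scrambling = [[0.0, 1.0], [0.5, 0.5]]          # rows share the second column

product = matmul(scrambling, permutation)
```

For these matrices, `tau` vanishes exactly when the rows coincide, equals one for the non-scrambling permutation, and the value for `product` does not exceed the product of the individual coefficients.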

To proceed, we make the following assumption for the sequence $\{W(k)\}$ .

Assumption 1.

Suppose the sequence of stochastic matrices $\{W(k)\}$ is driven by a random process satisfying the following conditions.

a)

There exists an integer $h>0$ such that

[TABLE]

holds for any $k\in\mathbb{N}_{0}$ , and

[TABLE]

b)

There is a positive number $\alpha$ such that $W_{ij}(k)\geq\alpha$ whenever $W_{ij}(k)>0$ .

Now we are ready to provide our main result on the convergence of stochastic matrices’ products.

Theorem 3.

Under Assumption 1, the product of the random sequence of stochastic matrices $W(k,0)$ converges to a random matrix $L=\mathbf{1}\xi^{\top}$ almost surely as $k\to\infty$ .

To prove Theorem 3, consider the stochastic discrete-time dynamical system described by

[TABLE]

for all $k\in\mathbb{N}_{0}$ , where $x_{k}\in\mathbb{R}^{n}$ , the initial state $x_{0}$ is a constant with probability one, ${y(k)}$ is regarded as a randomly switching signal, and $\{W(1),W(2),\dots\}$ is the random process of stochastic matrices we are interested in. One knows that $x_{k}$ is adapted to $\mathcal{F}_{k}$ . Thus, to investigate the limiting behavior of the product (9), it is sufficient to study the limiting behavior of the system dynamics (15). We say the state of system (15) reaches an agreement state if $\lim_{k\to\infty}x_{k}=\mathbf{1}\xi$ for some $\xi\in\mathbb{R}$ . Then the agreement of system (15) for any initial state $x_{0}$ implies that $W(k,0)$ converges to a rank-one matrix as $k\to\infty$ [26].

To investigate the agreement problem, we define $\left\lceil{x_{k}}\right\rceil:=\max_{1\leq i\leq n}x^{i}_{k},\left\lfloor{x_{k}}\right\rfloor:=\min_{1\leq i\leq n}x^{i}_{k}$ , and

[TABLE]

For any $k\in\mathbb{N}$ , $v_{k}$ is adapted to $\mathcal{F}_{k}$ since $x_{k}$ is. The agreement is said to be reached asymptotically almost surely if $v_{k}\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}0$ as $k\to\infty$ , and it is said to be reached exponentially almost surely with convergence rate no slower than $\gamma^{-1}$ if there exists $\gamma>1$ such that $\gamma^{k}v_{k}\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}y$ for some finite $y\geq 0$ . The random variable $v_{k}$ has some important properties given by the following proposition.

Proposition 1.

Let $x_{k+1}=Ax_{k}$ , where $A$ is a stochastic matrix. Then $v_{k+1}\leq v_{k}$ , and $v_{k+1}<v_{k}$ for any $x_{k}\notin{\rm span}(\mathbf{1})$ if and only if $A$ is scrambling (i.e., $A\in\mathcal{M}_{2})$ .

Proof.

It is shown in [37] that $v_{k+1}\leq\tau{(A)}v_{k}$ with $\tau(\cdot)$ defined in (10). Therefore, the sufficiency follows from (11) straightforwardly. We then prove the necessity by contradiction. Suppose $A$ is not scrambling; then there must exist at least two rows, denoted by $i,j$ , that are orthogonal. Define the two sets $\mathbf{i}:=\{l:a_{il}>0,l\in\mathbf{N}\}$ and $\mathbf{j}:=\{m:a_{jm}>0,m\in\mathbf{N}\}$ , respectively. It then follows from the orthogonality of these two rows that $\mathbf{i}\cap\mathbf{j}=\emptyset$ . Let $x^{q}_{k}=1$ for all $q\in\mathbf{i}$ , $x^{q}_{k}=0$ for all $q\in\mathbf{j}$ , and let $x^{m}_{k}$ be an arbitrary positive number less than 1 for all $m\in\mathbf{N}\backslash(\mathbf{i}\cup\mathbf{j})$ if $\mathbf{N}\backslash(\mathbf{i}\cup\mathbf{j})$ is not empty. Then the states at time $k+1$ become

[TABLE]

and $0\leq x^{m}_{k+1}\leq 1$ for all $m\in\mathbf{N}\backslash(\mathbf{i}\cup\mathbf{j})$ . This results in $v_{k+1}=v_{k}=1$ . By contradiction one knows that a scrambling $A$ is necessary for $v_{k+1}<v_{k}$ , which completes the proof. ∎
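The construction used in the necessity part can be reproduced numerically. The sketch below assumes that $v_{k}$ is the spread $\left\lceil{x_{k}}\right\rceil-\left\lfloor{x_{k}}\right\rfloor$ and builds the state from the proof for a hypothetical non-scrambling matrix `A`; the helper names and the filler value 0.5 are illustrative choices.

```python
from typing import List

Matrix = List[List[float]]


def spread(x: List[float]) -> float:
    """Assumed form of v_k: largest component minus smallest component."""
    return max(x) - min(x)


def apply(a: Matrix, x: List[float]) -> List[float]:
    """One step x_{k+1} = A x_k."""
    return [sum(a[i][s] * x[s] for s in range(len(x))) for i in range(len(a))]


def witness(a: Matrix, i: int, j: int, filler: float = 0.5) -> List[float]:
    """State from the proof: 1 on the support of row i, 0 on the support of row j.

    Rows i and j are assumed to be orthogonal; remaining components get `filler`.
    """
    x = []
    for q in range(len(a)):
        if a[i][q] > 0:
            x.append(1.0)
        elif a[j][q] > 0:
            x.append(0.0)
        else:
            x.append(filler)
    return x


# Hypothetical stochastic matrix whose first two rows are orthogonal.
A = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
x_k = witness(A, 0, 1)      # [0.5, 1.0, 0.0]
x_next = apply(A, x_k)      # [1.0, 0.0, 0.75]
```

The spread equals one both before and after the update, so this non-scrambling `A` cannot give a strict decrease for every state outside ${\rm span}(\mathbf{1})$.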

In order to prove Theorem 3, the following intermediate result is useful.

Proposition 2.

For any scrambling matrix $A\in\mathbb{R}^{n\times n}$ , the coefficient of ergodicity $\tau(A)$ defined in (10) satisfies

[TABLE]

if all the positive elements of $A$ are lower bounded by $\gamma>0$ .

Proof:

Consider any two rows of $A$ , denoted by $i,j$ . Define two sets, $\mathbf{i}:=\{s:a_{is}>0\}$ and $\mathbf{j}:=\{s:a_{js}>0\}$ . From the scrambling hypothesis, one knows that $\mathbf{i}\cap\mathbf{j}\neq\emptyset$ . Thus it holds that

[TABLE]

Then from the definition of $\tau(A)$ , it is easy to see

[TABLE]

which completes the proof. ∎
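The bound of Proposition 2 can be checked on a small example. The sketch below assumes the standard coefficient of ergodicity $\tau(A)=\frac{1}{2}\max_{i,j}\sum_{s}|a_{is}-a_{js}|$ and the bound $\tau(A)\leq 1-\gamma$, and uses the identity $\frac{1}{2}\sum_{s}|a_{is}-a_{js}|=1-\sum_{s}\min(a_{is},a_{js})$ for stochastic rows. The matrix `A` and the helper names are hypothetical.

```python
from itertools import combinations
from typing import List

Matrix = List[List[float]]


def tau(a: Matrix) -> float:
    """Assumed coefficient of ergodicity: half the largest l1 distance between rows."""
    n = len(a)
    return 0.5 * max(
        sum(abs(a[i][s] - a[j][s]) for s in range(n))
        for i, j in combinations(range(n), 2)
    )


def min_overlap(a: Matrix) -> float:
    """Smallest common mass sum_s min(a_is, a_js) over all pairs of rows."""
    n = len(a)
    return min(
        sum(min(a[i][s], a[j][s]) for s in range(n))
        for i, j in combinations(range(n), 2)
    )


def min_positive(a: Matrix) -> float:
    """Smallest positive entry, playing the role of the lower bound gamma."""
    return min(v for row in a for v in row if v > 0)


# Hypothetical scrambling matrix: every pair of rows shares a positive column.
A = [[0.2, 0.8, 0.0], [0.0, 0.5, 0.5], [0.6, 0.0, 0.4]]
gamma = min_positive(A)
```

For this matrix each pair of rows overlaps by at least `gamma`, so `tau(A)` is at most `1 - gamma`; the bound is attained here because the first and third rows share only one column.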

We are in the position to prove Theorem 3 by showing that $v_{k}\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}0$ as $k\to\infty$ , where Theorem 1 and Corollary 1 will be used.

Proof:

Let $V(x_{k})=v_{k}$ be a finite-step stochastic Lyapunov function candidate for the system dynamics (15). It is easy to see $V(x)=0$ if and only if $x\in{\rm span}(\mathbf{1})$ . Since all $W(k)$ are stochastic matrices, we observe that $\mathbb{E}[V(x_{k+1})|{\mathcal{F}_{k}}]-V(x_{k})\leq 0$ from Proposition 1, which implies that $V(x_{k})$ is exactly a supermartingale with respect to $\mathcal{F}_{k}$ . From Lemma 3, we know $V(x_{k})\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}{\bar{V}}$ for some ${\bar{V}}$ because $V(x_{k})\geq 0$ and $\mathbb{E}V(x_{k})<\infty$ . From Assumption 1, we know that there is an $h$ such that the product $W(k+h,k)$ is scrambling with positive probability for any $k$ . Let $\mathcal{W}_{k}$ be the set of all possible $W(k+h,k)$ at time $k$ , and $n_{k}$ the cardinality of $\mathcal{W}_{k}$ . Let $n_{k}^{s}$ be the number of scrambling matrices in $\mathcal{W}_{k}$ . We denote each of these scrambling matrices and each of non-scrambling matrices by $S_{k}^{i},i=1,\dots,n_{k}^{s}$ and $\bar{S}_{k}^{j},j=1,\dots,n_{k}-n_{k}^{s}$ , respectively. The probabilities of all the possible $W(k+h,k)$ sum to 1, i.e.,

[TABLE]

Then the conditional expectation of $V(x)$ after finite steps for any $k$ becomes

[TABLE]

where $\tau(\cdot)$ is given by (10). One can calculate that

[TABLE]

where Proposition 1 and equation (17) have been used. From Assumption 1.b), we know that the positive elements of $W(k)$ are lower bounded by $\alpha$ , and thus the positive elements of $S_{k}^{i}$ in (18) are lower bounded by $\alpha^{h}$ . Thus $\tau(S_{k}^{i})\leq 1-\alpha^{h}$ according to Proposition 2, and it follows that

[TABLE]

By iterating, one can easily show that

[TABLE]

It then follows that $V\left({x_{0}}\right)-\mathbb{E}\left[{{V\left(x_{nh}\right)}}\right]<\infty$ even when $n\to\infty$ , since $V(x)\geq 0$ . According to the condition (14), we know $\sum_{k=0}^{n-1}\sum_{i=1}^{n_{k}^{s}}{\Pr\left[{S_{k}^{i}}\right]}=\infty$ . By contradiction, it is easy to infer that $\mathbb{E}V(x_{k})\to 0$ . Since we have already shown that $V(x_{k})\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}\bar{V}$ for some random $\bar{V}\geq 0$ , one can conclude that $V(x_{k})\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}0$ . For any given $x_{0}\in\mathbb{R}^{n}$ , define the compact set $\mathcal{Q}:=\{x:\left\lceil{x}\right\rceil\leq\left\lceil{x_{0}}\right\rceil,\left\lfloor{x}\right\rfloor\geq\left\lfloor{x_{0}}\right\rfloor\}$ . For any random sequence $\{W(k)\}$ , it follows from the system dynamics (15) that

[TABLE]

and thus $x_{k}$ will remain within $\mathcal{Q}$ . From Corollary 3, we know that $x_{k}$ asymptotically converges to $\{x\in\mathcal{Q}:\varphi_{k}(x)=0\}$ , or equivalently, $\{x\in\mathcal{Q}:V(x)=0\}$ almost surely as $k\to\infty$ since $V(x)$ is continuous. In other words, for any $x_{0}\in\mathbb{R}^{n}$ , $x_{k}\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}\zeta\mathbf{1}$ for some $\zeta\in\mathbb{R}$ , which proves Theorem 3. ∎
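The statement of Theorem 3 can be illustrated by simulation. The sketch below makes several assumptions: $v_{k}$ is taken to be the spread of $x_{k}$, the matrices are drawn independently and uniformly from a hypothetical two-element set (a periodic permutation and a zero-diagonal scrambling matrix), and the window length is $h=1$. It is an illustration of the result, not the authors' experiment.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

CYCLE = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]   # periodic, not scrambling
HOLLOW = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]  # zero diagonal, scrambling


def spread(x: List[float]) -> float:
    """Assumed form of v_k: largest component minus smallest component."""
    return max(x) - min(x)


def apply(a: Matrix, x: List[float]) -> List[float]:
    """One step x_{k+1} = W(k+1) x_k."""
    return [sum(a[i][s] * x[s] for s in range(len(x))) for i in range(len(a))]


def simulate(x0: List[float], steps: int, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Run the switched system with i.i.d. uniform switching; return state and v history."""
    rng = random.Random(seed)
    x = list(x0)
    history = [spread(x)]
    for _ in range(steps):
        x = apply(rng.choice([CYCLE, HOLLOW]), x)
        history.append(spread(x))
    return x, history


x_final, v = simulate([1.0, 0.0, -2.0], steps=300)
```

In this setting every draw of `HOLLOW` halves the spread while `CYCLE` leaves it unchanged, so `v` is nonincreasing and its last entry is numerically negligible, even though neither matrix has a positive diagonal entry.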

For a random sequence of stochastic matrices, Theorem 3 provides a quite relaxed condition for the backward product (9) determined by the random sequence $\{W(k)\}$ to converge to a rank-one matrix: over any time interval of length $h$ , i.e., $[t,t+h]$ for any $t\geq 0$ , the product $W(t+h)\cdots W({t+1})$ is scrambling with positive probability. The following corollary follows straightforwardly since any Markov matrix is certainly scrambling.

Corollary 4.

For a random sequence $\{W(k)\}_{k=1}^{\infty}$ , the product (9) converges to a random matrix $L=\mathbf{1}\xi^{\top}$ almost surely if there exists an integer $h$ such that $W(k+h,k)$ is a Markov matrix with positive probability for any $k$ , and $\sum\nolimits_{i=1}^{\infty}{\Pr\left[{W\left({k+ih,k+\left({i-1}\right)h}\right)}\in\mathcal{M}_{3}\right]}=\infty,\forall k$ .

Next we assume that the sequence $\{W(k)\}$ is driven by an underlying stationary process. Then the condition in Theorem 3 can be further relaxed.

Assumption 2.

Suppose the random sequence of stochastic matrices $\{W(k)\}$ is driven by a stationary process satisfying the following conditions.

a)

There exists an integer $h>0$ such that

[TABLE]

holds for any $k\in\mathbb{N}_{0}$ .

b)

There is a positive number $\alpha$ such that $W_{ij}(k)\geq\alpha$ whenever $W_{ij}(k)>0$ .

In other words, Assumption 2 states that any matrix product of length $h$ is an SIA matrix with positive probability, and the positive elements of all $W(k)$ are uniformly lower bounded by some positive value.

Theorem 4.

Under Assumption 2, the product of the random sequence of stochastic matrices $W(k,0)$ converges to a random matrix $L=\mathbf{1}\xi^{\top}$ almost surely.

If two stochastic matrices $A_{1}$ and $A_{2}$ have zero elements in the same positions, we say these two matrices are of the same type, denoted by $A_{1}\sim A_{2}$ . Trivially, $A_{1}\sim A_{1}$ . One knows that for any SIA matrix $A$ , there exists an integer $l$ such that $A^{l}$ is scrambling; it is easy to extend this to the inhomogeneous case, i.e., any product of $l$ stochastic matrices of the same type as $A$ is scrambling if the positive elements of all the matrices are uniformly lower bounded.

Proof:

Since $\{W(k)\}$ is driven by a stationary process, we know that $\{W\left(t+h\right),\dots,W\left(t+1\right)\}$ has the same joint distribution as $\{W\left(t+2h\right),\dots,$ $W\left(t+h+1\right)\}$ for any $t\in\mathbb{N}_{0},h\in\mathbb{N}$ . For the $h$ given in Assumption 2, there exists an SIA matrix $A$ such that $\Pr[W\big{(}t+kh+h,t+kh+1\big{)}=A]>0$ . Thus it follows that $\Pr[W\big{(}t+kh+2h,t+kh+1\big{)}=A]>0$ for any $k\in\mathbb{N}_{0}$ . Thus

[TABLE]

When $W(t+h,t)\in\mathcal{M}_{1}$ , which happens with positive probability, we have

[TABLE]

By recursion one can conclude that all the $m$ products $W({t}+(k+1)h,{t}+kh),k\in\{0,\dots,m-1\}$ , occur as the same SIA type with positive probability. Since all the products $W({t}+(k+1)h,{t}+kh)$ are of the same type, one can choose $m$ such that $W(t+mh,t)$ is scrambling. This in turn implies that $\Pr\left[W(t+mh,t)\in\mathcal{M}_{2}\right]>0$ , and the property of stationary process makes sure that (14) holds. The conditions in Assumption 1 are therefore all satisfied, and then Theorem 4 follows from Theorem 3. ∎
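The step from SIA to scrambling can be made concrete by searching for the smallest power of a matrix that is scrambling. The sketch below is a minimal illustration under the assumption that a small search bound `max_power` suffices; the matrices and helper names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][s] * b[s][j] for s in range(n)) for j in range(n)]
            for i in range(n)]


def is_scrambling(a: Matrix) -> bool:
    """True if every pair of rows shares a positive column."""
    n = len(a)
    return all(
        any(a[i][s] > 0 and a[j][s] > 0 for s in range(n))
        for i, j in combinations(range(n), 2)
    )


def smallest_scrambling_power(a: Matrix, max_power: int = 20) -> Optional[int]:
    """Smallest l <= max_power with A^l scrambling, or None if none is found."""
    power = a
    for l in range(1, max_power + 1):
        if is_scrambling(power):
            return l
        power = matmul(power, a)
    return None


# Irreducible with cycles of lengths 2 and 3, hence indecomposable and aperiodic.
SIA = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
# Cyclic permutation: periodic, so no power is scrambling.
CYCLE = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

l_sia = smallest_scrambling_power(SIA)
l_cycle = smallest_scrambling_power(CYCLE)
```

For `SIA`, neither the matrix itself nor its square is scrambling, but its third power is, whereas the search returns `None` for the periodic `CYCLE`.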

Remark 2.

Theorems 3 and 4 have established some sufficient conditions for the convergence of a random sequence of stochastic matrices to a rank-one matrix. A further question is how these results can be applied to control distributed computation processes. To answer this question, let us consider a finite set of stochastic matrices $\mathcal{L}=\{F_{1},\dots,F_{m}\}$ , from which each $W(k)$ in the random sequence $\{W(k)\}$ is sampled. It is defined in [38] that $\mathcal{L}$ is a consensus set if the arbitrary product $\prod_{i=1}^{k}W(i),W(i)\in\mathcal{L}$ , converges to a rank-one matrix. However, it has also been shown that deciding whether $\mathcal{L}$ is a consensus set is an NP-hard problem [38, 39]. For a non-consensus set $\mathcal{L}$ , it is not always obvious how to find a deterministic sequence that converges, especially when $\mathcal{L}$ has a large number of elements and $F_{i}$ has zero diagonal entries. However, the convergence can be ensured almost surely by introducing some randomness in the sequence, provided that a convergent deterministic sequence exists.

III-B Estimation of Convergence Rate

In Section III-A, we have shown how the product $W(k,0)$ determined by a random process asymptotically converges to a rank-one matrix $L$ a.s. as $k\to\infty$ . However, the convergence rate for such a randomized product is not yet clear. It is quite challenging to investigate how fast the process converges, especially when each $W(k)$ may have zero diagonal entries. In this subsection, we address this problem by employing finite-step stochastic Lyapunov functions. Now let us present the main result on the convergence rate.

Theorem 5.

In addition to Assumption 1, if there exists a number $p$ , $0<p<1$ , such that

[TABLE]

then the almost sure convergence of the product of $W(k,0)$ to a random matrix $L=\mathbf{1}\xi^{\top}$ is exponential, and the rate is no slower than $\left({1-p{\alpha^{h}}}\right)^{1/h}$ .

Proof:

Choosing $V\left(x_{k}\right)=v_{k}$ as a finite-step stochastic Lyapunov function candidate, from (18) we have

[TABLE]

Furthermore, it is easy to see that

[TABLE]

Substituting it into (III-B) yields

[TABLE]

It follows from Corollary 3 that ${V\left({x_{k+h}}\right)}\stackrel{{\scriptstyle a.s.}}{{\longrightarrow}}0$ , with a convergence rate no slower than $\left({1-p{\alpha^{h}}}\right)^{1/h}$ . In other words, the agreement is reached exponentially almost surely, which implies Theorem 5. ∎
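The rate bound $\left(1-p\alpha^{h}\right)^{1/h}$ is easy to evaluate for given parameters. In the sketch below the function name `rate_bound` and the parameter values are illustrative assumptions; the value $p=1$ is allowed as the limiting deterministic case.

```python
def rate_bound(p: float, alpha: float, h: int) -> float:
    """Upper bound (1 - p * alpha**h) ** (1 / h) on the per-step convergence rate."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if h < 1:
        raise ValueError("h must be a positive integer")
    return (1.0 - p * alpha ** h) ** (1.0 / h)


# Hypothetical setting: windows of length h = 1, entries bounded below by 0.5.
half_chance = rate_bound(p=0.5, alpha=0.5, h=1)   # scrambling half of the time
always = rate_bound(p=1.0, alpha=0.5, h=1)        # scrambling at every step
```

A larger probability `p` of drawing a scrambling product gives a smaller bound, that is, a faster guaranteed rate.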

Theorem 5 has established the almost sure exponential convergence rate for the product of $\{W(k)\}$ . If any subsequence $\{W(k+1),W(k+2),\dots,W(k+h)\}$ results in a scrambling product $W(k+h,k)$ with positive probability, and this probability is lower bounded by some positive number, then the convergence rate is exponential. Interestingly, the greater this lower bound is, the faster the convergence becomes. If we consider a special random sequence which is driven by a stationary ergodic process, the exponential convergence rate follows without any other conditions apart from Assumption 2, and an alternative proof is given in Appendix A.

Corollary 5.

If the random process governing the evolution of the sequence $\{W(k)\}$ is stationary ergodic, the product $W(k,0)$ converges to a random rank-one matrix at an exponential rate almost surely if the conditions of Assumption 2 are satisfied.

III-C Connection to Markov Chains

In this subsection, we show that Theorems 4 and 5 are generalizations of some well-known results for Markov chains in [40, 37]. A fundamental result on inhomogeneous Markov chains is as follows.

Lemma 5 ([37, Th. 4.10], [40]).

If the product $W(k,t)$ , formed from a sequence $\{W(k)\}$ , satisfies $W(t+k,t)\in\mathcal{M}_{1}$ for any $k\geq 1,t\geq 0$ , and $W_{ij}(k)\geq\alpha$ whenever $W_{ij}(k)>0$ , then $W(k,0)$ converges to a rank-one matrix.

Let $h$ be the number of distinct types of scrambling matrices of order $n$ . It is known that the product $W(t+h,t)$ is scrambling for any $t$ . In this case, we may take the probability of each product $W(t+h,t)$ being scrambling as $p=1$ , and as an immediate consequence of Theorem 5, we know that $W(k,0)$ converges to a rank-one matrix at an exponential rate that is no slower than $(1-\alpha^{h})^{{1}/{h}}$ . This convergence rate is consistent with what is estimated in [37, Th. 4.10]. This also applies to the homogeneous case where $W(k)=W_{1}$ for any $k$ with $W_{1}$ being scrambling. Moreover, it is known that the condition can be relaxed by just requiring $W_{1}$ to be SIA to ensure the convergence, which is an immediate consequence of Theorem 4.

In the next section, we discuss how the results can be further applied to the context of asynchronous computations.

IV Asynchronous Agreement over Possibly Periodic Networks

In this section, we take each component $x^{i}$ in $x$ from (15) as the state of agent $i$ in an $n$ -agent system. Define the distributed coordination algorithm

[TABLE]

where the averaging weights satisfy $w_{ij}\geq 0$ , $\sum_{j=1}^{n}w_{ij}=1$ , and $t_{k}$ denotes the time instants when updating actions happen. Here we assume the initial state $x(t_{0})$ is given. It is always assumed that $T_{1}\leq t_{k+1}-t_{k}\leq T_{2}$ , where $t_{0}=0$ and $T_{1},T_{2}$ are positive numbers. We say the states of system (22) reach agreement if $\lim_{k\to\infty}x(t_{k})=\mathbf{1}\zeta$ , as mentioned in Section III. Let $W=[w_{ij}]\in\mathbb{R}^{n\times n}$ , and obviously $W$ is a stochastic matrix. The algorithm (22) can be rewritten as $x(t_{k+1})=Wx(t_{k})$ . In fact, the matrix $W$ can be associated with a directed, weighted graph $\mathcal{G}_{W}=\left(\mathcal{V,E}\right)$ , where $\mathcal{V}:=\{1,2,\cdots,n\}$ is the vertex set and $\mathcal{E}$ is the edge set for which $(i,j)\in\mathcal{E}$ if $w_{ji}>0$ . The graph $\mathcal{G}_{W}$ is called a rooted one if there exists at least one vertex, called a root, from which any other vertex can be reached. It is known that agents are able to reach agreement for all $x(0)$ if $W$ is SIA ([40, 37]). However, the situations when $W$ is not SIA have not been studied before, although they appear often in real systems, such as social networks. As we are interested in studying the agreement problem when $W$ is possibly periodic, let us define periodic stochastic matrices.

Definition 4.

A stochastic matrix $A\in\mathbb{R}^{n\times n}$ is said to be periodic with period $d>1$ if $d$ is the greatest common divisor of all the $t$ such that $A^{m+t}\sim A^{m}$ for a sufficiently large integer $m$ .

Definition 4 is a generalization of the definition of an irreducible periodic matrix [37, Def. 1.6]. In this definition, a periodic stochastic matrix is not necessarily irreducible. With a slight abuse of terminology, we say the graph $\mathcal{G}_{W}$ is periodic if the associated matrix $W$ is periodic.

In the context of distributed computation, it is always assumed that each individual computational unit in the network has access to its own latest state while implementing the iterative update rules [19, 21]. A class of situations that have received considerably less attention in the literature arise when some individuals are not able to obtain their own state, a case which can result from memory loss. Similar phenomena have also been observed in social networks while studying the evolution of opinions. Self-contemptuous people change their opinions solely in response to the opinions of others. The existence of computational units or individuals who are not able to access their own states may result in computational failure or disagreement of opinions. As such an example, a periodic matrix $W$ , which must have all zero diagonal entries (no access to their own states for all individuals), always leads the system (22) to oscillation. This is because for a periodic $W$ , $W^{k}$ never converges to a matrix with identical rows as $k\to\infty$ . Instead, the positions of $W^{k}$ that have positive values change periodically with $k$ , resulting in a periodically changing value of $W^{k}x(0)$ . This motivates us to investigate the particular case where $W$ is possibly periodic.
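The oscillation caused by a periodic $W$ can be seen on the smallest possible example. The sketch below uses a hypothetical two-agent network in which each agent copies the other, so that $W$ is a permutation matrix with period 2; the initial state is an arbitrary illustrative choice.

```python
from typing import List

Matrix = List[List[float]]


def apply(w: Matrix, x: List[float]) -> List[float]:
    """One synchronous update x(t_{k+1}) = W x(t_k)."""
    return [sum(w[i][j] * x[j] for j in range(len(x))) for i in range(len(w))]


# Each agent copies the other one: zero diagonal, period 2.
W = [[0.0, 1.0], [1.0, 0.0]]

x = [1.0, 0.0]
trajectory = [x]
for _ in range(4):
    x = apply(W, x)
    trajectory.append(x)
```

The state alternates between `[1.0, 0.0]` and `[0.0, 1.0]`, so the two agents never agree under synchronous updating.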

In this section, we show that agreement can be reached even when $W$ is periodic, just by introducing asynchronous updating events to the coupled agents. In fact, perfect synchrony is hard to realize in practice as it is difficult for all agents to have access to a common clock according to which they coordinate their updating actions, while asynchrony is more likely. Researchers have studied how agreement can be preserved with the existence of asynchrony, see e.g., [41, 42]. Unlike these works, we approach the same problem from a different aspect, where agreement occurs just because of asynchrony. A counterpart of this problem where $W$ is irreducible and periodic has been covered in our earlier work [43]. We consider a more general case in this section where $W$ can be reducible.

To proceed, we define a framework of randomly asynchronous updating events. It is usually legitimate to postulate that on occasions more than one, but not all, agents may update. Assume that each agent is equipped with a clock, which need not be synchronized with other clocks. The state of each agent remains unchanged except when an activation event is triggered by its own clock. Denote the set of event times of the $i$ th agent by $\mathcal{T}^{i}=\{0,t^{i}_{1},\cdots,t^{i}_{k},\cdots\},k\in\mathbb{N}$ . At the event times, agent $i$ updates its state obeying the asynchronous updating rule

[TABLE]

where $i\in\mathbf{N}$ . We assume that the clocks which determine the updating events for the agents are driven by an underlying random process. The following assumption is important for the analysis.

Assumption 3.

For any agent $i$ , the intervals between two event times, denoted by $h^{i}_{k}=t^{i}_{k}-t^{i}_{k-1}$ , are such that

(i) $h^{i}_{k}$ are upper bounded with probability 1 for all $k$ and all $i$;

(ii) $\{h^{i}_{k}:k\in\mathbb{N}_{0}\}$ is a random sequence, with $\{h^{1}_{k}\}$, $\{h^{2}_{k}\}$, $\dots$, $\{h^{n}_{k}\}$ being mutually independent.

Assumption 3 ensures that an agent can be activated again within finite time after it is activated at $t^{i}_{k-1}$ for all $k\in\mathbb{N}$, which implies that all agents update their states infinitely many times in the long run. In fact, Assumption 3 can be satisfied if the agents are activated by mutually independent Poisson clocks or at rates determined by mutually independent Bernoulli processes ([44, Ch. 6], [32, Ch. 2]).

Let $\mathcal{T}=\{t_{0},t_{1},t_{2},\cdots,t_{k},\cdots\}$ denote all event times of all the $n$ agents, in which the event times have been relabeled such that $t_{0}=0$ and $t_{\tau}<t_{\tau+1}$ for $\tau\in\{0,1,2,\cdots\}$. This idea has been used in [45] and [21] to study asynchronous iterative algorithms. It may happen that there exists some $k$ such that $t_{k}\in\mathcal{T}^{i}$ and $t_{k}\in\mathcal{T}^{j}$ for some $i,j$, which means that more than one agent is activated at some event times. Although this is unlikely when the underlying process is a special random process such as a Poisson process, our analysis and results are not affected. For simplicity, we rewrite the set of event times as $\mathcal{T}=\{0,1,2,\cdots,k,\cdots\}$. Then the system with asynchronous updating can be treated as one with discrete-time dynamics in which the agents are permitted to update only at certain event times $k$, $k\in\mathbb{N}$, according to the updating rule (23) at each time $k$. Since each $k\in\mathcal{T}$ can be the event time of any subset of agents, we can associate any set of event times $\{k+1,k+2,\dots,k+h\}$ with the updating sequence of agents $\{\lambda(k+1),\lambda(k+2),\dots,\lambda(k+h)\}$ with $\lambda(i)\in\mathcal{V}$. Under Assumption 3, one knows that this updating sequence can be arbitrarily ordered, and each possible sequence can occur with positive probability, though the particular value is not of concern.
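The relabeling of event times can be illustrated with a small simulation. This is only a sketch under stated assumptions: the intervals $h^{i}_{k}$ are drawn independently and uniformly from $\{1,\dots,\texttt{max\_gap}\}$, which is one simple way to get bounded, mutually independent intervals; the function names `event_times` and `merge`, the horizon, and the seed are hypothetical.

```python
import random

def event_times(num_agents, horizon, max_gap, seed=0):
    """Draw event times for each agent with independent intervals in {1, ..., max_gap}."""
    rng = random.Random(seed)
    per_agent = []
    for _ in range(num_agents):
        t, times = 0, [0]                      # every agent has an event at time 0
        while True:
            t += rng.randint(1, max_gap)       # bounded interval h^i_k (illustrative law)
            if t > horizon:
                break
            times.append(t)
        per_agent.append(times)
    return per_agent

def merge(per_agent):
    """Merge all event times into one increasing list of (time, activated agents)."""
    merged = {}
    for agent, times in enumerate(per_agent):
        for t in times:
            merged.setdefault(t, []).append(agent)
    return sorted(merged.items())

per_agent = event_times(num_agents=3, horizon=30, max_gap=4)
merged = merge(per_agent)

# Relabel the merged event times as 0, 1, 2, ... and keep only who updates at each step.
updating_sequence = [agents for _, agents in merged]
for k, agents in enumerate(updating_sequence[:6]):
    print(k, agents)
```

After relabeling, only the order of activations matters, which is why the analysis can work with the index set $\{0,1,2,\cdots\}$ instead of the physical event times.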

Assume that at time $k$, $m\geq 1$ agents are activated, labeled by $k_{1},k_{2},\dots,k_{m}$. We then define the following matrices

[TABLE]

where $u_{i}\in\mathbb{R}^{n}$ is the $i$ th column of the identity matrix $I_{n}$ and $w_{k}\in\mathbb{R}^{n}$ denotes the $k$ th row of $W$ . We call $W(k)$ the asynchronous updating matrix at time $k$ . Then the asynchronous updating rule (23) becomes

[TABLE]

where $\{W(k)\}$ is a random sequence of asynchronous updating matrices which are stochastic, and $x_{0}\in\mathbb{R}^{n}$ is a given initial state. We say the asynchronous agreement is reached if $x_{k}$ converges to a scaled all-one vector when the agents update asynchronously. It suffices to study the convergence of the product $W(k)\dots W(2)W(1)$ to a rank-one matrix. We now show the asynchronous agreement is reached almost surely even when the graph is periodic. A necessary and sufficient condition for the graph is obtained, under which the agreement can always be reached.
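A small experiment makes the role of asynchrony concrete. The sketch assumes that the asynchronous updating matrix at an event time is the identity matrix with the rows of the activated agents replaced by the corresponding rows of $W$, which matches the statement that an agent's state is unchanged unless it is activated. The activation law (each agent active independently with probability 1/2), the 3-cycle $W$, and the helper names are illustrative assumptions.

```python
import random

def async_matrix(W, active):
    """Identity matrix with the rows of the activated agents replaced by rows of W."""
    n = len(W)
    return [list(W[i]) if i in active else [1.0 if j == i else 0.0 for j in range(n)]
            for i in range(n)]

def mat_vec(M, x):
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def spread(x):
    """Difference between the largest and smallest state; zero means agreement."""
    return max(x) - min(x)

# Periodic W with zero diagonal: a directed 3-cycle.
W = [[0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
x0 = [0.0, 1.0, 2.0]

# Synchronous updating only permutes the states, so the spread never shrinks.
x_sync = list(x0)
for _ in range(200):
    x_sync = mat_vec(W, x_sync)

# Asynchronous updating: a random subset of agents is activated at each event time.
rng = random.Random(1)
x_async = list(x0)
for _ in range(200):
    active = {i for i in range(len(W)) if rng.random() < 0.5}
    x_async = mat_vec(async_matrix(W, active), x_async)

print("synchronous spread: ", spread(x_sync))
print("asynchronous spread:", spread(x_async))
```

Under synchronous updating the spread stays at its initial value, whereas under random asynchronous updating the three states coincide after finitely many events in this example.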

Theorem 6.

If the agents coupled by a network update asynchronously under Assumption 3, they reach agreement almost surely if and only if the network is rooted, i.e., the matrix $W$ is indecomposable.

To prove this theorem, we need to introduce some additional concepts and results. Saying that $W$ is indecomposable is equivalent to saying that the associated graph $\mathcal{G}_{W}$ is rooted. Denote the set of all the roots of $\mathcal{G}_{W}$ by $\mathbf{r}\subseteq\mathcal{V}$. We can partition the vertices of $\mathcal{G}_{W}$ into hierarchical subsets as follows. For any $\kappa\in\mathbf{r}$, there must exist at least one directed spanning tree rooted at $\kappa$; see, e.g., Fig. 1 (a). We select any of these directed spanning trees, denoted by $\mathcal{G}^{s}_{W}$. In $\mathcal{G}^{s}_{W}$ there exists a directed path from $\kappa$ to any other vertex $i\in\mathcal{V}\backslash\{\kappa\}$; see, e.g., Fig. 1 (b). Let $l_{i}$ be the length of this directed path from $\kappa$ to $i$; then there exists an integer $L\leq n$ such that $l_{i}<L$ for all $i$. Define

[TABLE]

and $\mathcal{H}_{0}=\{\kappa\}$. From this definition, one can partition the vertices of $\mathcal{G}^{s}_{W}$ into $L$ hierarchical subsets, i.e., $\mathcal{H}_{0},\mathcal{H}_{1},\cdots,\mathcal{H}_{L-1}$, according to the vertices’ distances to the root $\kappa$. Let $n_{r}$ be the number of vertices in the subset $\mathcal{H}_{r}$, $0\leq r\leq L-1$ (see the example in Fig. 1 (b)). Note that, given a spanning tree, its corresponding hierarchical subsets $\mathcal{H}_{r}$ are uniquely determined.
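The hierarchical subsets can be computed by a breadth-first search from the root. The sketch below makes two assumptions: an arc from $j$ to $i$ is present in $\mathcal{G}_{W}$ whenever $w_{ij}>0$ (agent $i$ uses the state of agent $j$), and the spanning tree is the breadth-first tree, which is only one of the admissible choices. The function name `hierarchical_subsets` and the example matrix are hypothetical.

```python
from collections import deque

def hierarchical_subsets(W, root):
    """Group vertices by their distance from the root in a breadth-first spanning tree."""
    n = len(W)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        j = queue.popleft()
        for i in range(n):
            # Assumed convention: arc j -> i exists when w_ij > 0.
            if W[i][j] > 0 and i not in depth:
                depth[i] = depth[j] + 1
                queue.append(i)
    if len(depth) < n:
        raise ValueError("the chosen vertex is not a root of the graph")
    layers = [[] for _ in range(max(depth.values()) + 1)]
    for v in range(n):
        layers[depth[v]].append(v)
    return layers

# Example with zero diagonal: agents 0 and 1 listen to each other,
# agent 2 averages agents 0 and 1, and agent 3 listens to agent 2.
W = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

print(hierarchical_subsets(W, root=0))
```

For this example the subsets are $\mathcal{H}_{0}=\{0\}$, $\mathcal{H}_{1}=\{1,2\}$ and $\mathcal{H}_{2}=\{3\}$, so $L=3$.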

Definition 5.

An updating vertex sequence of length $n$ is said to be hierarchical if it can be partitioned into successive subsequences, denoted by $\{\mathcal{A}_{0},\dots,\mathcal{A}_{L-1}\}$ with $\mathcal{A}_{r}=\{\lambda_{r}(1),\lambda_{r}(2),\cdots,\lambda_{r}(n_{r})\}$, such that $\bigcup\nolimits_{k=1}^{n_{r}}\{\lambda_{r}(k)\}=\mathcal{H}_{r}$ for all $r=0,\cdots,L-1$, where the $\mathcal{H}_{r}$ are the hierarchical subsets of some spanning tree $\mathcal{G}^{s}_{W}$ in $\mathcal{G}_{W}$.

Proposition 3.

If agents coupled by $\mathcal{G}_{W}$ update in a hierarchical sequence $\{a_{1},\cdots,a_{n}\},a_{i}\in\mathcal{V}$ for all $i$ , the product of the corresponding asynchronous updating matrices, $\tilde{W}:={W_{{a_{n}}}}\cdots{W_{{a_{2}}}}{W_{{a_{1}}}}$ , is a Markov matrix.

To prove this proposition, we define an operator $\mathcal{N}(\cdot,\cdot)$ for any stochastic matrix and any subset $\mathcal{S}\subseteq\mathcal{V}$

[TABLE]

and we write $\mathcal{N}(A,\{i\})$ as $\mathcal{N}(A,i)$ for brevity. It is easy to check that for any two stochastic matrices $A_{1},A_{2}\in\mathbb{R}^{n\times n}$ and for any subset $\mathcal{S}\subseteq\mathcal{V}$, it holds that

[TABLE]

Proof:

It suffices to show that all $i\in\mathcal{V}$ share at least one common neighbor in the graph $\mathcal{G}_{\tilde{W}}$ , i.e.,

[TABLE]

We rewrite the product of asynchronous updating matrices into

[TABLE]

For any distinct $i,j\in\mathcal{V}$ , we know that $\mathcal{N}(W_{j},i)=\{i\}$ from the definition of asynchronous updating matrices. Then for any $\lambda_{r}(t)\in\mathcal{H}_{r},t\in\{1,\cdots,n_{r}\},r\in\{1,\cdots,L-1\}$ , it holds that

[TABLE]

where the property (26) has been used. From Definition 5, one knows that there exists at least one vertex ${\lambda_{r-1}}\left(t_{1}\right)\in\mathcal{H}_{r-1}$ that can reach ${\lambda_{r}}\left(t\right)$ in $\mathcal{G}_{W}$ and subsequently in $\mathcal{G}_{W_{{\lambda_{r}}\left(t\right)}}$ , which implies

[TABLE]

It then follows

[TABLE]

Similarly, it holds that

[TABLE]

As a recursion, it must be true that

[TABLE]

where $\kappa$ is a root of $\mathcal{G}_{W}^{s}$ . In fact, it holds that $\lambda_{0}(1)=\kappa$ , and then we know

[TABLE]

Substituting (29) into (28) leads to

[TABLE]

for all ${\lambda_{r}}(t)$ . Since $\bigcup\nolimits_{r,t}{\left\{{{\lambda_{r}}\left(t\right)}\right\}}=\mathcal{V}$ , we know

[TABLE]

Straightforwardly, (27) follows, which completes the proof. ∎
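Proposition 3 can be checked numerically on a small example. The sketch takes a Markov matrix to mean a stochastic matrix with at least one strictly positive column, which is consistent with the common-neighbor condition used in the proof, and it assumes that $W_{a}$ is the identity matrix with row $a$ replaced by row $a$ of $W$. The example matrix, the two updating orders, and the helper names are illustrative.

```python
def single_update_matrix(W, a):
    """Assumed form of W_a: identity with row a replaced by row a of W."""
    n = len(W)
    return [list(W[i]) if i == a else [1.0 if j == i else 0.0 for j in range(n)]
            for i in range(n)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][s] * B[s][j] for s in range(n)) for j in range(n)]
            for i in range(n)]

def product_along(W, sequence):
    """Return W_{a_n} ... W_{a_2} W_{a_1} for the updating sequence (a_1, ..., a_n)."""
    n = len(W)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a in sequence:
        P = mat_mul(single_update_matrix(W, a), P)
    return P

def positive_columns(P):
    """Indices of the columns whose entries are all strictly positive."""
    n = len(P)
    return [j for j in range(n) if all(P[i][j] > 0 for i in range(n))]

# Rooted graph with zero diagonal; with root 0 the hierarchical subsets
# are {0}, {1, 2}, {3}, so (0, 1, 2, 3) is a hierarchical sequence.
W = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

hierarchical = product_along(W, [0, 1, 2, 3])
reversed_order = product_along(W, [3, 2, 1, 0])

print(positive_columns(hierarchical))    # column indices that are entirely positive
print(positive_columns(reversed_order))  # empty when no such column exists
```

In this example the hierarchical order yields a product with a strictly positive column, while the reversed order does not, which shows that the updating order matters.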

Since the hierarchical sequences will appear with positive probability in any sequence of length $n$ , one can easily prove the following proposition by letting $l=n$ .

Proposition 4.

There exists an integer $l$ such that the product $W(k+l)\cdots W(k+1)$, where $W(k)$ is given in (25), is a Markov matrix with positive probability for any $k\geq 0$.

Proof of Theorem 6:

We prove the necessity by contradiction. Suppose the matrix $W$ is decomposable. Then there are at least two sets of vertices that are isolated from each other, and agreement can never be reached between these two isolated groups if they have different initial states. Letting $l=n$ and in view of Corollary 4, the sufficiency follows directly from Proposition 4, which completes the proof. ∎

Remark 3.

Note that the hierarchical sequence is a particular type of updating order that results in a Markov matrix as the product of the corresponding updating matrices. We have identified another type of updating order in our earlier work for the case where $W$ is irreducible and periodic [43]. It is of great interest for future work to look for other updating mechanisms that enable the appearance of Markov matrices or scrambling matrices to guarantee asynchronous agreement.

In the next section, we look into another application in solving linear algebraic equations.

V To Solve Linear Algebraic Equations

Researchers have been quite interested in solving a system of linear algebraic equations in the form of $Ax=b$ in a distributed way [46, 47, 28, 29]. In this section we deal with the problem under the assumption that this system of equations has at least one solution. The set of equations is decomposed into smaller sets and distributed to a network of $n$ processors, referred to as agents, to be solved in parallel. Agents can receive information from their neighbors and the neighbor relationships are described by a time-varying $n$ -vertex directed graph $\mathcal{G}(t)$ with self-arcs. When each agent knows only the pair of real-valued matrices $(A_{i}^{n_{i}\times m},b_{i}^{n_{i}\times 1})$ , the problem of interest is to devise local algorithms such that all $n$ agents can iteratively compute the same solution to the linear equation $Ax=b$ , where $A=[A_{1}^{\top},A_{2}^{\top},\dots,A_{n}^{\top}]^{\top},b=[b^{\top}_{1},b^{\top}_{2},\dots,b^{\top}_{n}]^{\top}$ and $\sum_{i=1}^{n}n_{i}=m$ . A distributed algorithm to solve the problem is introduced in [30], where the iterative updating rule for each agent $i$ is described by

[TABLE]

where $x^{i}_{k}\in\mathbb{R}^{m}$ , $d^{i}_{k}$ is the number of neighbors of agent $i$ at time $k$ , ${\cal N}_{i}(k)$ is the collection of $i$ ’s neighbors, $P_{i}$ is the orthogonal projection on the kernel of $A_{i}$ , and the initial value $x^{i}_{1}$ is any solution to the equations of $A_{i}x=b_{i}$ .
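The ingredients listed above can be put together in a short simulation. The displayed rule (30) is not reproduced in this text, so the sketch assumes the projection-consensus form $x^{i}_{k+1}=x^{i}_{k}-P_{i}\big(x^{i}_{k}-\frac{1}{d^{i}_{k}}\sum_{j\in\mathcal{N}_{i}(k)}x^{j}_{k}\big)$, which is how the algorithm of [30] is commonly stated; treat this form as an assumption. For simplicity each agent holds a single equation (so $A_{i}$ is one row), the graph is fixed and complete with self-arcs, and all names and data are illustrative.

```python
def project_kernel(a, v):
    """Orthogonal projection of v onto the kernel of the single-row matrix a."""
    scale = sum(ai * vi for ai, vi in zip(a, v)) / sum(ai * ai for ai in a)
    return [vi - scale * ai for ai, vi in zip(a, v)]

def step(states, rows, neighbors):
    """One synchronous round of the assumed projection-consensus rule."""
    new_states = []
    for i, x in enumerate(states):
        nbrs = neighbors[i]                       # includes i itself (self-arc)
        avg = [sum(states[j][c] for j in nbrs) / len(nbrs) for c in range(len(x))]
        diff = [xc - ac for xc, ac in zip(x, avg)]
        corr = project_kernel(rows[i], diff)      # P_i applied to the disagreement
        new_states.append([xc - cc for xc, cc in zip(x, corr)])
    return new_states

# System: x1 + x2 = 2 (agent 0) and x1 - x2 = 0 (agent 1); the solution is (1, 1).
rows = [[1.0, 1.0], [1.0, -1.0]]
rhs = [2.0, 0.0]
states = [[2.0, 0.0], [0.0, 0.0]]                # each initial state solves its own equation
neighbors = [[0, 1], [0, 1]]

for _ in range(60):
    states = step(states, rows, neighbors)

print(states)
```

Because the correction lies in the kernel of $A_{i}$, every iterate of agent $i$ keeps satisfying $A_{i}x=b_{i}$; the averaging term is what drives the agents toward a common point.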

The results in [30] have shown that all $x^{i}_{k}$ converge to the same solution exponentially fast if the sequence of graphs $\mathcal{G}(t)$ is repeatedly jointly strongly connected. This condition is restrictive since it requires that, for some integer $l$, the composition of the sequence of graphs $\{\mathcal{G}(k),\dots,\mathcal{G}(k+l-1)\}$ be strongly connected for any $k$. By the composition of a directed graph $\mathcal{G}_{1}$ with the vertex set $\mathcal{V}$ with another directed graph $\mathcal{G}_{2}$ with the same vertex set $\mathcal{V}$, denoted by $\mathcal{G}_{2}\circ\mathcal{G}_{1}$, we mean the directed graph with the vertex set $\mathcal{V}$ and edge set defined in such a way that $(i,j)$ is an arc of the composition just in case there is a vertex $i_{1}$ such that $(i,i_{1})$ is an edge in $\mathcal{G}_{1}$ and $(i_{1},j)$ is an edge in $\mathcal{G}_{2}$. It is not easy to satisfy this condition if the network is changing randomly. Now assume that the evolution of the sequence of graphs $\{\mathcal{G}(1),\dots,\mathcal{G}(k),\dots\}$ is driven by a random process. In this case, the results in Theorem 1 and Corollary 1 can be applied to relax the condition in [30] and obtain the following more general result.
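The composition of graphs and the joint strong connectivity condition can be sketched as follows. Graphs are represented as sets of arcs $(i,j)$ on the vertex set $\{0,\dots,n-1\}$; the three example graphs and the function names are illustrative assumptions.

```python
def compose(g2, g1):
    """Arc set of g2 ∘ g1: (i, j) whenever (i, i1) is in g1 and (i1, j) is in g2."""
    return {(i, j) for (i, i1) in g1 for (i2, j) in g2 if i1 == i2}

def reachable(arcs, start):
    """Set of vertices reachable from start along the given arcs."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for (a, b) in arcs:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def strongly_connected(arcs, n):
    """True if every vertex reaches, and is reached from, vertex 0."""
    everything = set(range(n))
    backward = {(b, a) for (a, b) in arcs}
    return reachable(arcs, 0) == everything and reachable(backward, 0) == everything

n = 3
self_arcs = {(i, i) for i in range(n)}
g1 = self_arcs | {(0, 1)}
g2 = self_arcs | {(1, 2)}
g3 = self_arcs | {(2, 0)}

joint = compose(g3, compose(g2, g1))
print(sorted(joint))
print(strongly_connected(joint, n))
```

None of the three graphs is strongly connected on its own, but their composition is, which is the situation that a joint connectivity condition over a window of length $l$ is meant to capture.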

Theorem 7.

Suppose each agent updates its state $x^{i}_{k}$ according to the rule (30). All states $x^{i}_{k}$ converge to the same solution to $Ax=b$ almost surely if the following two conditions are satisfied

a) there exists an integer $l$ such that the composition of any sequence of randomly changing graphs $\{\mathcal{G}(k),\mathcal{G}(k+1),\dots,\mathcal{G}(k+l-1)\}$ is strongly connected with positive probability $p(k)>0$ for any $k\in\mathbb{N}$;

b) there holds $\sum\nolimits_{i=0}^{\infty}p(k+il)=\infty,\forall k.$

To prove the theorem, we define an error system. Let $x^{*}$ be any solution to $Ax=b$ , so $A_{i}x^{*}=b_{i}$ for any $i$ . Then, we define

[TABLE]

which, as is done in [30], can be simplified into

[TABLE]

Let $e_{k}=[{e^{1}_{k}}^{\top},\dots,{e^{n}_{k}}^{\top}]^{\top}$ , $A(k)$ be the adjacency matrix of the graph $\mathcal{G}(k)$ , $D(k)$ be the diagonal matrix whose $i$ th diagonal entry is $d^{i}_{k}$ , and $W(k)=D^{-1}(k)A^{\top}(k)$ . It is clear that $W(k)$ is a stochastic matrix, and $\{W(k)\}$ is a stochastic process. Now we write equation (31) into a compact form

[TABLE]

where $\otimes$ denotes the Kronecker product, $P:=\mathrm{diag}\{P_{1},P_{2},\dots,P_{n}\}$, and $\{W(k)\}$ is a random process. We will show that this error system is globally a.s. asymptotically stable. Define the transition matrix of this error system by

[TABLE]

In order to study the stability of the error system (32), we define a mixed-matrix norm for an $n\times n$ block matrix $Q=[Q_{ij}]$ whose $ij$ th entry is a matrix $Q_{ij}\in\mathbb{R}^{m\times m}$ , and

[TABLE]

where $\langle Q\rangle$ is the matrix in $\mathbb{R}^{n\times n}$ whose $ij$th entry is $\|Q_{ij}\|_{2}$. Here $\|\cdot\|_{2}$ and $\|\cdot\|_{\infty}$ denote the induced 2-norm and infinity norm, respectively. It is easy to show that $[\![\cdot]\!]$ is a norm. Since $\|Ax\|_{2}\leq\|A\|_{2}\|x\|_{2}$ for $x\in\mathbb{R}^{nm\times nm}$, it follows straightforwardly that $[\![Ax]\!]\leq[\![A]\!]\,[\![x]\!]$. It has been proven in [30] that $\Phi(k+T,k)$ is non-expansive for any $k>0,T\geq 0$. In other words, it holds that $[\![\Phi(k+T,k)]\!]\leq 1$. Moreover, the transition matrix is a contraction, i.e., $[\![\Phi(k+T,k)]\!]<1$, if there exists a “route” $j=i_{0},i_{1},\dots,i_{T}=i$ over the sequence $\{\mathcal{G}(k),\dots,\mathcal{G}(k+T-1)\}$ for any $i,j\in\mathcal{V}$ that satisfies $\bigcup\nolimits_{s=0}^{T}\{i_{s}\}=\mathcal{V}$; here by a route over a given sequence of graphs $\{\mathcal{G}(1),\mathcal{G}(2),\dots,\mathcal{G}(k)\}$, we mean a sequence of vertices $i_{0},i_{1},\dots,i_{k}$ such that $(i_{z-1},i_{z})$ is an edge in $\mathcal{G}(z)$ for all $1\leq z\leq k$. Now we are ready to prove Theorem 7.

Proof:

Let $V(e_{k})=[\![e_{k}]\!]$ be a finite-step stochastic Lyapunov function candidate. Let $\{\mathcal{F}_{k}\}$, where $\mathcal{F}_{k}=\sigma(\mathcal{G}(1),\cdots,\mathcal{G}(k))$, be an increasing sequence of $\sigma$-fields. We first show that $V(e_{k})$ is a supermartingale with respect to $\mathcal{F}_{k}$ by observing

[TABLE]

where $\Phi_{k}=\Phi(k,k)=P(W(k)\otimes I)P$. The last inequality follows from the fact that $\mathbb{E}[\![\Phi_{k}]\!]\leq 1$ since all the possible $\Phi_{k}$ are non-expansive. Consider the sequence of randomly changing graphs $\{\mathcal{G}(1),\mathcal{G}(2),\cdots,\mathcal{G}(q)\}$, where $q=(n-1)^{2}l$. Let $r=n-1$, and partition this sequence into $r$ successive subsequences $\mathcal{G}_{1}=\{\mathcal{G}(1),\dots,\mathcal{G}(rl)\}$, $\mathcal{G}_{2}=\{\mathcal{G}(rl+1),\dots,\mathcal{G}(2rl)\}$, $\cdots$, $\mathcal{G}_{r}=\{\mathcal{G}((r-1)rl+1),\dots,\mathcal{G}(r^{2}l)\}$. Let $\mathbb{C}_{z}$ denote the composition of the graphs in the $z$th subsequence, i.e., $\mathbb{C}_{z}=\mathcal{G}(zrl)\circ\cdots\circ\mathcal{G}((z-1)rl+2)\circ\mathcal{G}((z-1)rl+1)$, $z=1,2,\dots,r$. Since all the subsequences have length $rl$, each can be further partitioned into $r$ successive sub-subsequences of length $l$. From the condition of Theorem 7, one knows that the composition of the graphs in any sub-subsequence is strongly connected with positive probability. The event that the composition of the graphs in each of the $r$ sub-subsequences in $\mathcal{G}_{z}$ is strongly connected also has positive probability. This holds for all $z$. We know that the composition of any $r$ or more strongly connected graphs, within which each vertex has a self-arc, results in a complete graph [20]. It follows straightforwardly that the graphs $\mathbb{C}_{1},\cdots,\mathbb{C}_{r}$ are all complete with positive probability. Therefore, for any pair $i,j\in\mathcal{V}$, there exists a route from $j$ to $i$ over the graph $\mathbb{C}_{z}$ for any $z$. It is easy to check that there exists a route $i_{1},i_{2},\dots,i_{n}$ over the graphs $\mathbb{C}_{1},\cdots,\mathbb{C}_{r}$, where $i_{1},i_{2},\dots,i_{n}$ can be any reordered sequence of $\{1,2,\dots,n\}$. Similarly, for any $z$ there must exist a route of length $rl$, $i_{z}=i_{z}^{1},i_{z}^{2},\dots,i_{z}^{rl}=i_{z+1}$, over $\mathcal{G}_{z}$. Thus there is a route $i_{1}^{1},i_{1}^{2},\ldots,i_{1}^{rl},i_{2}^{2},\ldots,i_{2}^{rl},\ldots,i_{r}^{rl}$ over the graph sequence $\{\mathcal{G}(1),\mathcal{G}(2),\cdots,\mathcal{G}(q)\}$ so that $\bigcup\nolimits_{\delta=1}^{r}\bigcup\nolimits_{\theta=1}^{rl}\{i_{\delta}^{\theta}\}=\mathcal{V}$. This implies that $\Phi(q,1)$ is a contraction with positive probability. Since all $\Phi(q,1)$ are non-expansive, there is a number $\rho(1)<1$ such that $\mathbb{E}[\![\Phi(q,1)]\!]=\rho(1)$. Straightforwardly, it also holds that $\mathbb{E}[\![\Phi(k+q,k)]\!]=\rho(k)<1$ for all $k<\infty$. Thus it holds a.s. that

[TABLE]

Similarly to the proof of Theorem 3, condition b) in Theorem 7 ensures that $\sum\nolimits_{i=1}^{\infty}(1-\rho(k))=\infty$. It follows that $V(e_{k})\stackrel{a.s.}{\longrightarrow}0$ as $k\to\infty$ since $V(e_{0})-\mathbb{E}\big[V(e_{nq})\,\big|\,\mathcal{F}_{k}\big]<\infty$ for any $N$. Define the set $\mathcal{Q}:=\{e:V(e)\leq V(e_{1})\}$ for any initial $e_{1}$ corresponding to $x_{1}$. For any random sequence $\{\mathcal{G}(k)\}$, it follows from the system dynamics (32) that

[TABLE]

and thus $e_{k}$ will stay within the set $\mathcal{Q}$ with probability $1$ . From Theorem 1 and Corollary 1, it follows that $e_{k}$ asymptotically converges to $\{e:V(e)=0\}$ almost surely. Moreover, since $V(e)$ is a norm of $e$ , it can be concluded from Corollary 1 that the error system (32) is globally a.s. asymptotically stable. The proof is complete. ∎

It is worth mentioning that the error system is globally a.s. exponentially stable under the assumption that the probability that the composition of any sequence of randomly changing graphs $\{\mathcal{G}(k),\mathcal{G}(k+1),\dots,\mathcal{G}(k+l-1)\}$, for any $k\geq 0$, is strongly connected is lower bounded by some positive number. This can be proven with the help of Theorem 2 and Corollary 2.

VI Concluding Remarks

We have established the tool of finite-step stochastic Lyapunov functions, using which one can study the convergence and stability of a stochastic system together with its convergence rate. As applications, we investigate the convergence of the products of a random sequence of stochastic matrices. The asynchronous agreement problem and the distributed algorithm for solving linear algebraic equations have also been studied. Conditions in the existing results on both of these problems have been relaxed. One of our future research directions is to apply finite-step stochastic Lyapunov functions to the study of stochastic distributed optimization.

VII Acknowledgement

We thank Prof. Tobias Müller from Bernoulli Institute, University of Groningen, for constructive discussions.

Appendix A An Alternative Proof of Corollary 5

For ergodic stationary sequences, the following important property is the key to establishing the convergence rate.

Lemma 6 (Birkhoff’s Ergodic Theorem, see [36, Th. 7.2.1]).

For an ergodic sequence $\{X_{k}\},k\in\mathbb{N}_{\geq 0}$ , of random variables, it holds that

[TABLE]

For the product given in (9), we say $W(k,0)$ converges to a rank-one matrix $W=1\xi^{\top}$ a.s. as $k\to\infty$ if $\tau(W(k,0))\to 0$ as $k\to\infty$, where $\tau(\cdot)$ is defined in (10). According to Definition 1, if there exists $\beta>1$ such that

[TABLE]

then the convergence rate is said to be exponential at the rate no slower than $\beta^{-1}$ . We are now ready to present the proof of Corollary 5.

Proof of Corollary 5.

Let $h$ be the same as that in Assumption 2. There is an integer $\theta\in\mathbb{N}$ such that $W(t+\theta h,t)$ is scrambling with positive probability. Let $T=\theta h$ . Consider a sufficiently large $r$ , and then $W(r,0)$ can be written as

[TABLE]

where $m$ is the largest integer such that $mT\leq r$, $W(kT+T,kT),k=0,\cdots,m-1$, are the matrix products defined by (9), and $\bar{W}=W(r,mT)$ is the remaining part, which is obviously a stochastic matrix. To study the limiting behavior of $W(r,0)$, we compute its coefficient of ergodicity

[TABLE]

where the property (12) has been used. The last inequality follows from the property of coefficients of ergodicity, i.e., $\tau(A)\leq 1$ for a stochastic matrix $A$ . Taking logarithms yields that

[TABLE]

Since the sequence $\{W(k)\}$ is ergodic, it is easy to see that the sequence of products $\{W(kT+T,kT)\}$, $k=0,\cdots,m-1$, over non-overlapping intervals of length $T$, is also ergodic. It follows in turn that $\{\log\tau\big(W(kT+T,kT)\big)\}$ is ergodic. From Lemma 6, one can further obtain

[TABLE]

The last inequality follows from Jensen’s inequality (see [36, Th. 1.5.1]) since $\log(\cdot)$ is concave. According to Assumption 1, one knows that $W(t+h,t)$ is scrambling with positive probability, and thus it follows that $0<\mathbb{E}\big[\tau\big(W(T,0)\big)\big]<1$. Taking a positive number $\lambda$ satisfying $\lambda<-\log\mathbb{E}\big[\tau\big(W(T,0)\big)\big]$, one obtains

[TABLE]

Adding $m\lambda$ to both sides of (35) yields that

[TABLE]

It follows straightforwardly that

[TABLE]

Let $\beta=e^{\lambda}$ , which apparently satisfies $\beta>1$ . From Definition 1, one can conclude that the product $W(k,0)$ almost surely converges to a rank-one stochastic matrix exponentially at a rate no slower than $\beta^{-1}$ , which completes the proof. ∎

Bibliography

[1] H. J. Kushner, Stochastic Stability and Control. New York, NY, USA: Academic Press, 1967.
[2] ——, Introduction to Stochastic Control. New York: Holt, Rinehart and Winston, Inc., 1971.
[3] ——, “On the stability of stochastic dynamical systems,” Proceedings of the National Academy of Sciences, vol. 53, no. 1, pp. 8–12, 1965.
[4] F. J. Beutler, “On two discrete-time system stability concepts and supermartingales,” Journal of Mathematical Analysis and Applications, vol. 44, no. 2, pp. 464–471, 1973.
[5] R. Khasminskii, Stochastic Stability of Differential Equations. Springer Science & Business Media, 2011.
[6] M. Porfiri and D. J. Stilwell, “Consensus seeking over random weighted directed graphs,” IEEE Trans. Autom. Control, vol. 52, no. 9, pp. 1767–1773, 2007.
[7] A. Tahbaz-Salehi and A. Jadbabaie, “Consensus over ergodic stationary graph processes,” IEEE Trans. Autom. Control, vol. 55, no. 1, pp. 225–230, 2010.
[8] S. Lee, A. Nedić, and M. Raginsky, “Stochastic dual averaging for decentralized online optimization on time-varying communication graphs,” IEEE Trans. Autom. Control, vol. 62, no. 12, pp. 6407–6414, 2017.
