Invariance Properties of Controlled Stochastic Nonlinear Systems under Information Constraints

Christoph Kawan; Serdar Yüksel

arXiv:1901.02825·math.OC·May 5, 2020


TL;DR

This paper investigates the limits of stabilizing stochastic nonlinear systems over communication channels by developing new entropy-based bounds using ergodic theory, enhancing understanding of information requirements for stability.

Contribution

It introduces a novel ergodic-theoretic approach and a new entropy concept to derive refined bounds on information transmission needed for system stabilization.

Findings

01. Derived fundamental bounds on communication requirements for stability.

02. Developed a new entropy measure tailored for AMS analysis.

03. Provided more versatile and refined bounds compared to previous methods.


Equations

$$h_{\mathrm{inv}}(Q) := \lim_{\tau\to\infty}\frac{1}{\tau}\log r_{\mathrm{inv}}(\tau,Q),$$

$$(\theta\bar{x})_{t} = x_{t+1} \quad\text{for all } t\in\mathbb{Z}_{+},\ \bar{x}\in X^{\mathbb{Z}_{+}}.$$

$$x_{t+1} = f(x_{t},u_{t},w_{t}), \quad t=0,1,2,\ldots$$

$$C = \log\#\mathcal{M}.$$

$$\mathbf{P}(F) = P(\{\omega\in\Omega : (x_{t}(\omega))_{t\in\mathbb{Z}_{+}}\in F\}),$$

$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbf{P}(\theta^{-t}F) = \bar{P}(F) \quad\text{for all } F\in\mathcal{B}((\mathbb{R}^{N})^{\mathbb{Z}_{+}}).$$

$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}P(x_{t}\in A) = Q(A)$$

$$\frac{1}{T}\#\{t\in[0;T-1] : \varphi(t,x_{0}(\omega),\bar{u},\bar{w}(\omega))\in B\} \geq 1-r.$$

$$h_{B}(\rho,r) := \limsup_{T\to\infty}\frac{1}{T}\log s_{B}(T,\rho,r).$$

$$C \geq h_{B}\Bigl(\frac{1+\frac{\varepsilon}{2}}{1+\varepsilon},(1+\varepsilon)Q(B^{c})\Bigr).$$

$$C \geq h_{B}\Bigl(\frac{1+\frac{\varepsilon}{2}}{1+\varepsilon},(1+\varepsilon)r\Bigr).$$

$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}P(x_{t}\in B^{c}) = 1-Q(B) =: r.$$

$$\lim_{T\to\infty}E\Bigl[\frac{1}{T}\sum_{t=0}^{T-1}\mathds{1}_{B^{c}}(x_{t})\Bigr] = r.$$

$$E\Bigl[\frac{1}{T}\sum_{t=0}^{T-1}\mathds{1}_{B^{c}}(x_{t})\Bigr] \leq \Bigl(1+\frac{\varepsilon}{2}\Bigr)r, \quad\forall T\geq T_{0}.$$

$$\tilde{\Omega}_{T} := \Bigl\{\omega\in\Omega : \frac{1}{T}\sum_{t=0}^{T-1}\mathds{1}_{B^{c}}(x_{t}(\omega)) \leq (1+\varepsilon)r\Bigr\}$$

$$P\Bigl(\frac{1}{T}\sum_{t=0}^{T-1}\mathds{1}_{B^{c}}(x_{t}) > (1+\varepsilon)r\Bigr) \leq \frac{1+\frac{\varepsilon}{2}}{1+\varepsilon} = 1-\frac{\varepsilon}{2(1+\varepsilon)}.$$

$$S_{T} := \bigl\{\bar{u}_{[0,T-1]}(\omega)\in U^{T} : \omega\in\tilde{\Omega}_{T}\bigr\}$$

$$\#S_{T} \leq (\#\mathcal{M})^{T}.$$

$$P(\tilde{\Omega}_{T}) \geq \frac{\varepsilon}{2(1+\varepsilon)} = 1-\frac{1+\frac{\varepsilon}{2}}{1+\varepsilon},$$

$$s_{B}\Bigl(T,\frac{1+\frac{\varepsilon}{2}}{1+\varepsilon},(1+\varepsilon)r\Bigr) \leq \#S_{T} \leq (\#\mathcal{M})^{T} \quad\text{for all } T\geq T_{0}.$$

$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}P(x_{t}\in B^{c}) = Q(B^{c}) = 0 \leq r$$

$$x_{t+1} = f(x_{t}) + u_{t} + w_{t}$$

$$|\det\mathrm{D}f(x)| \geq 1 \quad\text{for all } x\in\mathbb{R}^{N}.$$

$$C \geq Q(B)\log\inf_{x\in B}|\det\mathrm{D}f(x)|.$$

$$A$$

$$A(\bar{u})$$

$$\frac{1}{T}\#\{t\in[0;T-1] : \varphi(t,x,\bar{u},\bar{w})\in B\} \geq 1-r$$

$$A(\bar{u},\bar{w})$$

$$A \subset \bigcup_{\bar{u}\in S}A(\bar{u})$$

$$(\nu^{\mathbb{Z}_{+}}\times m)(A(\bar{u})) = \int\nu^{\mathbb{Z}_{+}}(\mathrm{d}\bar{w})\,m(A(\bar{u},\bar{w})).$$

$$A(\bar{u},\bar{w},\Lambda)$$


Full text

Invariance Properties of Controlled Stochastic Nonlinear Systems under Information Constraints

Christoph Kawan and Serdar Yüksel

C. Kawan is with the Institute for Informatics at the Ludwig-Maximilians-Universität Munich, 80538 Munich, Germany (email: [email protected]). S. Yüksel is with the Department of Mathematics and Statistics, Queen’s University, Kingston, Ontario, Canada, K7L 3N6 (e-mail: [email protected]).

This research was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada. Some results of this paper were presented without proofs at the 2019 IEEE Information Theory Workshop, Visby, Sweden.

Abstract

Given a stochastic nonlinear system controlled over a possibly noisy communication channel, the paper studies the largest class of channels for which there exist coding and control policies so that the closed-loop system is stochastically stable. The stability criterion considered is asymptotic mean stationarity (AMS). We develop a general method based on ergodic theory and probability to derive fundamental bounds on information transmission requirements leading to stabilization. Through this method we develop a new notion of entropy which is tailored to derive lower bounds for asymptotic mean stationarity for both noise-free and noisy channels. The bounds obtained through probabilistic and ergodic-theoretic analysis are more refined in comparison with the bounds obtained earlier via information-theoretic methods. Moreover, our approach is more versatile in view of the models considered and allows for finer lower bounds when the AMS measure is known to admit further properties such as moment bounds.

Index Terms:

Stochastic stabilization; asymptotic mean stationarity; measure-theoretic entropy; information theory

I Introduction

Consider the following problem: Given a stochastic nonlinear system controlled over a communication channel, what is the largest class of such channels so that there exist coding and control policies leading to (some form of) stochastic stability? Various versions of this problem have been studied extensively for (possibly stochastic) linear systems and deterministic nonlinear systems.

For deterministic nonlinear systems, invariance entropy [9] measures the smallest average data rate of a noiseless channel above which a compact subset $Q$ of the state space can be made invariant by a controller receiving its state information through this channel. The essence of the idea behind this concept is as follows: If the controller has $n$ bits of information available, it can distinguish at most $2^{n}$ different states, hence generate at most $2^{n}$ different control inputs. Consequently, the number of control inputs needed to achieve the control objective (on a finite time interval) is a measure for the necessary information. The definition of invariance entropy thus reads

$$h_{\mathrm{inv}}(Q) := \lim_{\tau\to\infty}\frac{1}{\tau}\log r_{\mathrm{inv}}(\tau,Q),$$

where $r_{\mathrm{inv}}(\tau,Q)$ is the minimal number of control inputs needed to achieve invariance of $Q$ on the time interval $[0,\tau]$ for arbitrary initial states in $Q$. It is easy to see that the growth rate of $r_{\mathrm{inv}}(\tau,Q)$ is directly related to the rate of volume expansion for subsets of $Q$ under the evolution of the system. Indeed, the faster volume is expanded, the more coding regions, and hence different control inputs, are necessary to keep the whole volume inside $Q$. Since every reasonable stabilization objective requires keeping certain volumes bounded (or even shrinking them to zero), the same ideas as used in the definition of invariance entropy should work universally for stabilization over discrete channels. This intuition was rigorously verified in a number of publications, including [7, 9, 11, 14, 24, 25].
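The volume-expansion intuition can be illustrated on a toy case that is not taken from the paper: the scalar system $x_{t+1}=ax_{t}+u_{t}$ with $a>1$ and $Q=[-1,1]$. The sketch below is a hedged illustration under these assumptions; the function name `invariance_rate_bounds` is hypothetical. A fixed control sequence of length $\tau$ keeps a set of initial states of length at most $2/a^{\tau}$ inside $Q$, so at least $a^{\tau}$ sequences are needed, while splitting $Q$ into $\lceil a\rceil$ cells per step and recentring each image shows that $\lceil a\rceil^{\tau}$ sequences suffice.

```python
import math


def invariance_rate_bounds(a: float, tau: int) -> tuple[float, float]:
    """Bounds on (1/tau) * log2 r_inv(tau, Q) for the toy system
    x_{t+1} = a * x_t + u_t with Q = [-1, 1] (an illustrative assumption)."""
    if a <= 1 or tau < 1:
        raise ValueError("expects a > 1 and tau >= 1")
    # Lower bound: one control sequence covers initial states of length
    # at most 2 / a**tau, so at least ceil(a**tau) sequences are needed.
    lower_count = math.ceil(a ** tau)
    # Upper bound: ceil(a) quantization cells per step are enough, because
    # each cell image has length 2 * a / ceil(a) <= 2 and can be recentred.
    upper_count = math.ceil(a) ** tau
    return math.log2(lower_count) / tau, math.log2(upper_count) / tau


lower, upper = invariance_rate_bounds(2.5, 4)
print(f"{lower:.3f} <= rate <= {upper:.3f}")
```

For integer $a$ the two bounds coincide and give $\log a$ bits per step, which matches the growth-rate reading of $h_{\mathrm{inv}}(Q)$ in this toy case.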

In this paper, we demonstrate that such an approach is also applicable, by means of the machinery we develop, to stochastic systems, stochastic channels, and to stochastic stability. Our criterion for stochastic stability is asymptotic mean stationarity (AMS), introduced by Gray & Kieffer [19] and used in networked control in a number of publications [48, 50, 51]. This concept considerably weakens the notion of stationarity and is closely related to other criteria used in the literature, such as stability in probability [33], (unique) ergodicity [48], as well as another commonly used stability criterion: finite $m$-th moment stability for various $m\in\mathbb{N}$ [36, 38, 44]. The AMS property is weaker than unique ergodicity, and finite-moment stability typically implies the AMS property provided additional regularity properties are imposed. Nonetheless, the AMS property is a very versatile notion: if one assumes that the support of the asymptotic mean measure is compact, the AMS property can be related to set stability; if one assumes that this measure has a finite $m$-th moment for its coordinate state process, the AMS property leads to the finite-moment stability property; and finally, ergodicity can also be imposed for certain applications through mixing properties, e.g., through the construction of a positive Harris recurrent Markov chain [49]. Barron [5] and Gray & Kieffer [19] note various other operational utilities of the AMS property.

As an auxiliary quantity to derive lower bounds on the necessary channel capacity for generating an AMS state process, we introduce a new concept of stabilization entropy inspired by both invariance entropy and measure-theoretic entropy of dynamical systems, in particular by a characterization of the latter due to Katok [23] and a generalization thereof developed in Ren et al. [43]. Roughly speaking, stabilization entropy is the exponential growth rate of the number of length-$n$ control sequences necessary to keep the state inside some set for a certain fraction of the $n$ time steps with a certain positive probability. The corresponding set, the frequency of times and the probability are parameters that can be adjusted, and the relation to channel capacity can only be established for certain choices of these parameters.

Stochastic stabilization of nonlinear systems driven by noise (especially unbounded noise) over communication channels has been studied in only a few publications, notably in [51]. With our method we are able to refine the bounds presented in [51]. The approach developed in our paper, unlike the differential-entropic methods in [51] and other publications, allows for

(i) refined stochastic stability results applicable to a more general class of system models (Theorems V.1 and VI.2), and more refined stability criteria such as the AMS property in combination with moment conditions (see Corollary V.4),

(ii) a more concise and direct derivation, building on volume growth arguments, applicable to a plethora of criteria,

(iii) more refined bounds for a large class of systems through trading off growth rates with the measures of sets under the coordinate projection of a stationary measure (see Theorem V.1),

(iv) the unification of the theory developed for deterministic systems controlled over noise-free communication channels with their stochastic counterparts, involving both stochastic nonlinear dynamical systems and noisy communication channels (see Theorem VII.1).

In the paper at hand, explicit lower bounds on the capacity in terms of characteristics of the system are derived for nonlinear volume-expanding systems with additive control and noise, and for a class of inhomogeneous semi-linear systems with nonlinear dependence on the control variable. For the first class of systems, we obtain a particularly interesting result which displays a trade-off between the volume-expansion rate of the system and the mass distribution of the probability measure coming from the AMS property. This trade-off is a specific feature of nonlinear systems, since in the linear case the influence of the measure is canceled out due to the fact that the Jacobian matrix with respect to the state is a constant in this case. From our results we can easily recover the well-known capacity bound for linear systems, $\sum_{\lambda}\max\{0,\log|\lambda|\}$ (summing over all eigenvalues of the dynamical matrix), and also previous bounds for nonlinear systems proved via information-theoretic methods.
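The linear capacity bound $\sum_{\lambda}\max\{0,\log|\lambda|\}$ mentioned above is easy to evaluate once the eigenvalues are known. The following minimal sketch assumes the eigenvalues are supplied as a list; the function name `linear_capacity_bound` is hypothetical and the logarithm is taken to base $2$, so the result is in bits per time step.

```python
import math


def linear_capacity_bound(eigenvalues):
    """Sum of max{0, log2 |lambda|} over the given eigenvalues.

    Eigenvalues may be real or complex; stable eigenvalues
    (|lambda| <= 1, including 0) contribute nothing.
    """
    total = 0.0
    for lam in eigenvalues:
        modulus = abs(lam)
        if modulus > 1:
            total += math.log2(modulus)
    return total


# One stable and two unstable eigenvalues: log2(2) + log2(4) = 3 bits.
print(linear_capacity_bound([2, 0.5, 4]))
```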

We emphasize that for the case of noisy channels, at least for a simple class of scalar systems, we are able to derive similar lower bounds as for noiseless channels via relating the number of control sequences needed for stabilization to a state estimation problem, and then by a generalization of the strong converse to the channel coding problem in information theory together with optimal transport theory, relating the channel capacity to a state estimation problem. This approach, in particular, allows for replacing arguments which depend on the maximum number of possible distinct message sequences for noiseless channels with an entropy-theoretic argument. It is our hope that this novel method will also be accessible to a general readership and find further applications.

The paper is organized as follows. In Section II we provide a short literature review. The technical details of the stabilization problem are outlined in Section III. The subsequent Section IV introduces the notion of stabilization entropy. Applications to specific system models are given in Sections V and VI, and Section VII contains our result for noisy channels. Finally, the proofs of two technical lemmas are given in the Appendix.

II A brief literature review

This paper continues along the research programs developed in [24], which considers deterministic systems, and [51], which considers stochastic systems. For comprehensive literature reviews on the subject, we refer to [24, 33, 48]. Here we only provide a short review of the most relevant contributions.

For noise-free linear systems controlled over discrete noiseless channels, various authors have obtained a formula for the smallest channel capacity above which stabilization is possible, under various assumptions on the system and the admissible coders and controllers. This result is usually referred to as a data-rate theorem and asserts that the smallest capacity is given by the logarithm of the unstable determinant of the open-loop system, i.e., the log-sum of the unstable eigenvalues. The earliest works in this context are Wong & Brockett [6] and Baillieul [3]. More general versions of the data-rate theorem have been proven in Tatikonda & Mitter [46] and Hespanha et al. [21]. For noisy systems and mean-square stabilization, or more generally, moment-stabilization, analogous data-rate theorems have been proven in Nair & Evans [38] and Sahai & Mitter [44], see also [32, 34]. For extensive reviews, see [2, 17, 33, 40, 48]. A data-rate theorem for AMS stability of linear systems was established in [48, Thm. 8.5.3] (see also [51, Thm. 3.1]) and [50, Thm. 4.1 and 4.2], [22, Thm. 2.2, 3.2 and 3.5] under various variations. A recent study along a similar construction to the one introduced in [52] and [49] under fixed-rate quantization is [29].

Studies of nonlinear systems have typically considered deterministic, noise-free systems controlled over discrete noiseless channels. In this context, Nair et al. [39] introduced the notion of topological feedback entropy (in analogy to topological entropy for dynamical systems [1]) for discrete-time systems to characterize the smallest average rate of information above which the state can be kept inside a compact controlled invariant set. They also characterized the smallest data rate for stabilization to an equilibrium point as the log-sum of the unstable eigenvalues of the linearization. Colonius & Kawan in [9] introduced the notion of invariance entropy for continuous-time systems for the same stabilization objective. When adapted to the same (discrete-time) setting, the two notions are equivalent, see [11]. A comprehensive review of these concepts is provided in [24]. We also note that recently a concept of metric invariance entropy based on conditionally invariant measures was established in [8]. Further studies on control of nonlinear systems over communication channels have focused on constructive schemes (and not on converse theorems), primarily for noise-free systems and channels, see, e.g., [4, 16, 30].

We also emphasize that for nonlinear systems the problems of local stabilization (stabilization to a point), semi-global stabilization (set invariance) and global stabilization (as in the stochastic stabilization criterion considered here) are fundamentally different from each other, while for linear systems they can all be handled with similar methods, leading to the above-mentioned data-rate theorem in each case. This is related to the fact that for linear systems any local (dynamical or control-theoretic) property is a global property as well. For nonlinear systems, linearization techniques work well for local problems, for semi-global problems only under specific assumptions and for global problems almost not at all. In addition, the presence of (possibly unbounded and additive) noise requires an approach fundamentally different from the machinery utilized for local stabilization problems.

III Preliminaries and problem description

Notation

If $A$ is a finite set, we write $\#A$ for its cardinality. The complement of a set $A\subset X$ is denoted by $A^{c}=X\backslash A$. We write $\mathds{1}_{A}$ for the indicator function of a set $A$. By $\log$ we always denote the base-$2$ logarithm. We write $\mathbb{Z}_{+}$ for the set of nonnegative integers and put $\mathbb{Z}_{>0}:=\mathbb{Z}_{+}\backslash\{0\}$. Moreover, we use the notation $[a;b]$ for a discrete interval, i.e., $[a;b]=\{a,a+1,\ldots,b\}$ for any $a,b\in\mathbb{Z}$ with $a\leq b$. By $|\cdot|$ we denote the standard Euclidean norm on $\mathbb{R}^{N}$ and by $\|\cdot\|$ any associated operator norm. We write $B_{r}(x)=\{y\in\mathbb{R}^{N}:|x-y|<r\}$ for $x\in\mathbb{R}^{N}$, $r>0$, and denote by $\overline{A}$ the closure of a set $A\subset\mathbb{R}^{N}$. The Lebesgue measure on $\mathbb{R}^{N}$ is denoted by $m$. We write $I$ for the $N\times N$ identity matrix and $\mathrm{Gl}(N,\mathbb{R})$ for the general linear group of $\mathbb{R}^{N}$. By $\mathcal{L}(V,W)$ we denote the space of all linear maps between vector spaces $V,W$. We use the notation $\mathrm{supp}(\mu)$ for the support of a Borel probability measure $\mu$. The expectation of a random variable $X$ is denoted by $E[X]$. The entropy of a $\{0,1\}$-valued Bernoulli random variable $X$ with $P(X=0)=r$ is denoted by $H(r)$, i.e., $H(r)=-r\log r-(1-r)\log(1-r)$. The relative entropy of two probability mass functions $p(x)$ and $q(x)$ on a discrete space $\mathbb{X}$ is defined by $D(p\|q):=\sum_{x\in\mathbb{X}}p(x)\log\frac{p(x)}{q(x)}$. We refer the reader to [12] for further information-theoretic concepts such as mutual information and channel capacity.

If $\mu,\nu$ are two measures on the same measurable space, we write $\mu\ll_{b}\nu$ to denote that $\mu$ is absolutely continuous with respect to $\nu$ and its density is essentially bounded.

If $X^{\mathbb{Z}_{+}}$ is the set of all sequences in some set $X$, we write $\bar{x}=(x_{t})_{t\in\mathbb{Z}_{+}}$ for elements of $X^{\mathbb{Z}_{+}}$ and $\theta:X^{\mathbb{Z}_{+}}\rightarrow X^{\mathbb{Z}_{+}}$ for the left shift operator, i.e.,

$$(\theta\bar{x})_{t} = x_{t+1} \quad\text{for all } t\in\mathbb{Z}_{+},\ \bar{x}\in X^{\mathbb{Z}_{+}}.$$

Moreover, we write $\bar{x}_{[0,t]}=(x_{0},x_{1},\ldots,x_{t})$ for $t\in\mathbb{Z}_{+}$ and $\mathcal{B}(X)$ for the Borel $\sigma$-field of a topological space $X$.

To avoid technical problems concerning the measurability of certain sets, we make the following general assumption.

Assumption III.1

We assume that all measurable spaces in this paper are standard Borel and all random variables associated with a given control system are modeled on a common (standard Borel) probability space $(\Omega,\mathcal{F},P)$.

The standard Borel space assumption leads to useful universal measurability properties which are utilized in the paper. A measurable image of a Borel set is called an analytic set [15, App. 2]. We note that this is evidently equivalent to the seemingly more restrictive condition of being a continuous image of a Borel set. The following property will be utilized in our analysis: The image of a Borel set under a measurable map, and hence an analytic set, is universally measurable [15].

Throughout the paper, we consider a stochastic control system

$$x_{t+1} = f(x_{t},u_{t},w_{t}), \quad t=0,1,2,\ldots \tag{1}$$

This defines a measurable map $f:\mathbb{R}^{N}\times U\times W\rightarrow\mathbb{R}^{N}$, where $\mathbb{R}^{N}$ is endowed with the Borel $\sigma$-field $\mathcal{B}(\mathbb{R}^{N})$, $(U,\mathcal{F}_{U})$ is a measurable space and $(W,\mathcal{F}_{W},\nu)$ a probability space. The noise is modeled by an i.i.d. sequence $(w_{t})_{t\in\mathbb{Z}_{+}}$ of random variables on $(W,\mathcal{F}_{W})$ with associated probability measure $\nu$. The initial state $x_{0}$ is modeled by another random variable with probability measure $\pi_{0}$ on $(\mathbb{R}^{N},\mathcal{B}(\mathbb{R}^{N}))$ and is assumed to be independent of $(w_{t})_{t\in\mathbb{Z}_{+}}$.

We write $\varphi(t,x_{0},\bar{u},\bar{w})$, $t\in\mathbb{Z}_{+}$, for the unique trajectory with initial value $x_{0}\in\mathbb{R}^{N}$ associated with the noise realization $\bar{w}\in W^{\mathbb{Z}_{+}}$ and the control sequence $\bar{u}\in U^{\mathbb{Z}_{+}}$.

We assume that an encoder, knowing the states $x_{0},x_{1},\ldots,x_{t}$ at time $t\in\mathbb{Z}_{+}$, transmits at time $t$ a symbol $q_{t}$ through a noiseless discrete channel to a decoder/controller. We assume that the decoder receives the signals without delay. The finite coding alphabet is denoted by $\mathcal{M}$ and the capacity of the channel is

$$C = \log\#\mathcal{M}.$$

Thus, at time $t$, the controller has the symbol string $q_{[0,t]}=(q_{0},q_{1},\ldots,q_{t})\in\mathcal{M}^{t+1}$ available to generate the control input $u_{t}$. Any coding and control policy of this form is called a causal coding and control policy. A more general setup including a noisy channel will be introduced and studied in Section VII.
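To make the encoder/controller loop concrete, here is a minimal sketch of one very simple causal policy, written under assumptions that are not part of the paper: a scalar system $x_{t+1}=ax_{t}+u_{t}+w_{t}$, noise bounded by $d$ (the paper allows unbounded noise), a memoryless uniform quantizer with $M$ cells on $[-L,L]$, and a controller that cancels the drift of the cell midpoint. All function names and parameter values are hypothetical. With $a=2$, $L=1$, $M=4$, $d=0.4$ the state stays in $[-L,L]$ because $a\cdot L/M+d=0.9\leq L$, and the channel carries $\log 4=2$ bits per step.

```python
import math
import random


def encode(x, L, M):
    """Encoder: index q in {0, ..., M-1} of the uniform cell of [-L, L] containing x."""
    width = 2 * L / M
    q = int((x + L) // width)
    return min(max(q, 0), M - 1)  # clip the right endpoint x = L into the last cell


def control(q, a, L, M):
    """Controller: sees only the symbol q and cancels a * (cell midpoint)."""
    width = 2 * L / M
    midpoint = -L + (q + 0.5) * width
    return -a * midpoint


def simulate(a=2.0, L=1.0, M=4, d=0.4, T=1000, seed=0):
    """Closed loop x_{t+1} = a*x_t + u_t + w_t with noise uniform on [-d, d]."""
    rng = random.Random(seed)
    x = rng.uniform(-L, L)
    trajectory = [x]
    for _ in range(T):
        q = encode(x, L, M)       # symbol sent over the noiseless channel
        u = control(q, a, L, M)   # control computed from the received symbol
        w = rng.uniform(-d, d)
        x = a * x + u + w
        trajectory.append(x)
    return trajectory


trajectory = simulate()
print(max(abs(x) for x in trajectory), math.log2(4))
```

This policy uses only the latest symbol $q_{t}$, which is a special case of the causal policies described above; a general policy may use the whole string $q_{[0,t]}$.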

The considered control objective is to make the state process $(x_{t})_{t\in\mathbb{Z}_{+}}$ asymptotically mean stationary (AMS). Writing ${\mathbf{P}}$ for the process measure on $(\mathbb{R}^{N})^{\mathbb{Z}_{+}}$, i.e.,

$$\mathbf{P}(F) = P(\{\omega\in\Omega : (x_{t}(\omega))_{t\in\mathbb{Z}_{+}}\in F\}),$$

the process $\{x_{t}\}_{t\in\mathbb{Z}_{+}}$ is AMS if there is a probability measure $\bar{P}$ on $\mathcal{B}((\mathbb{R}^{N})^{\mathbb{Z}_{+}})$ with

$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}\mathbf{P}(\theta^{-t}F) = \bar{P}(F) \quad\text{for all } F\in\mathcal{B}((\mathbb{R}^{N})^{\mathbb{Z}_{+}}).$$

This implies that $\bar{P}$ is a stationary measure for $(x_{t})$, i.e., $\bar{P}(\theta^{-t}F)=\bar{P}(F)$ for all times $t$ and Borel sets $F$.

The AMS property implies the existence of a probability measure $Q$ on $(\mathbb{R}^{N},\mathcal{B}(\mathbb{R}^{N}))$ so that

$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}P(x_{t}\in A) = Q(A) \tag{2}$$

for every $A\in\mathcal{B}(\mathbb{R}^{N})$. This can be seen by considering sets of the form $F=A\times\mathbb{R}^{N}\times\mathbb{R}^{N}\times\cdots$. Then ${\mathbf{P}}(\theta^{-t}F)$ reduces to $P(x_{t}\in A)$ and the measure $Q$ is given by $Q(A)=\bar{P}(F)$.
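The Cesàro average $\frac{1}{T}\sum_{t=0}^{T-1}P(x_{t}\in A)$ that defines $Q(A)$ can be estimated from sample paths. The sketch below is only an illustration: the helper names are hypothetical, and the probabilities $P(x_{t}\in A)$ are replaced by empirical frequencies over the supplied paths. The toy process alternates deterministically between $0$ and $1$, so $P(x_{t}=1)$ oscillates and the process is not stationary, yet the time average settles at $1/2$.

```python
def marginal_probability(paths, t, in_A):
    """Empirical estimate of P(x_t in A) from a list of sample paths."""
    return sum(1 for path in paths if in_A(path[t])) / len(paths)


def cesaro_average(paths, in_A, T=None):
    """(1/T) * sum_{t=0}^{T-1} P(x_t in A), with P estimated empirically."""
    if T is None:
        T = min(len(path) for path in paths)
    return sum(marginal_probability(paths, t, in_A) for t in range(T)) / T


# Toy process (an assumption for illustration): x_t = t mod 2.
paths = [[t % 2 for t in range(100)]]
print(marginal_probability(paths, 0, lambda x: x == 1))
print(marginal_probability(paths, 1, lambda x: x == 1))
print(cesaro_average(paths, lambda x: x == 1))
```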

We note that it was shown in [51, Thm. 5.1] that an additive noise system can be made AMS over a finite-capacity channel under mild assumptions. Thus, searching for lower bounds on the necessary channel capacity is a meaningful problem.

IV Stabilization entropy

Definition IV.1

For any Borel set $B\subset\mathbb{R}^{N}$, $T\in\mathbb{Z}_{>0}$ and $\rho,r\in(0,1)$, a set $S\subset U^{T}$ is called $(T,B,\rho,r)$-spanning if there exists a set $\tilde{\Omega}\in\mathcal{F}$ with $P(\tilde{\Omega})\geq 1-\rho$ so that for every $\omega\in\tilde{\Omega}$ there is $\bar{u}\in S$ with

$$\frac{1}{T}\#\{t\in[0;T-1] : \varphi(t,x_{0}(\omega),\bar{u},\bar{w}(\omega))\in B\} \geq 1-r.$$

We write $s_{B}(T,\rho,r)$ for the smallest cardinality of a $(T,B,\rho,r)$-spanning set (where $s_{B}(T,\rho,r)=\infty$ if no finite $(T,B,\rho,r)$-spanning set exists) and define the $(B,\rho,r)$-stabilization entropy of system (1) by

$$h_{B}(\rho,r) := \limsup_{T\to\infty}\frac{1}{T}\log s_{B}(T,\rho,r).$$
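The spanning condition says that, for most realizations of the initial state and noise, some control sequence in $S$ keeps the trajectory inside $B$ for at least a fraction $1-r$ of the times in $[0;T-1]$. A hedged sketch of how one might check this on finitely many samples follows; the function names are hypothetical, and replacing the probability $P(\tilde{\Omega})$ by a sample fraction is an assumption made only for illustration.

```python
def fraction_in_B(f, x0, controls, noise, in_B):
    """(1/T) * #{t in [0; T-1] : phi(t, x0, u, w) in B} for x_{t+1} = f(x_t, u_t, w_t)."""
    T = len(controls)
    x, hits = x0, 0
    for t in range(T):
        if in_B(x):          # x equals phi(t, x0, u, w) at this point
            hits += 1
        x = f(x, controls[t], noise[t])
    return hits / T


def is_spanning_on_samples(f, S, samples, in_B, rho, r):
    """Empirical check: at least a (1 - rho) fraction of the sampled
    (x0, noise) pairs is handled by some control sequence in S."""
    good = sum(
        1
        for x0, noise in samples
        if any(fraction_in_B(f, x0, u, noise, in_B) >= 1 - r for u in S)
    )
    return good / len(samples) >= 1 - rho


# Toy system and set (assumptions): x_{t+1} = 2x + u + w, B = [-1, 1].
f = lambda x, u, w: 2 * x + u + w
in_B = lambda x: abs(x) <= 1
samples = [(0.5, [0, 0, 0]), (-0.5, [0, 0, 0])]
print(is_spanning_on_samples(f, [[-1, 0, 0]], samples, in_B, rho=0.25, r=0.1))
print(is_spanning_on_samples(f, [[-1, 0, 0], [1, 0, 0]], samples, in_B, rho=0.25, r=0.1))
```

In the toy run, a single control sequence handles only one of the two sampled initial states, while two sequences handle both; the count of sequences needed is what $s_{B}(T,\rho,r)$ measures.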

Some remarks about this definition are in order:

(i) The control sequences $\bar{u}$ in the above definition are not generated by a coding and control policy. Indeed, $h_{B}(\rho,r)$ is an intrinsic quantity of the open-loop system.

(ii) The existence and finiteness of $(T,B,\rho,r)$-spanning sets is not immediately clear from the definition. However, as we will see below, in relevant cases this is guaranteed. In general, we always have $0\leq h_{B}(\rho,r)\leq\infty$.

(iii) There are some obvious monotonicity properties of the function $h_{B}(\cdot,\cdot)$. Namely, if $r$ or $\rho$ becomes smaller, $h_{B}(\rho,r)$ can only increase. This in particular implies the existence of corresponding limits as $r\rightarrow 0$ and $\rho\rightarrow 0$ (which may be infinite).

(iv) The notion of $(B,\rho,r)$-stabilization entropy is defined in close analogy to the notion of measure-theoretic $r$-entropy [42, 43] for dynamical systems. This quantity generalizes the classical Kolmogorov-Sinai measure-theoretic entropy, on the basis of its characterization due to Katok [23] for ergodic measures. While the original definition of measure-theoretic entropy is based on computing the Shannon entropy of “dynamical partitions”, Katok’s characterization is based on counting the minimal number of “dynamical balls” of a certain radius needed to cover a subset of the state space with measure greater than some threshold.

We now present our key lemma which relates the channel capacity necessary for stabilization to the stabilization entropy. In particular, it shows that finite $(T,B,\rho,r)$-spanning sets exist for appropriate choices of $B,\rho,r$, provided that the AMS property can be achieved.

Lemma IV.2

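Assume that the AMS property is achieved via a causal coding and control policy over a noiseless channel of capacity $C$. Then for every Borel set $B\subset\mathbb{R}^{N}$ with $0<Q(B)<1$ and all sufficiently small $\varepsilon>0$ we have

$$C \geq h_{B}\Bigl(\frac{1+\frac{\varepsilon}{2}}{1+\varepsilon},(1+\varepsilon)Q(B^{c})\Bigr).$$

If $Q(B)=1$, then for all $r\in(0,1)$ and $\varepsilon>0$ sufficiently small we have

$$C \geq h_{B}\Bigl(\frac{1+\frac{\varepsilon}{2}}{1+\varepsilon},(1+\varepsilon)r\Bigr).$$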

Proof:

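We fix a causal coding and control policy which achieves the AMS property over a noiseless channel of capacity $C$. For a given set $B\in\mathcal{B}(\mathbb{R}^{N})$ with $0<Q(B)<1$, (2) implies

$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}P(x_{t}\in B^{c}) = 1-Q(B) =: r. \tag{4}$$

Since $P(x_{t}\in B^{c})=E[\mathds{1}_{B^{c}}(x_{t})]$, this can also be written as

$$\lim_{T\to\infty}E\Bigl[\frac{1}{T}\sum_{t=0}^{T-1}\mathds{1}_{B^{c}}(x_{t})\Bigr] = r.$$

We pick $\varepsilon\in(0,(1-r)/r)$ and choose $T_{0}>0$ so that

$$E\Bigl[\frac{1}{T}\sum_{t=0}^{T-1}\mathds{1}_{B^{c}}(x_{t})\Bigr] \leq \Bigl(1+\frac{\varepsilon}{2}\Bigr)r, \quad\forall T\geq T_{0}.$$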

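By Markov’s inequality, this implies that for every $T\geq T_{0}$ the event

$$\tilde{\Omega}_{T} := \Bigl\{\omega\in\Omega : \frac{1}{T}\sum_{t=0}^{T-1}\mathds{1}_{B^{c}}(x_{t}(\omega)) \leq (1+\varepsilon)r\Bigr\}$$

occurs with probability $P(\tilde{\Omega}_{T})\geq\varepsilon/(2(1+\varepsilon))$, since

$$P\Bigl(\frac{1}{T}\sum_{t=0}^{T-1}\mathds{1}_{B^{c}}(x_{t}) > (1+\varepsilon)r\Bigr) \leq \frac{1+\frac{\varepsilon}{2}}{1+\varepsilon} = 1-\frac{\varepsilon}{2(1+\varepsilon)}.$$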
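For readers who want the Markov step written out, the following worked derivation may help. The shorthand $Y_{T}$ is introduced here only for readability and is not notation from the paper; the step uses $r>0$, which holds because $Q(B)<1$.

```latex
\begin{aligned}
Y_{T} &:= \frac{1}{T}\sum_{t=0}^{T-1}\mathds{1}_{B^{c}}(x_{t}) \geq 0,
\qquad E[Y_{T}] \leq \Bigl(1+\frac{\varepsilon}{2}\Bigr) r \quad (T \geq T_{0}),\\
P\bigl(Y_{T} > (1+\varepsilon) r\bigr)
&\leq \frac{E[Y_{T}]}{(1+\varepsilon) r}
\leq \frac{\bigl(1+\frac{\varepsilon}{2}\bigr) r}{(1+\varepsilon) r}
= \frac{1+\frac{\varepsilon}{2}}{1+\varepsilon},\\
P(\tilde{\Omega}_{T})
&= 1 - P\bigl(Y_{T} > (1+\varepsilon) r\bigr)
\geq 1 - \frac{1+\frac{\varepsilon}{2}}{1+\varepsilon}
= \frac{\varepsilon}{2(1+\varepsilon)}.
\end{aligned}
```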

Observe that for every $\omega\in\tilde{\Omega}_{T}$ the number of $t$’s in $[0;T-1]$ satisfying $x_{t}(\omega)\in B^{c}$ is at most $(1+\varepsilon)rT$. Now for every $T\geq T_{0}$ consider the set

$$S_{T} := \bigl\{\bar{u}_{[0,T-1]}(\omega)\in U^{T} : \omega\in\tilde{\Omega}_{T}\bigr\}$$

of control sequences generated in the time interval $[0;T-1]$ provided that $\omega\in\tilde{\Omega}_{T}$ and $x_{0}=x_{0}(\omega)$, $\bar{w}=\bar{w}(\omega)$. Since the maximal number of different messages that can be transmitted in the time interval $[0;T-1]$ is $(\#\mathcal{M})^{T}$, we have

$$\#S_{T} \leq (\#\mathcal{M})^{T}.$$

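We claim that $S_{T}$ is $(T,B,\frac{1+\varepsilon/2}{1+\varepsilon},(1+\varepsilon)r)$-spanning. Indeed,

$$P(\tilde{\Omega}_{T}) \geq \frac{\varepsilon}{2(1+\varepsilon)} = 1-\frac{1+\frac{\varepsilon}{2}}{1+\varepsilon},$$

for every $\omega\in\tilde{\Omega}_{T}$ we have $\bar{u}_{[0,T-1]}(\omega)\in S_{T}$, and the number of $t$’s in $[0;T-1]$ with $x_{t}(\omega)=\varphi(t,x_{0}(\omega),\bar{u}(\omega),\bar{w}(\omega))\in B$ is at least $T-(1+\varepsilon)rT=(1-(1+\varepsilon)r)T$. Hence,

$$s_{B}\Bigl(T,\frac{1+\frac{\varepsilon}{2}}{1+\varepsilon},(1+\varepsilon)r\Bigr) \leq \#S_{T} \leq (\#\mathcal{M})^{T} \quad\text{for all } T\geq T_{0}.$$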

Taking logarithms, dividing by $T$ and letting $T\rightarrow\infty$ yields the assertion. The case $Q(B)=1$ is handled by replacing (4) with the inequality

$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T-1}P(x_{t}\in B^{c}) = Q(B^{c}) = 0 \leq r$$

for an arbitrarily chosen $r\in(0,1)$, and applying the same arguments.∎
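The final limiting step of the proof can be spelled out as follows. The abbreviations $\rho_{\varepsilon}$ and $r_{\varepsilon}$ are shorthand introduced here for the two parameters appearing in the lemma; they are not notation from the paper.

```latex
\begin{aligned}
\rho_{\varepsilon} &:= \frac{1+\frac{\varepsilon}{2}}{1+\varepsilon},
\qquad r_{\varepsilon} := (1+\varepsilon) r,\\
\frac{1}{T}\log s_{B}(T,\rho_{\varepsilon},r_{\varepsilon})
&\leq \frac{1}{T}\log (\#\mathcal{M})^{T}
= \log \#\mathcal{M} = C \qquad (T \geq T_{0}),\\
h_{B}(\rho_{\varepsilon},r_{\varepsilon})
&= \limsup_{T\to\infty}\frac{1}{T}\log s_{B}(T,\rho_{\varepsilon},r_{\varepsilon})
\leq C.
\end{aligned}
```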

Lemma IV.2, while technical in appearance, has significant consequences, since it allows for the application of volume-growth arguments that have been used in the literature for deterministic settings.

V Volume-expanding systems

In this section, we assume throughout that the measure $\pi_{0}$ of the random variable $x_{0}$ is absolutely continuous w.r.t. the Lebesgue measure $m$ on $\mathbb{R}^{N}$ and that the associated density is essentially bounded, i.e., $\pi_{0}\ll_{b}m$ .

Consider a system of the form

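$$x_{t+1} = f(x_{t}) + u_{t} + w_{t} \tag{5}$$

with $U=W=\mathbb{R}^{N}$ and an injective $C^{1}$-map $f:\mathbb{R}^{N}\rightarrow\mathbb{R}^{N}$ satisfying (with $\mathrm{D}f(x)$ denoting the Jacobian of $f$ at $x$)

$$|\det\mathrm{D}f(x)| \geq 1 \quad\text{for all } x\in\mathbb{R}^{N}. \tag{6}$$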

Theorem V.1

Consider system (5) satisfying (6) and $\pi_{0}\ll_{b}m$. Assume that the AMS property is achieved with an associated AMS measure $Q$ via a causal coding and control policy over a noiseless channel of capacity $C$. Then for all Borel sets $B\subset\mathbb{R}^{N}$ with $0<m(B)<\infty$ we have

$$C \geq Q(B)\log\inf_{x\in B}|\det\mathrm{D}f(x)|.$$
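To get a feel for the bound $C\geq Q(B)\log\inf_{x\in B}|\det\mathrm{D}f(x)|$, consider a scalar example that is not from the paper: $f(x)=x+x^{3}$, which is injective and $C^{1}$ with $f'(x)=1+3x^{2}\geq 1$. For an annulus-type set $B=\{x : a\leq|x|\leq b\}$ the infimum of $|f'|$ over $B$ is $1+3a^{2}$. The sketch below uses hypothetical function names, and the value of $Q(B)$ passed in is a made-up number, since the AMS measure depends on the policy.

```python
import math


def inf_abs_derivative_cubic(a):
    """inf over B = {x : a <= |x| <= b} of |f'(x)| for f(x) = x + x**3 (needs a >= 0)."""
    if a < 0:
        raise ValueError("expects a >= 0")
    return 1 + 3 * a * a


def capacity_lower_bound(Q_of_B, inf_abs_det_on_B):
    """Q(B) * log2(inf_{x in B} |det Df(x)|), in bits per time step."""
    if not 0 <= Q_of_B <= 1:
        raise ValueError("Q(B) must be a probability")
    if inf_abs_det_on_B < 1:
        raise ValueError("volume expansion |det Df| >= 1 is assumed")
    return Q_of_B * math.log2(inf_abs_det_on_B)


# Hypothetical numbers: B = {1 <= |x| <= 2} and an assumed Q(B) = 0.25.
print(capacity_lower_bound(0.25, inf_abs_derivative_cubic(1.0)))
```

Moving $B$ outward raises the expansion factor $1+3a^{2}$ but will typically lower $Q(B)$, which is the trade-off between expansion rate and mass distribution that the bound expresses; for a linear map the Jacobian determinant is constant and this trade-off disappears.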

Proof:

The proof is subdivided into four steps.

Step 1. Fix a Borel set $B$ with $0<m(B)<\infty$ and let $S\subset U^{T}$ be a finite $(T,B,\rho,r)$-spanning set (if a finite spanning set does not exist for any $T$, the estimate becomes trivial). For the associated $\tilde{\Omega}\subset\Omega$ with $P(\tilde{\Omega})\geq 1-\rho$, define

[TABLE]

for all control and noise sequences $\bar{u}$ and $\bar{w}$ , respectively. Note that the (universal) measurability of $A$ follows from Assumption III.1. From the definition of $(T,B,\rho,r)$ -spanning sets it immediately follows that

[TABLE]

and we have (by Tonelli’s theorem)

[TABLE]

We can write $A(\bar{u},\bar{w})$ as the disjoint union of the sets

[TABLE]

where $\Lambda$ ranges through all subsets of $[0;T-1]$ with cardinality $\geq(1-r)T$ . Then

[TABLE]

Now we prove that

[TABLE]

for a constant $\alpha>0$ , independent of $T$ . First, observe that by the independence of the random variables $x_{0}$ and $\bar{w}=(w_{t})_{t\in\mathbb{Z}_{+}}$ , $\nu^{\mathbb{Z}_{+}}\times\pi_{0}$ is the probability measure of the joint variable $(\bar{w},x_{0})$ . Hence,

[TABLE]

If we write $p$ for the density of $\pi_{0}$ with respect to $m$ and assume that $p(x)\leq\gamma<\infty$ , we thus find that

[TABLE]

implying that (11) holds with the constant $\alpha:=(1-\rho)/\gamma$ .

Step 2. Writing $\varphi_{t,\bar{u},\bar{w}}(\cdot)=\varphi(t,\cdot,\bar{u},\bar{w})$ , we define

[TABLE]

Then we have

[TABLE]

which immediately implies that for $c:=\inf_{x\in B}|\det\mathrm{D}f(x)|$ (using that $f$ is injective and $C^{1}$ )

[TABLE]

Let $t^{*}=t^{*}(\Lambda):=\max\Lambda$ . Then an inductive argument yields

[TABLE]

Step 3. Combining (12), (8), (9), (10) and (11), we obtain

[TABLE]

In $(\diamond)$ we use that the sets $A(\bar{u},\bar{w},\Lambda)$ , $\Lambda\subset[0;T-1]$ , are pairwise disjoint. Because of the assumption that $f$ and hence $\varphi_{t,\bar{u},\bar{w}}$ (for each $t$ ) is injective, this implies that also the sets $A_{t}(\bar{u},\bar{w},\Lambda)$ are pairwise disjoint. Hence, we can conclude that

[TABLE]

Step 4. We complete the proof by applying Lemma IV.2. Let us first assume that $0<Q(B)<1$ . Then Lemma IV.2 together with Step 3 yields

[TABLE]

As $\varepsilon\rightarrow 0$ , the desired inequality follows. The case $Q(B)=0$ is trivial and the case $Q(B)=1$ follows by continuity.∎

Remark V.2

The preceding theorem recovers, as a special case, [51, Thm. 3.2], which shows that $C\geq\inf_{x\in\mathbb{R}^{N}}\log|\det\mathrm{D}f(x)|$ . However, the result there is more general with regard to the allowed class of channels.

Remark V.3

In the inequality (7) we see a trade-off between the $Q$ -measure of the set $B$ and the infimal volume growth on $B$ . If some characteristics of the measure $Q$ are known, one can try to optimize the lower bound by a careful choice of $B$ . Also observe that

[TABLE]

holds for all Borel sets $B$ , where the left-hand side is the expected volume expansion w.r.t. the AMS measure $Q$ . Hence, it is tempting to conjecture that also the integral above is a lower bound on the capacity. Under the stronger criterion of asymptotic ergodicity, such a bound has been derived in [18].

The next corollary shows that imposing further properties on the AMS measure $Q$ can lead to more concrete bounds.

Corollary V.4

Consider system (5) satisfying (6) and $\pi_{0}\ll_{b}m$ . Assume that the AMS property is achieved via a noiseless channel of capacity $C$ and the measure $Q$ satisfies for some $M,p>0$ the moment constraint

[TABLE]

Then the channel capacity satisfies

[TABLE]

Proof:

Consider the set $B:=\overline{B_{\kappa}(0)}$ for a fixed $\kappa>0$ . By Markov’s inequality, the moment constraint implies $Q(B)\geq 1-\frac{M}{\kappa^{p}}$ . Hence, Theorem V.1 implies the assertion.∎
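The trade-off used in this proof can be explored numerically. The following is a minimal sketch, not taken from the paper: it assumes that the bound has the form $(1-M/\kappa^{p})\cdot\inf_{|x|\leq\kappa}\log|f^{\prime}(x)|$ for a scalar system, that logarithms are taken to base 2, and it uses a hypothetical derivative `fprime` chosen only for illustration. The helper names `moment_bound` and `best_kappa` are likewise hypothetical.

```python
import math

def fprime(x):
    # Hypothetical derivative: even, decreasing in |x|, and always >= 1.
    return 1.0 + math.exp(-abs(x))

def moment_bound(kappa, M=1.0, p=1.0):
    # Markov's inequality: Q(|x| <= kappa) >= 1 - M / kappa**p.
    mass = 1.0 - M / kappa**p
    # fprime is even and decreasing in |x|, so its infimum over
    # [-kappa, kappa] is attained at the endpoint kappa.
    growth = math.log2(fprime(kappa))
    return mass * growth

def best_kappa(M=1.0, p=1.0, lo=0.01, hi=50.0, steps=5000):
    # Plain grid search over kappa (an assumption; any 1-D optimizer would do).
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda k: moment_bound(k, M, p))

kappa_star = best_kappa()
print(kappa_star, moment_bound(kappa_star))
```

For small $\kappa$ the Markov factor is negative and for large $\kappa$ the volume growth on the ball decays, so in this toy case the maximizer lies strictly inside the search interval.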

Example V.5

For a linear system with $f(x)=Ax$ , $A\in\mathbb{R}^{N\times N}$ satisfying $|\det A|\geq 1$ , our result implies the well-known relation (cf. [50, 48])

[TABLE]

with summation over all eigenvalues $\lambda$ of $A$ with associated multiplicities $n_{\lambda}$ . By a simple decoupling argument this can be refined to show that $C\geq\sum_{\lambda}\max\{0,n_{\lambda}\log|\lambda|\}$ . $\diamond$
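As a small illustration of the two bounds in this example, the sketch below evaluates $\sum_{\lambda}n_{\lambda}\log|\lambda|$ and the refined sum $\sum_{\lambda}\max\{0,n_{\lambda}\log|\lambda|\}$ for a hypothetical spectrum. The eigenvalue list, the function names and the choice of base-2 logarithms are assumptions made for illustration only.

```python
import math

def determinant_bound(eigs):
    # Sum of n_lambda * log2|lambda|, i.e. log2|det A|.
    return sum(n * math.log2(abs(lam)) for lam, n in eigs)

def unstable_bound(eigs):
    # Refined bound: only eigenvalues with |lambda| > 1 contribute.
    return sum(max(0.0, n * math.log2(abs(lam))) for lam, n in eigs)

# Hypothetical spectrum as (eigenvalue, multiplicity) pairs; |det A| = 4.5 >= 1.
eigs = [(2.0, 1), (0.5, 1), (1.5 + 1.5j, 1), (1.5 - 1.5j, 1)]

print(determinant_bound(eigs), unstable_bound(eigs))
```

Here the stable eigenvalue $0.5$ lowers the determinant-based bound by one bit, while the refined bound ignores it.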

The next example shows that for nonlinear systems the supremum in (13) is not necessarily attained as $\kappa\rightarrow\infty$ , i.e., the lower bound (7) indeed expresses a trade-off between the measure of $B$ and the minimal volume expansion on $B$ .

Example V.6

Consider a map $f:\mathbb{R}\rightarrow\mathbb{R}$ with derivative

[TABLE]

and note that $|f^{\prime}(x)|=f^{\prime}(x)>1$ for all $x\in\mathbb{R}$ . Since $f^{\prime}$ is symmetric and monotonically decreasing on $[0,\infty)$ , we obtain

[TABLE]

Corollary V.4, applied with $M=p=1$ , thus yields the capacity bound

[TABLE]

A straightforward analysis shows that this supremum is attained as a maximum at $\kappa=3$ , and hence $C\geq 2/(3\sqrt{3})$ . $\diamond$

VI Inhomogeneous semilinear systems

In this section, we also assume throughout that $\pi_{0}\ll_{b}m$ . We consider systems of the form

[TABLE]

where $u_{t}\in U$ and $v_{t}\in V=\mathbb{R}^{M}$ are control variables and $w_{t}\in W=\mathbb{R}^{N}$ is the noise variable. We assume that $U$ is a compact, connected metric space and $A:U\rightarrow\mathrm{Gl}(N,\mathbb{R})$ is continuous. The product space $U^{\mathbb{Z}}$ will be equipped with the product topology (and hence becomes a compact, connected metric space as well). Obviously, the case of linear systems with additive noise is covered here, since $A$ may be chosen to be constant.

The homogeneous system associated with (14) is

[TABLE]

For a given initial state $x_{0}\in\mathbb{R}^{N}$ and a control sequence $\bar{u}=(u_{t})_{t\in\mathbb{Z}}$ we write $\Phi(t,\bar{u})x_{0}$ for the associated solution of (15). Here

[TABLE]

As we will see below, there always exists a finest continuous decomposition of the trivial vector bundle $U^{\mathbb{Z}}\times\mathbb{R}^{N}$ into invariant subbundles:

[TABLE]

Writing $\mathcal{W}^{i}_{\bar{u}}$ , $\bar{u}\in U^{\mathbb{Z}}$ , for the fibers of the subbundles, their invariance can be expressed by the identities

[TABLE]

The subbundles $\mathcal{W}^{i}$ generalize the Lyapunov spaces of a single operator, i.e., the sums of generalized eigenspaces corresponding to eigenvalues of the same modulus.

Before we formulate our main result, we recall some facts about additive cocycles. An additive cocycle over a continuous map $T:X\rightarrow X$ is a function $\alpha:\mathbb{Z}_{+}\times X\rightarrow\mathbb{R}$ , written as $(n,x)\mapsto\alpha_{n}(x)$ , satisfying

[TABLE]

Lemma VI.1

Let $T:X\rightarrow X$ be a continuous map on a compact metric space $X$ . Assume that $\alpha:\mathbb{Z}_{+}\times X\rightarrow\mathbb{R}$ is a continuous additive cocycle over $T$ . Then the following identities hold:

[TABLE]

Moreover, all infima above are attained, and the analogous identities with infima replaced by suprema hold.

A purely topological proof of this lemma can be found in [26, Cor. 2]. For a proof of a more general result using ergodic theory see, e.g., [37, App. A].
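A standard source of continuous additive cocycles is given by Birkhoff sums $\alpha_{n}(x)=\sum_{i=0}^{n-1}g(T^{i}x)$ of a continuous function $g$ . The sketch below checks the cocycle identity in the form $\alpha_{n+m}(x)=\alpha_{n}(x)+\alpha_{m}(T^{n}x)$ numerically, assuming this is the identity intended in the definition above. The circle rotation `T` and the function `g` are hypothetical choices.

```python
import math

def T(x, a=math.sqrt(2) - 1):
    # Hypothetical base map: rotation of the circle [0, 1) by an irrational angle.
    return (x + a) % 1.0

def g(x):
    # Continuous function generating the cocycle.
    return math.cos(2 * math.pi * x)

def iterate(x, n):
    # Returns T^n(x).
    for _ in range(n):
        x = T(x)
    return x

def alpha(n, x):
    # Birkhoff sum alpha_n(x) = g(x) + g(T x) + ... + g(T^{n-1} x).
    total = 0.0
    for _ in range(n):
        total += g(x)
        x = T(x)
    return total

x0, n, m = 0.3, 7, 12
lhs = alpha(n + m, x0)
rhs = alpha(n, x0) + alpha(m, iterate(x0, n))
print(lhs, rhs)
```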

Theorem VI.2

Consider system (14). Assume that $\pi_{0}\ll_{b}m$ and that there exists a continuous and invariant vector bundle decomposition

[TABLE]

for the homogeneous system (15). Then, if the AMS property is achieved for (14) via a causal coding and control policy over a noiseless channel of capacity $C$ , we have

[TABLE]

Proof:

First observe that the mapping $(t,\bar{u})\mapsto\log|\det\Phi(t,\bar{u})_{|\mathcal{V}^{1}_{\bar{u}}}|$ is a continuous additive cocycle over the shift $\theta:U^{\mathbb{Z}}\rightarrow U^{\mathbb{Z}}$ . Hence, by Lemma VI.1 the limit

[TABLE]

exists and coincides with the right-hand side in (17). If this limit is $\leq 0$ , the statement becomes trivial, hence we may and will assume that it is positive.

The proof now proceeds along the following four steps.

Step 1. Let us write $P(\bar{u})\in\mathcal{L}(\mathbb{R}^{N},\mathbb{R}^{N})$ for the projection onto $\mathcal{V}^{1}_{\bar{u}}$ along $\mathcal{V}^{2}_{\bar{u}}$ . Observe that by the variation-of-constants formula we can write the solutions of (14) in the form

[TABLE]

We let $k$ denote the rank of the subbundle $\mathcal{V}^{1}$ (i.e., the common dimension of its fibers) and write $m^{k}_{\bar{u}}$ for the $k$ -dimensional Lebesgue measure on $\mathcal{V}^{1}_{\bar{u}}=\mathrm{im}P(\bar{u})$ . Observe that the invariance of $\mathcal{V}^{1}$ and $\mathcal{V}^{2}$ implies

[TABLE]

Moreover, since $\mathcal{V}^{1}$ is a continuous subbundle, the map $\bar{u}\mapsto P(\bar{u})$ is continuous. By compactness of $U^{\mathbb{Z}}$ , the following maximum exists:

[TABLE]

Indeed, this follows from the fact that the Lebesgue measure of the image of a ball under a projection is proportional to the product of its non-vanishing singular values, which depend continuously on the projection.

Step 2. Fix $b>0$ and $\rho,r\in(0,1)$ with $r<\frac{1}{2}$ . Assume that there exists a minimal finite $(T,B,\rho,r)$ -spanning set $S\subset(U\times V)^{T}$ for $B:=\overline{B_{b}(0)}$ (which later will be justified by invoking Lemma IV.2). Then there is $\tilde{\Omega}\subset\Omega$ with $P(\tilde{\Omega})\geq 1-\rho$ so that for each $\omega\in\tilde{\Omega}$ there is $(\bar{u},\bar{v})\in S$ with

[TABLE]

Putting

[TABLE]

for every $(\bar{u},\bar{v})\in S$ , we obtain

[TABLE]

Using the notation $\bar{w}(\omega)=(w_{t}(\omega))_{t\in\mathbb{Z}_{+}}$ , for any $\Lambda\subset[0;T-1]$ we define

[TABLE]

Then, as in the proof of Theorem V.1, we obtain

[TABLE]

We define the probability measure $\mu:=\nu^{\mathbb{Z}_{+}}$ on $W^{\mathbb{Z}_{+}}$ . Then, for any $\Lambda\subset[0;T-1]$ ,

[TABLE]

with $Z(\bar{u},\bar{v},\bar{w},\Lambda):=\{x\in\mathbb{R}^{N}:|\varphi(t,x,(\bar{u},\bar{v}),\bar{w})|\leq b,\ \forall t\in\Lambda\}$ . Fixing $\Lambda\subset[0;T-1]$ and putting $t_{+}=t_{+}(\Lambda):=\max\Lambda$ , an easy computation using (19) and (20) leads to

[TABLE]

which implies (using (21))

[TABLE]

Now

[TABLE]

Putting everything together, we end up with

[TABLE]

To complete the proof, we have to find a reasonable lower bound for the first term above.

Step 3. Fix a subset $\Lambda\subset[0;T-1]$ with $\#\Lambda\geq(1-r)T$ and define $t_{-}=t_{-}(\Lambda):=\min\Lambda$ . Then

[TABLE]

The set on the right-hand side is contained in the closed ball

[TABLE]

As a consequence,

[TABLE]

Let $\langle\cdot,\cdot\rangle_{\bar{u}}$ be an inner product on $\mathbb{R}^{N}$ in which $\mathcal{V}^{1}_{\bar{u}}$ and $\mathcal{V}^{2}_{\bar{u}}$ are orthogonal and write $m_{\bar{u}}$ for the associated Lebesgue measure. Using compactness of $U^{\mathbb{Z}}$ , we can do this in such a way that $m(E)\leq K\cdot m_{\bar{u}}(E)$ with a constant $K>0$ for every Lebesgue measurable set $E\subset\mathbb{R}^{N}$ and every $\bar{u}\in U^{\mathbb{Z}}$ . For any measurable set $A\subset\mathcal{V}^{1}_{\bar{u}}$ , a simple computation yields

[TABLE]

where $m^{N-k}$ denotes the $(N-k)$ -dimensional Lebesgue measure on $\mathcal{V}^{2}_{\bar{u}}$ . Using again the compactness of $U^{\mathbb{Z}}$ , we can find another constant $\tilde{K}>0$ (see Step 1) with

[TABLE]

Putting everything together, we arrive at

[TABLE]

Step 4. We combine the results of Steps 2 and 3 to obtain

[TABLE]

Letting $n_{r}(T)$ denote the number of subsets of $[0;T-1]$ with $\#\Lambda\geq(1-r)T$ , using (23), we end up with

[TABLE]

with a positive constant $\gamma$ , where the first inequality follows from $\pi_{0}\ll_{b}m$ , as in the proof of Theorem V.1. Applying the logarithm, dividing by $T$ and letting $T\rightarrow\infty$ yields

[TABLE]

Here we use, in particular, Lemma .1. Observing that $t_{-}(\Lambda)\leq rT$ , we can estimate

[TABLE]

leading to

[TABLE]

Now we use that $t_{+}(\Lambda)\geq(1-r)T$ . Let $\alpha>0$ denote the limit in (18) and let $\varepsilon\in(0,\alpha)$ . Then, for sufficiently large $T$ ,

[TABLE]

Since this holds for all $\Lambda$ with $\#\Lambda\geq(1-r)T$ and $\varepsilon>0$ was arbitrary, we find that

[TABLE]

Observe that this holds for arbitrary $b>0$ , $\rho\in(0,1)$ , $r\in(0,\frac{1}{2})$ and $B=\overline{B_{b}(0)}$ . If $b$ is chosen so that $0<Q(\overline{B_{b}(0)})<1$ , Lemma IV.2 yields

[TABLE]

If $Q(\overline{B_{b}(0)})<1$ for all $b>0$ , we can let $b\rightarrow\infty$ , which implies $r_{b}\rightarrow 0$ and thus

[TABLE]

Otherwise, we have $Q(\overline{B_{b}(0)})=1$ for all sufficiently large $b$ and Lemma IV.2 yields

[TABLE]

also leading to (24). Since $(t,\bar{u})\mapsto\log|\det\Phi(t,\bar{u})_{|\mathcal{V}^{1}_{\bar{u}}}|$ is a continuous additive cocycle over the shift $\theta:U^{\mathbb{Z}}\rightarrow U^{\mathbb{Z}}$ , Lemma VI.1 guarantees that the limit and the infimum in (24) can be interchanged (replacing $\lim$ with $\limsup$ or $\liminf$ ), which completes the proof.∎

Remark VI.3

The proof of the above theorem is partly modeled on that of [24, Thm. 3.3]. For a more detailed explanation of the arguments used in Step 3, see [24, Lem. 3.3].

Example VI.4

Consider the special case of a linear system, i.e., $A(u)\equiv A\in\mathbb{R}^{N\times N}$ . Then the vector bundle decomposition (16) can be chosen as

[TABLE]

where $E^{u}(A)$ and $E^{cs}(A)$ are the unstable and center-stable subspaces of $A$ , respectively. This immediately implies

[TABLE]

with summation over the eigenvalues $\lambda$ of $A$ with algebraic multiplicities $n_{\lambda}$ . $\diamond$

In the following, we will show that there always exists a finest continuous decomposition of $U^{\mathbb{Z}}\times\mathbb{R}^{N}$ into invariant subbundles

[TABLE]

which is related to the dynamical behavior of the system induced by (15) on the projective bundle $U^{\mathbb{Z}}\times\mathbb{P}^{N-1}$ . This follows from a general result about linear flows on vector bundles known as Selgrade’s theorem, which reads as follows.

Proposition VI.5

Let $V\rightarrow B$ be a finite-dimensional real vector bundle with compact metric base space $B$ . Assume that $\phi_{t}:V\rightarrow V$ , $t\in\mathbb{Z}$ , is a continuous discrete-time linear flow on $V$ and that the induced flow on $B$ is chain transitive. Then there exists a unique finest Morse decomposition $\mathcal{M}_{1},\ldots,\mathcal{M}_{r}$ of the induced flow on the projective bundle $\mathbb{P}V\rightarrow B$ , and $1\leq r\leq d=\dim V_{b}$ , $b\in B$ . Every Morse set $\mathcal{M}_{i}$ defines a $\phi_{t}$ -invariant subbundle of $V$ via

[TABLE]

and the following decomposition into a Whitney sum holds:

[TABLE]

For an introduction to the concepts of chain transitivity and Morse decompositions used in this proposition we refer to [10, 41]. A continuous-time version of the proposition can also be found in [10]. The discrete-time version follows from a more general result, see [41, Thm. 6.2 and Thm. 7.5].

The next proposition shows that Selgrade’s theorem can be applied to the linear flow generated by equation (15) on the trivial vector bundle $U^{\mathbb{Z}}\times\mathbb{R}^{N}$ .

Proposition VI.6

The solutions of the homogeneous equation (15) define a continuous discrete-time linear flow on the trivial vector bundle $V:=U^{\mathbb{Z}}\times\mathbb{R}^{N}$ with compact metric base space $U^{\mathbb{Z}}$ . This flow is given by $\phi_{t}(\bar{u},x)=(\theta^{t}\bar{u},\Phi(t,\bar{u})x)$ , $t\in\mathbb{Z}$ . Moreover, the shift map $\theta:U^{\mathbb{Z}}\rightarrow U^{\mathbb{Z}}$ is chain transitive.

Proof:

We know that $U^{\mathbb{Z}}$ , equipped with the product topology, is a compact and connected metric space. The flow properties ( $\phi_{0}(\bar{u},x)=(\bar{u},x)$ and $\phi_{t+s}(\bar{u},x)=\phi_{t}(\phi_{s}(\bar{u},x))$ ) are easy to see. Continuity and (fiber-wise) linearity of $\phi$ are clear. From the fact that the periodic points of $\theta$ (which are precisely the periodic sequences) are dense in $U^{\mathbb{Z}}$ , it follows that every point in $U^{\mathbb{Z}}$ is chain recurrent. It is well-known that a homeomorphism is chain transitive on any closed set which is connected and consists of chain recurrent points.∎
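The flow property used in this proof amounts to the identity $\Phi(t+s,\bar{u})=\Phi(t,\theta^{s}\bar{u})\Phi(s,\bar{u})$ . The following sketch checks it numerically for $t,s\geq 0$ , assuming $\Phi(t,\bar{u})=A(u_{t-1})\cdots A(u_{0})$ and using a hypothetical family `A` of invertible $2\times 2$ matrices on $U=[0,1]$ ; control sequences are represented as functions of the time index.

```python
import math

def A(u):
    # Hypothetical continuous map U = [0, 1] -> Gl(2, R); det = (1 + u)(2 - u) > 0.
    return [[1.0 + u, u], [0.0, 2.0 - u]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def Phi(t, u):
    # Assumed form of the transition matrix for t >= 0: A(u(t-1)) ... A(u(0)).
    M = [[1.0, 0.0], [0.0, 1.0]]
    for i in range(t):
        M = matmul(A(u(i)), M)
    return M

def shift(u, s):
    # The shift theta^s acting on a sequence given as a function of the index.
    return lambda i: u(i + s)

u_bar = lambda i: 0.5 + 0.5 * math.sin(i)   # values in [0, 1]
t, s = 4, 3
lhs = Phi(t + s, u_bar)
rhs = matmul(Phi(t, shift(u_bar, s)), Phi(s, u_bar))
print(lhs, rhs)
```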

Combining Selgrade’s theorem with Theorem VI.2, we obtain the following corollary.

Corollary VI.7

Consider system (14) and the Selgrade decomposition (25) associated with the homogeneous system (15). Assume that the subbundles are ordered such that

[TABLE]

for $i=1,\ldots,s$ , where $s\in\{0,1,\ldots,r\}$ is the maximal number with this property. Then, if $\pi_{0}\ll_{b}m$ and the AMS property is achieved over a noiseless channel of capacity $C$ ,

[TABLE]

where the right-hand side is defined as zero if $s=0$ .

Proof:

Define $\mathcal{V}^{1}:=\mathcal{W}^{1}\oplus\cdots\oplus\mathcal{W}^{s}$ , $\mathcal{V}^{2}:=\mathcal{W}^{s+1}\oplus\cdots\oplus\mathcal{W}^{r}$ . Then $U^{\mathbb{Z}}\times\mathbb{R}^{N}=\mathcal{V}^{1}\oplus\mathcal{V}^{2}$ . Since $|\det\Phi(t,\bar{u})_{|\mathcal{V}^{1}_{\bar{u}}}|$ is, up to some multiplicative constant, the product of the numbers $|\det\Phi(t,\bar{u})_{|\mathcal{W}^{i}_{\bar{u}}}|$ , $i=1,\ldots,s$ , it follows that

[TABLE]

where we use Lemma VI.1 twice. This implies the result.∎

Example VI.8

In the special case when $r=1$ (only one Selgrade bundle) and the system is asymptotically volume-expanding, i.e.,

[TABLE]

the lower bound of Corollary VI.7 reduces to $C\geq\min_{u\in U}\log|\det A(u)|$ . Indeed, it is easy to see that the infimum over $\bar{u}\in U^{\mathbb{Z}}$ is then attained at the constant sequence with value $u_{*}=\mathrm{argmin}|\det A(u)|$ . $\diamond$
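For a concrete feel of this example, the sketch below evaluates $\min_{u\in U}\log|\det A(u)|$ on a grid. The matrix family, the interval $U=[-1,1]$ and the use of base-2 logarithms are hypothetical choices, and the code does not verify that there is only one Selgrade bundle; it only evaluates the resulting expression.

```python
import math

def det_A(u):
    # Hypothetical family A(u) = [[2 + u, 1], [0, 1 + u**2]] on U = [-1, 1];
    # the matrix is upper triangular, so det A(u) = (2 + u) * (1 + u**2).
    return (2.0 + u) * (1.0 + u * u)

def min_log_det(lo=-1.0, hi=1.0, steps=2000):
    # Grid search for the minimizer of |det A(u)| over U.
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    u_star = min(grid, key=lambda u: abs(det_A(u)))
    return u_star, math.log2(abs(det_A(u_star)))

u_star, bound = min_log_det()
print(u_star, bound)
```

In this toy family $|\det A(u)|>1$ on all of $U$ , and the minimizer is the interior point $u=-1/3$ rather than an endpoint of $U$ .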

For the general case, one can use numerical methods to approximate the Lyapunov exponents, and hence, the associated volume growth rates, for the homogeneous semilinear system (15). For continuous-time bilinear control systems, methods for the computation of Lyapunov exponents based on algorithms for solving discounted optimal control problems have been developed in [20] (see also [10, App. D]). In general, these methods also work for discrete-time systems.
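As a rough illustration of such numerical approximations, the sketch below estimates the largest exponential growth rate along one fixed control sequence by repeated multiplication and renormalization. This is a generic textbook-style procedure, not the optimal-control-based method of [20]; the matrix family `A_const` and all function names are hypothetical.

```python
import math

def apply(M, v):
    # Matrix-vector product for a 2x2 matrix.
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def top_exponent(A, controls, v=(1.0, 1.0)):
    # Average of log2 of the one-step growth of a renormalized vector.
    norm = math.hypot(v[0], v[1])
    v = (v[0] / norm, v[1] / norm)
    total = 0.0
    for u in controls:
        v = apply(A(u), v)
        norm = math.hypot(v[0], v[1])
        total += math.log2(norm)
        v = (v[0] / norm, v[1] / norm)
    return total / len(controls)

# Hypothetical constant family with eigenvalues 2 and 1/2.
A_const = lambda u: [[2.0, 0.0], [0.0, 0.5]]
estimate = top_exponent(A_const, [0.0] * 500)
print(estimate)
```

For the capacity bounds of this section one would need growth rates of volumes on a subbundle, minimized over control sequences, so this sketch covers only the simplest ingredient.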

VII The noisy channel case

For discrete noiseless channels, the key idea that connected the volume-growth-based approaches for deterministic models with the stochastic system setup was the observation that the number of control sequences is bounded from above by the total number of received messages. This approach does not directly apply to a noisy channel setup: there can be an arbitrarily large number of distinct received channel outputs, but these may not carry reliable information. In the following, we develop a new method to address this for a discrete memoryless channel (DMC). For a review of channel capacity with feedback see [13], [48, Sec. 5.3.4].

Figure 1 shows the control loop, using a DMC with feedback for data transmission from the encoder to the controller. The channel has a finite input alphabet $\mathcal{M}$ and a finite output alphabet $\mathcal{M}^{\prime}$ . The channel input $q_{t}$ at time $t$ is generated by a function $\gamma^{e}_{t}$ so that $q_{t}=\gamma^{e}_{t}(x_{[0,t]},q^{\prime}_{[0,t-1]})$ . The channel maps $q_{t}$ to $q^{\prime}_{t}$ in a stochastic fashion so that $P(q^{\prime}_{t}\in\cdot|q_{t},q_{[0,t-1]},q^{\prime}_{[0,t-1]})=P(q^{\prime}_{t}\in\cdot|q_{t})$ is a conditional probability measure on $\mathcal{M}^{\prime}$ for all $t\in\mathbb{Z}_{+}$ , for every realization $q_{t},q_{[0,t-1]},q^{\prime}_{[0,t-1]}$ . The controller, upon receiving the information from the channel, generates its decision at time $t$ , also causally: $u_{t}=\gamma_{t}^{c}(q^{\prime}_{[0,t]})$ .

Consider a DMC with channel capacity $C$ (we note that for DMCs, it is a well-known result that feedback cannot increase the capacity). Then the following property, known as the strong converse, holds, see [28], [13, Problem 10.17]: For any $R>C$ , under any coding policy:

[TABLE]

where $p_{e}(T)$ is the average probability of error among $2^{RT}$ equally likely messages after the channel is used $T$ times under coding and decoding policies admissible according to the standard information-theoretic formulation of communication with noiseless feedback, cf. [45].

Now we consider a scalar system of the form

[TABLE]

with a $C^{1}$ -function $f:\mathbb{R}\rightarrow\mathbb{R}$ satisfying

[TABLE]

Our main result reads as follows.

Theorem VII.1

Consider system (27) satisfying (28). Assume that $\pi_{0}\ll m$ with $p$ denoting the density with respect to $m$ , that $K:=\mathrm{supp}(\pi_{0})$ is a compact interval and

[TABLE]

Then, if the AMS property is achieved via a causal coding and control strategy over a DMC of capacity $C$ , we have

[TABLE]

Before giving the proof, it may be instructive to explain the approach. It builds on the construction of an auxiliary coding problem that relates the number of distinct control actions per time stage (in a spirit similar to the definition of stabilization entropy) to an information transmission problem, and in turn to an analysis of channel capacity with feedback. The key fact is that the number of informative messages per time stage that must be transmitted about the initial state cannot be less than the desired bound. The coding problem is related to a channel coding theorem via optimal transport inequalities.
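To make the statement of Theorem VII.1 concrete, the sketch below compares the capacity of a binary symmetric channel, $C=1-H(p)$ , with $\log c$ for $c=\inf_{x}|f^{\prime}(x)|$ . The binary symmetric channel is a hypothetical example of a DMC, base-2 logarithms are assumed, and the check is only the necessary condition $C\geq\log c$ ; it says nothing about sufficiency or about the remaining hypotheses of the theorem.

```python
import math

def binary_entropy(p):
    # Binary entropy function in bits, with H(0) = H(1) = 0.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(crossover):
    # Capacity of a binary symmetric channel with the given crossover probability.
    return 1.0 - binary_entropy(crossover)

def passes_necessary_condition(crossover, c):
    # Necessary condition only: capacity must be at least log2 of the
    # minimal expansion rate c = inf |f'(x)|.
    return bsc_capacity(crossover) >= math.log2(c)

print(bsc_capacity(0.11), passes_necessary_condition(0.11, 2.0))
```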

Proof:

Throughout the proof, we use the following notation: Observing that we have three sources of stochasticity – the initial state $x_{0}$ , the noise sequence $(w_{t})$ and the channel noise – every time we make a statement about the probability $P(E)$ of an event $E$ , we will add subscripts to the letter $P$ , indicating which probability measures are involved in computing this probability: The subscript “ $\mathrm{i}$ ” is used for the initial state, subscript “ $\mathrm{n}$ ” for the noise and subscript “ $\mathrm{c}$ ” for the channel.

Let $c:=\inf_{x\in\mathbb{R}}|f^{\prime}(x)|$ . Without loss of generality, we can assume that $c>1$ . We prove the theorem by contradiction, assuming that $C<\log c$ . First, we fix a sufficiently small $r^{*}>0$ so that

[TABLE]

Since the AMS measure $Q$ is a probability measure, we can choose for every sufficiently small $\alpha\in(0,r^{*})$ a $b>0$ with

[TABLE]

Later we will consider an auxiliary coding scheme, where the initial state $x_{0}$ is to be estimated at each time stage $T\in\mathbb{Z}_{+}$ through the knowledge of the control sequence $\bar{u}\in U^{T+1}$ , applied by the controller in $[0;T]$ . Given a noise realization $\bar{w}$ (that we will fix later), as an estimate for $x_{0}$ at time $T$ we use the center $\hat{x}_{0}(T,\bar{u},\bar{w})$ of the compact set

[TABLE]

i.e., the midpoint of $[\min A_{T}(\bar{u},\bar{w}),\max A_{T}(\bar{u},\bar{w})]$ . To derive an estimate for the diameter of $A_{T}(\bar{u},\bar{w})$ , let $x_{1},x_{2}\in A_{T}(\bar{u},\bar{w})$ be chosen arbitrarily. We claim that there exists a time $t_{*}$ with $\lceil(1-3r^{*})T\rceil\leq t_{*}\leq T-1$ such that

[TABLE]

Indeed, if this were not the case, then the number of $t$ ’s in the interval $[\lceil(1-3r^{*})T\rceil;T-1]$ with $\varphi(t,x_{i},\bar{u},\bar{w})\in[-b,b]$ for each $i=1,2$ can be at most half of the cardinality of this interval, implying that the total number of $t$ ’s in $[0;T-1]$ such that $\varphi(t,x_{i},\bar{u},\bar{w})\in[-b,b]$ is bounded by

[TABLE]

for $T$ large enough, a contradiction. We thus obtain

[TABLE]

implying

[TABLE]
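The mechanism behind this diameter estimate is that two trajectories driven by the same controls and the same noise separate at least like $c^{t}$ . Hence, if both lie in $[-b,b]$ at time $t_{*}$ , their initial states differ by at most $2b/c^{t_{*}}$ . The sketch below checks the separation numerically, assuming the system has the additive form $x_{t+1}=f(x_{t})+u_{t}+w_{t}$ ; the map `f` and the control and noise sequences are hypothetical.

```python
import math

C_MIN = 1.5  # infimum of f' for the hypothetical map below

def f(x):
    # Hypothetical expanding map with f'(x) = 2 + 0.5*cos(x) >= 1.5.
    return 2.0 * x + 0.5 * math.sin(x)

def trajectory(x0, controls, noise):
    # Assumed additive form x_{t+1} = f(x_t) + u_t + w_t.
    xs = [x0]
    for u, w in zip(controls, noise):
        xs.append(f(xs[-1]) + u + w)
    return xs

T = 20
controls = [0.1 * (-1) ** t for t in range(T)]
noise = [0.05 * math.sin(t) for t in range(T)]
x1, x2 = 0.200, 0.201
gap0 = abs(x1 - x2)
gaps = [abs(a - b) for a, b in zip(trajectory(x1, controls, noise),
                                   trajectory(x2, controls, noise))]
# By the mean value theorem, the gap at time t is at least C_MIN**t * gap0.
print(gaps[-1], C_MIN ** T * gap0)
```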

Now the AMS property together with (30) implies

[TABLE]

Indeed, this follows by an application of Markov’s inequality:

[TABLE]

From (31) and the definition of $A_{T}(\bar{u},\bar{w})$ we conclude that

[TABLE]

and the left-hand side is smaller than $\alpha/r^{*}$ for large $T$ by (32). Our aim is to show that

[TABLE]

leading to a contradiction with (32).

To this end, we will distinguish between two complementary cases. To classify these cases, we introduce the notion of a control rate $R$ as follows.

For each $T\geq 1$ , let $\mathcal{U}_{T}$ be the set of all possible control sequences in $U^{T}$ the controller can generate under the given coding and control policy, i.e.,

[TABLE]

We define the control rate by

[TABLE]

We now treat the two possible cases $R<(1-3r^{*})\log c$ and $R\geq(1-3r^{*})\log c$ separately.

Case 1: We fix a noise realization $\bar{w}_{*}$ and prove (33) for the conditional probability of the corresponding event given $\bar{w}=\bar{w}_{*}$ . To simplify notation, we write $A_{T}(\bar{u})$ and $\hat{x}_{0}(T,\bar{u})$ instead of $A_{T}(\bar{u},\bar{w}_{*})$ and $\hat{x}_{0}(T,\bar{u},\bar{w}_{*})$ , respectively.

Assume that $R<(1-3r^{*})\log c$ and pick $\varepsilon>0$ so that $R+2\varepsilon<(1-3r^{*})\log c$ . Put $\tilde{A}_{T}(\bar{u}):=[\min A_{T}(\bar{u}),\max A_{T}(\bar{u})]$ and note that $\#\mathcal{U}_{T}\leq 2^{(R+\varepsilon)T}$ for all sufficiently large $T$ . From (31) it follows that

[TABLE]

Since $\pi_{0}\ll m$ , it follows that $\pi_{0}(\bigcup_{\bar{u}\in\mathcal{U}_{T}}\tilde{A}_{T}(\bar{u}))\rightarrow 0$ as well and thus

[TABLE]

The inequality above holds, since $|x_{0}-\hat{x}_{0}(T,\bar{u})|\leq\frac{b}{c^{(1-3r^{*})T}}$ implies the existence of some $\bar{v}\in U^{T}$ with $x_{0}\in\tilde{A}_{T}(\bar{v})$ . Thus, (33) holds, since

[TABLE]

independently of the noise realization $\bar{w}_{*}$ .
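The counting argument of Case 1 can be checked with a few numbers: at most $2^{(R+\varepsilon)T}$ intervals, each of length at most $2b/c^{(1-3r^{*})T}$ , cover a set whose total length decays exponentially whenever $R+\varepsilon<(1-3r^{*})\log c$ . The sketch below evaluates the base-2 logarithm of this product for hypothetical parameter values.

```python
import math

def log2_covered_length(T, R, eps, c, r_star, b):
    # log2 of  2**((R + eps) * T) * 2b / c**((1 - 3 r*) * T).
    return (R + eps) * T + math.log2(2 * b) - (1 - 3 * r_star) * T * math.log2(c)

# Hypothetical values with R + 2*eps = 1.2 < (1 - 3 r*) * log2(c) = 1.4.
params = dict(R=1.0, eps=0.1, c=4.0, r_star=0.1, b=5.0)
values = [log2_covered_length(T, **params) for T in (10, 100, 1000)]
print(values)
```

Working with logarithms avoids overflow, since the individual factors grow and decay exponentially in $T$ .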

Case 2: Assume that the control rate satisfies $R\geq(1-3r^{*})\log c$ and

[TABLE]

contrary to (33). Fix a noise realization $\bar{w}_{*}$ so that

[TABLE]

and drop the realization $\bar{w}_{*}$ in the notation, as in Case 1. Furthermore, write $P_{\mathrm{i},\mathrm{c}}(|x_{0}-\hat{x}_{0}(T,\bar{u})|>\frac{b}{c^{(1-3r^{*})T}})$ for the conditional probability above.

The rest of Case 2 is subdivided into five steps.

Step 1 (Construction of sets of bins): For every $T\geq 1$ , we define $\mathcal{S}_{T}:=\{\hat{x}_{0}(T,\bar{u}):\bar{u}\in\mathcal{U}_{T}\}$ and enumerate the elements of $\mathcal{S}_{T}$ so that

[TABLE]

where

[TABLE]

We define the following collection of bins:

[TABLE]

for $i=1,\ldots,n_{1}(T)$ , which are not necessarily disjoint. Each ${\bf B}^{T}_{i}$ has the same Lebesgue measure, which we denote by $\rho_{T}:=(2b)/(c^{(1-3r^{*})T})$ . From (34) it follows that

[TABLE]

for which it must be, by the analysis in Case 1, that

[TABLE]

We want to concentrate on the bins that are completely contained in $K=\mathrm{supp}(\pi_{0})$ . Since we assume that $K$ is an interval, the bins that are only partially contained in $K$ can contribute only very little measure as $T$ becomes large (their union can have at most twice the Lebesgue measure of a single bin), hence we can ignore them. Now assume that the number of bins that are completely outside of $K$ is $n(T)$ , and for simplicity assume that these bins are always the last $n(T)$ bins in the enumeration ${\bf B}^{T}_{1},\ldots,{\bf B}^{T}_{n_{1}(T)}$ . For large $T$ , this implies

[TABLE]

Hence, $n_{1}(T)-n(T)$ must grow at an exponential rate of at least $(1-3r^{*})\log c$ , just as $n_{1}(T)$ . We will thus, in the rest of the proof, assume w.l.o.g. that all bins ${\bf B}^{T}_{i}$ are completely contained in $K$ .

Now, from $\{{\bf B}^{T}_{i}\}$ we extract a subcollection of disjoint bins $\{{\bf C}_{i}^{T}\}_{i=1}^{n_{2}(T)}$ via the construction in Lemma .2 (see Figure 2 for an example representation). In particular, we assume that the bins ${\bf B}_{i}^{T}$ are ordered according to the natural (non-decreasing) order of their left endpoints. This implies

[TABLE]
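One natural way to extract disjoint bins from overlapping bins of equal length is a greedy left-to-right selection. The sketch below is an assumption about how such a construction can work and need not coincide with the construction of the lemma referred to above. It also checks that the union of all bins has at most twice the total length of the selected bins, since every skipped bin lies within distance $\rho$ of the previously selected one.

```python
def disjoint_subcollection(left_endpoints, rho):
    # Greedy selection: keep a bin [a, a + rho] only if it starts strictly
    # to the right of the previously kept bin.
    picked = []
    for a in sorted(left_endpoints):
        if not picked or a > picked[-1] + rho:
            picked.append(a)
    return picked

def union_length(left_endpoints, rho):
    # Lebesgue measure of the union of the bins [a, a + rho].
    total, start, end = 0.0, None, None
    for a in sorted(left_endpoints):
        if end is None or a > end:
            if end is not None:
                total += end - start
            start, end = a, a + rho
        else:
            end = max(end, a + rho)
    if end is not None:
        total += end - start
    return total

lefts = [0.0, 0.3, 0.9, 1.05, 2.5, 2.6, 4.0]   # hypothetical bin positions
rho = 1.0
picked = disjoint_subcollection(lefts, rho)
print(picked, union_length(lefts, rho))
```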

Furthermore, it must be that

[TABLE]

for otherwise, by the analysis in Case 1, $m(\bigcup_{i=1}^{n_{2}(T)}{\bf C}^{T}_{i})\rightarrow 0$ in contradiction to (37) and (38). Now, using the definition (51) of the leftover set, we define a collection of $n_{2}(T)$ sets

[TABLE]

Hence, ${\bf D}^{T}_{k}\subset[\alpha_{k},\alpha_{k+1})$ , where $\alpha_{k}=\min{\bf C}^{T}_{k}$ . The sets ${\bf D}^{T}_{k}$ are thus pairwise disjoint. Also observe that

[TABLE]

since the leftover set has at most the Lebesgue measure of one bin. Finally, for a fixed $L\in\mathbb{N}$ , group each collection of $L$ successive ${\bf D}^{T}_{k}$ bins as

[TABLE]

(In the definition of the last bin ${\bf E}^{T}_{n_{3}(T)}$ , we add some empty sets to the collection $\{{\bf D}^{T}_{k}\}$ .) From (39) it follows that the number of these bins also satisfies

[TABLE]

Also observe that

[TABLE]

Let

[TABLE]

and observe that

[TABLE]

Step 2 (The auxiliary coding scheme): We now construct an auxiliary coding scheme (in a traditional information-theoretic sense) as follows: We use the received channel output/control sequence to reconstruct the index $\Upsilon$ of the bin ${\bf E}^{T}_{\Upsilon}$ containing $x_{0}$ by looking at the points $\hat{x}_{0}(T,\bar{u})$ . With $\hat{\Upsilon}$ denoting the estimate of $\Upsilon$ at the decoder, in the following we study $P(\hat{\Upsilon}\neq\Upsilon)$ . By construction of the bins, if

[TABLE]

there is no ambiguity, hence $\Upsilon$ can be reconstructed and $\hat{\Upsilon}=\Upsilon$ (no error).

On the other hand, if $x_{0}\in M_{T}\setminus\overline{M}_{T}$ , we have the following analysis: For every $x_{0}\in M_{T}\setminus\overline{M}_{T}$ , there is $k\geq 1$ so that $x_{0}\in{\bf D}^{T}_{kL}\setminus{\bf C}^{T}_{kL}$ and hence, given the event $|x_{0}-\hat{x}_{0}(T,\bar{u})|\leq\frac{b}{c^{(1-3r^{*})T}}$ , $x_{0}\in{\bf D}^{T}_{kL}\setminus{\bf C}^{T}_{kL}$ , the correct bin could be either ${\bf E}^{T}_{k}$ or ${\bf E}^{T}_{k+1}$ . So, we can randomly and independently assign the channel output/control to either $\Upsilon=k$ or $\Upsilon=k+1$ . The associated error probability is at most $1/2$ when the events $|x_{0}-\hat{x}_{0}(T,\bar{u})|\leq\frac{b}{c^{(1-3r^{*})T}}$ and $x_{0}\in{\bf D}^{T}_{kL}\setminus{\bf C}^{T}_{kL}$ hold, i.e.,

[TABLE]

Altogether, the error probability in our coding scheme can be estimated as follows:

[TABLE]

From (34) it follows that for all large enough $T$ :

[TABLE]

Combining (40) and (42), we obtain

[TABLE]

Since $\overline{M}_{T}\subset M_{T}$ , the union in the definition of $\overline{M}_{T}$ is a disjoint union and the union of all ${\bf E}^{T}_{i}$ equals $M_{T}$ , we have

[TABLE]

Together with (44), we thus obtain

[TABLE]

Step 3 (Introduction of an auxiliary source variable with uniform distribution): From (43) it follows that

[TABLE]

Since clearly $\pi_{0}({\bf E}^{T}_{i})\geq p_{\min}\frac{L\rho_{T}}{\pi_{0}(M_{T})}\pi_{0}(M_{T})$ , we obtain

[TABLE]

Combining this with (46) leads to

[TABLE]

Let $W$ be an auxiliary random variable on $\{1,\ldots,n_{3}(T)\}$ with uniform distribution. Then we have

[TABLE]

Considering the complementary events, we obtain

[TABLE]

Combining this with (47) leads to

[TABLE]

Step 4 (Application of optimal transport theory and coupling of the uniform source with the distribution of $\{{\bf E}^{T}_{i}\}$ ): The information-theoretic formulation of information transmission assumes that the messages to be transmitted are uniformly distributed. In the final step of our analysis, we relate the messages represented by the indices of the ${\bf E}^{T}_{i}$ ’s with their induced distribution under $\pi_{0}$ to a uniformly distributed set of messages: Let $P$ be the distribution of the indices of the ${\bf E}^{T}_{i}$ ’s under $\pi_{0}$ and $P^{\prime}$ the uniform distribution of $W$ , with the same cardinality as the set of ${\bf E}^{T}_{i}$ ’s. There exists a coupling between $P$ and $P^{\prime}$ so that the expected error is lower bounded by the total variation distance between $P$ and $P^{\prime}$ ; by finding a coupling (cf. [47, Eq. (6.11)]), we can achieve that

[TABLE]

Let us estimate $\beta$ . For sufficiently large $T$ , we have

[TABLE]

Since $P^{\prime}(i)=\frac{1}{n_{3}(T)}$ , this implies

[TABLE]
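The quantity controlling the coupling in this step is the total variation distance between the bin distribution and the uniform distribution. A minimal sketch of this computation follows; the bin probabilities are hypothetical, and the convention $\|P-P^{\prime}\|_{TV}=\frac{1}{2}\sum_{i}|P(i)-P^{\prime}(i)|$ is assumed.

```python
def total_variation(P, Q):
    # Total variation distance between two distributions on the same finite set.
    return 0.5 * sum(abs(p - q) for p, q in zip(P, Q))

def uniform(n):
    return [1.0 / n] * n

# Hypothetical bin probabilities induced by a non-uniform initial density.
P = [0.5, 0.25, 0.25]
beta = total_variation(P, uniform(len(P)))
print(beta)
```

A maximal coupling of two distributions makes the coupled variables differ with probability exactly equal to this distance, which is why a nearly uniform bin distribution costs little in the error analysis.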

Step 5 (Application of the strong converse): In view of all of the above steps, the proposed coding scheme can be used to encode an auxiliary equi-distributed random variable with an asymptotic average probability of error upper bounded by

[TABLE]

This error bound can be made strictly smaller than $1$ , when $L$ is chosen sufficiently large and $\alpha$ sufficiently small. Thus, we arrive at a contradiction with the strong converse (26), because the rate of our coding scheme satisfies

[TABLE]

The proof is complete.∎

We note the following variation, stated with a proof sketch, in which the initial measure may have non-compact support.

Theorem VII.2

Consider system (27) satisfying (28). Assume that $\pi_{0}\ll m$ with $p$ denoting the density with respect to $m$ , and that for every $\epsilon>0$ there exists a compact interval $K_{\epsilon}$ such that $\pi_{0}(K_{\epsilon})\geq 1-\epsilon$ and, with

[TABLE]

the following assumption holds:

[TABLE]

Then, if the AMS property is achieved via a causal coding and control strategy over a DMC of capacity $C$ , we have

[TABLE]

Remark VII.3

A sufficient condition for (49) is that $p$ is differentiable, positive everywhere and monotone decreasing in either direction as $|x|$ increases for sufficiently large values of $|x|$ , and $\lim_{|x|\to\infty}p^{\prime}(x)/p(x)=\infty$ . This follows from an application of L’Hospital’s theorem to the expression

[TABLE]

Probability densities that decay faster than an exponential (such as the Gaussian) satisfy this condition. An exponential density (if it is one-sided, the denominator is just $p(x)$ ) keeps this ratio constant as $|x|$ increases, and densities with a heavier tail than an exponential do not satisfy the condition.
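The three tail regimes named in the remark can be compared numerically. The sketch below is only an illustration under an assumption: since the exact expression is not reproduced here, it uses the two-sided tail mass $\pi_{0}(|x|>a)$ divided by the boundary density $p(a)$ as a stand-in ratio, for symmetric densities with $K=[-a,a]$. The function names are hypothetical.

```python
import math


def gaussian_ratio(a):
    """Tail mass P(|X| > a) over density p(a) for the standard Gaussian."""
    tail = math.erfc(a / math.sqrt(2.0))
    density = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    return tail / density


def laplace_ratio(a):
    """Same ratio for the two-sided exponential density p(x) = exp(-|x|) / 2."""
    tail = math.exp(-a)
    density = 0.5 * math.exp(-a)
    return tail / density


def cauchy_ratio(a):
    """Same ratio for the standard Cauchy density (heavier than exponential)."""
    tail = 1.0 - (2.0 / math.pi) * math.atan(a)
    density = 1.0 / (math.pi * (1.0 + a * a))
    return tail / density


for a in (1.0, 2.0, 4.0, 8.0):
    print(a, gaussian_ratio(a), laplace_ratio(a), cauchy_ratio(a))
```

In this stand-in, the Gaussian ratio shrinks as $a$ grows, the exponential ratio stays constant, and the Cauchy ratio grows, which mirrors the three cases described in the remark.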

Proof:

The proof is almost identical to that of Theorem VII.1: Case 1 carries over without change. For Case 2, fix a sufficiently small $\epsilon$ and a corresponding $K_{\epsilon}$ . If (33) does not hold, then instead of (34) we can consider

[TABLE]

We construct the auxiliary coding scheme by embedding the bins inside $K_{\epsilon}$ . We thus focus on the sub-probability measure obtained by restricting $\pi_{0}$ to $K_{\epsilon}$ , defined formally as $\pi^{K_{\epsilon}}_{0}(B):=\pi_{0}(B\cap K_{\epsilon})$ for every Borel set $B$ , and replace (37) with

[TABLE]

The analysis goes through unchanged until Step 5, where the following term needs to be made less than 1:

[TABLE]

The only additional term, when compared with Step 5 of the proof of Theorem VII.1, is the expression $2\epsilon p^{K_{\epsilon}}_{\max}/(p^{K_{\epsilon}}_{\min}(1-\frac{\alpha}{r^{*}}-\epsilon))$ . Since $p^{K_{\epsilon}}_{\max}$ is uniformly bounded under the given assumptions, condition (49) ensures that this term can be made arbitrarily small as $\epsilon$ is made small.∎

VIII Discussion and concluding remarks

In this paper, we considered a stochastic stabilization problem for a general controlled stochastic system over a communication channel. For this problem, we developed a new approach to derive fundamental lower bounds on information transmission requirements for control over communication channels. These lower bounds are consistent with the bounds obtained earlier via information-theoretic methods and with those obtained for more restrictive models (including linear systems). Moreover, the new proofs are more direct and concise, and they allow one to obtain finer lower bounds for a large class of systems. The lower bounds obtained for the AMS property are expressed in terms of the determinant of the Jacobian of the nonlinear system model, and they recover the existing results for the linear system setup as a special case. For noisy channels, our approach has been to develop a method to relate stabilization entropy and channel capacity through a generalization of the strong converse of information theory.

Achievability results have been obtained for linear systems in [50, 48] and for nonlinear systems in [51]. In particular, [50, Thm. 4.2] shows that for a linear system with a diagonalizable matrix $A$ , controlled over a DMC, the AMS property can be achieved whenever the channel capacity exceeds the log-sum of the unstable eigenvalues. Hence, in this case the lower bounds that follow from the results in this paper match the upper bound. For nonlinear systems of the form

[TABLE]

with $f(\cdot,u):\mathbb{R}^{N}\rightarrow\mathbb{R}^{N}$ invertible and $C^{1}$ for every $u$ and $\{w_{t}\}$ an i.i.d. sequence of zero-mean Gaussian variables, it is shown in [51, Thm. 5.1] that ergodicity (and thus AMS) can be achieved over a discrete noiseless channel under the following assumption: There exist a function $\kappa:\mathbb{R}^{N}\rightarrow\mathbb{R}^{M}$ with $\kappa(0)=0$ and a constant $a>0$ such that $|f(x,\kappa(z))|_{\infty}\leq a|x-z|_{\infty}$ for all $x,z\in\mathbb{R}^{N}$ . In this case, the minimal required channel capacity $C_{0}$ satisfies $C_{0}\leq N\log(a)+1$ .
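The two capacity expressions quoted above are easy to evaluate for concrete numbers. The following sketch assumes that logarithms are taken to base 2 (capacity in bits) and that the log-sum of the unstable eigenvalues means $\sum_{|\lambda_{i}|>1}\log|\lambda_{i}|$ ; the eigenvalues, the dimension and the constant $a$ in the example are hypothetical.

```python
import math


def unstable_log_sum(eigenvalues):
    """Sum of log2|lambda| over eigenvalues with modulus greater than 1."""
    return sum(math.log2(abs(lam)) for lam in eigenvalues if abs(lam) > 1.0)


def nonlinear_capacity_upper_bound(dimension, a):
    """Evaluate N * log2(a) + 1, the quoted upper bound on C_0."""
    return dimension * math.log2(a) + 1.0


# Hypothetical linear example: eigenvalues 2, 0.5 and -4 (two of them unstable).
print(unstable_log_sum([2.0, 0.5, -4.0]))

# Hypothetical nonlinear example: state dimension N = 2 and constant a = 4.
print(nonlinear_capacity_upper_bound(2, 4.0))
```

Only eigenvalues outside the unit circle contribute to the first quantity, so stable modes impose no capacity requirement in this count.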

Finally, we want to mention that local exponential orbit complexity of the open-loop system (as opposed to the global unstable behavior imposed in the system models studied in Section V), in general, does not lead to a positive bound on the channel capacity. For instance, if a system of the form

[TABLE]

admits a compact uniformly hyperbolic set for the associated deterministic system $x_{t+1}=f(x_{t})$ and the noise amplitude is sufficiently small, it is well known that the uncontrolled noisy system $x_{t+1}=f(x_{t})+w_{t}$ admits a random hyperbolic set supporting a stationary measure under mild assumptions, cf. [31] (see also the relevant classical theory of positive Harris recurrence [35, 49]). Hence, for an appropriate initial measure $\pi_{0}$ , the uncontrolled system is already AMS, implying that no information transmission at all is necessary.

Lemma .1

Let $\alpha,\beta,r\in(0,1)$ with $\alpha+\beta=1$ . Then

[TABLE]

As a consequence,

[TABLE]

Proof:

Let $(X_{t})_{t\geq 0}$ be an i.i.d. sequence of $\{0,1\}$ -valued Bernoulli random variables with associated probability distribution $\bar{Q}(X_{t}=0)=\beta$ , $\bar{Q}(X_{t}=1)=\alpha$ . Then

[TABLE]

Sanov’s theorem (see [12, Thm. 11.4.1]) yields

[TABLE]

where $\bar{P}^{*}$ is the information projection of $\bar{Q}$ onto $E:=\{P:P(1)\geq 1-r\}$ , i.e., the distribution that minimizes

[TABLE]

under the constraint $P(1)\geq 1-r$ . To determine the solution to this minimization problem, we define the function

[TABLE]

whose derivative $h^{\prime}(t)=\log(\frac{\alpha}{\beta}\frac{t}{1-t})$ vanishes if and only if $t=\beta$ . Computing the second derivative $h^{\prime\prime}(t)=(\ln(2)t(1-t))^{-1}$ , we see that $h^{\prime\prime}(\beta)>0$ , hence $h$ has a minimum at $t=\beta$ . Due to the constraint $P(1)\geq 1-r$ , this is only relevant if $\beta\leq r$ . In this case, the minimizing distribution is $(\bar{P}^{*}(0),\bar{P}^{*}(1))=(\beta,\alpha)$ . Otherwise, the minimum is attained at $t=r$ (by monotonicity) and $(\bar{P}^{*}(0),\bar{P}^{*}(1))=(r,1-r)$ . This implies the first assertion of the lemma. The identity (50) follows by considering $\alpha=\beta=\frac{1}{2}$ .∎
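The minimization in this proof can be checked numerically. The sketch below assumes that $h(t)$ is the relative entropy (in bits) between $(t,1-t)$ and $(\beta,\alpha)$ , which is consistent with the stated derivative $h^{\prime}(t)$ ; the constraint $P(1)\geq 1-r$ then reads $t\leq r$ . The function names and the grid resolution are hypothetical choices.

```python
import math


def binary_kl(t, beta):
    """Relative entropy in bits between (t, 1 - t) and (beta, 1 - beta)."""
    def term(p, q):
        return 0.0 if p == 0.0 else p * math.log2(p / q)
    return term(t, beta) + term(1.0 - t, 1.0 - beta)


def projection_closed_form(beta, r):
    """Minimizer of h over [0, r] as stated in the proof."""
    return beta if beta <= r else r


def projection_grid_search(beta, r, steps=10000):
    """Brute-force minimizer of h over a uniform grid on [0, r]."""
    best_t, best_val = 0.0, binary_kl(0.0, beta)
    for k in range(1, steps + 1):
        t = r * k / steps
        val = binary_kl(t, beta)
        if val < best_val:
            best_t, best_val = t, val
    return best_t


for beta, r in ((0.5, 0.25), (0.2, 0.5)):
    print(beta, r, projection_closed_form(beta, r), projection_grid_search(beta, r))
```

In the first pair the constraint is active and the minimizer sits at $t=r$ ; in the second pair $\beta\leq r$ , so the unconstrained minimizer $t=\beta$ is feasible and the minimal relative entropy is zero.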

Lemma .2

Let $\{I_{1},\ldots,I_{r}\}$ be a finite collection of compact intervals, each of equal length $|I_{i}|=l$ . Then there exists a pairwise disjoint subcollection $\{I_{i_{1}},\ldots,I_{i_{k}}\}$ satisfying

$m\Bigl(\bigcup_{j=1}^{k}I_{i_{j}}\Bigr)\geq\frac{1}{2}m\Bigl(\bigcup_{i=1}^{r}I_{i}\Bigr)$ .

Proof:

We may assume that the intervals $I_{i}$ are ordered so that their left endpoints form a non-decreasing sequence. Then the indices $i_{j}$ are determined as follows: Put $i_{1}:=1$ . Then take the first interval in $\{I_{i}\}_{i=2}^{r}$ that does not intersect $I_{i_{1}}$ and call it $I_{i_{2}}$ . Let $I_{j}=[\alpha_{j},\beta_{j}]$ . The leftover space $L(i_{1},i_{2})$ between $I_{i_{1}}=I_{1}$ and $I_{i_{2}}$ is

[TABLE]

and has Lebesgue measure $\leq l$ , for otherwise $I_{i_{2}}$ would not be the first interval not intersecting $I_{i_{1}}$ . Continuing in this way, we find the desired collection of pairwise disjoint intervals and it follows that $m(\bigcup_{j=1}^{k}I_{i_{j}})=kl$ , while

[TABLE]

implying $2m(\bigcup_{j=1}^{k}I_{i_{j}})=2kl\geq m\Bigl{(}\bigcup_{i=1}^{r}I_{i}\Bigr{)}$ . ∎
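The greedy selection in this proof can be sketched in code. The example below is an illustration under the stated assumptions (closed intervals of a common length, so intervals that only touch at an endpoint count as intersecting); the function names and the sample intervals are hypothetical.

```python
def greedy_disjoint(intervals):
    """Greedy selection from the proof: sort by left endpoint, then repeatedly
    take the first interval that lies strictly to the right of the last one chosen."""
    chosen = []
    for left, right in sorted(intervals):
        if not chosen or left > chosen[-1][1]:
            chosen.append((left, right))
    return chosen


def union_measure(intervals):
    """Lebesgue measure of a finite union of closed intervals."""
    total = 0.0
    cur_left = cur_right = None
    for left, right in sorted(intervals):
        if cur_right is None or left > cur_right:
            if cur_right is not None:
                total += cur_right - cur_left
            cur_left, cur_right = left, right
        else:
            cur_right = max(cur_right, right)
    if cur_right is not None:
        total += cur_right - cur_left
    return total


# Hypothetical collection of intervals of common length l = 1.
length = 1.0
lefts = [0.0, 0.5, 0.9, 1.2, 2.5, 3.0]
collection = [(a, a + length) for a in lefts]
selected = greedy_disjoint(collection)
print(selected, 2 * len(selected) * length, union_measure(collection))
```

Each skipped interval starts inside the most recently chosen one, so it extends at most one further length $l$ to the right; this is the reason the union is covered by $k$ blocks of length $2l$ .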

Bibliography


[1] R. L. Adler, A. G. Konheim, M. H. McAndrew. Topological entropy. Trans. Amer. Math. Soc. 114 (1965), 309–319.
[2] B. R. Andrievskii, A. S. Matveev, A. L. Fradkov. Control and estimation under information constraints: toward a unified theory of control, computation, and communications. (Russian) Avtomat. i Telemekh. 2010, no. 4, 34–99; translation in Autom. Remote Control 71 (2010), no. 4, 572–633.
[3] J. Baillieul. Feedback designs for controlling device arrays with communication channel bandwidth constraints. In 4th ARO Workshop on Smart Structures, State College, PA, August 1999.
[4] J. Baillieul. Data-rate requirements for nonlinear feedback control. In Proc. 6th IFAC Symp. Nonlinear Control Syst., Stuttgart, Germany, 2004, 1277–1282.
[5] A. R. Barron. The strong ergodic theorem for densities: generalized Shannon-McMillan-Breiman theorem. Ann. Probab. 13 (1985), no. 4, 1292–1303.
[6] W. S. Wong, R. W. Brockett. Systems with finite communication bandwidth constraints. II. Stabilization with limited information feedback. IEEE Trans. Automat. Control 44 (1999), no. 5, 1049–1053.
[7] F. Colonius. Minimal bit rates and entropy for stabilization. SIAM J. Control Optim. 50 (2012), 2988–3010.
[8] F. Colonius. Metric invariance entropy and conditionally invariant measures. Ergodic Theory Dynam. Systems 38 (2018), no. 3, 921–939.