Invariance Properties of Controlled Stochastic Nonlinear Systems under Information Constraints
Christoph Kawan, Serdar Yüksel

TL;DR
This paper investigates the limits of stabilizing stochastic nonlinear systems over communication channels by developing new entropy-based bounds using ergodic theory, enhancing understanding of information requirements for stability.
Contribution
It introduces a novel ergodic-theoretic approach and a new entropy concept to derive refined bounds on information transmission needed for system stabilization.
Findings
Derived fundamental bounds on communication requirements for stability.
Developed a new entropy measure tailored for AMS analysis.
Provided more versatile and refined bounds compared to previous methods.
Invariance Properties of Controlled Stochastic Nonlinear Systems under Information Constraints
Christoph Kawan and Serdar Yüksel
This research was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada. A brief presentation of some of the results in this paper will appear at the 2019 Information Theory Workshop, Visby, Sweden.
C. Kawan is with the Institute for Informatics at the Ludwig-Maximilians-Universität Munich, 80538 Munich, Germany (e-mail: [email protected]). S. Yüksel is with the Department of Mathematics and Statistics, Queen’s University, Kingston, Ontario, Canada, K7L 3N6 (e-mail: [email protected]). Some results of this paper were presented without proofs at the 2019 IEEE Information Theory Workshop.
Abstract
Given a stochastic nonlinear system controlled over a possibly noisy communication channel, the paper studies the largest class of channels for which there exist coding and control policies so that the closed-loop system is stochastically stable. The stability criterion considered is asymptotic mean stationarity (AMS). We develop a general method based on ergodic theory and probability to derive fundamental bounds on information transmission requirements leading to stabilization. Through this method we develop a new notion of entropy which is tailored to derive lower bounds for asymptotic mean stationarity for both noise-free and noisy channels. The bounds obtained through probabilistic and ergodic-theoretic analysis are more refined in comparison with the bounds obtained earlier via information-theoretic methods. Moreover, our approach is more versatile in view of the models considered and allows for finer lower bounds when the AMS measure is known to admit further properties such as moment bounds.
Index Terms:
Stochastic stabilization; asymptotic mean stationarity; measure-theoretic entropy; information theory
I Introduction
Consider the following problem: Given a stochastic nonlinear system controlled over a communication channel, what is the largest class of such channels so that there exist coding and control policies leading to (some form of) stochastic stability? Various versions of this problem have been studied extensively for (possibly stochastic) linear systems and deterministic nonlinear systems.
For deterministic nonlinear systems, invariance entropy [9] measures the smallest average data rate of a noiseless channel above which a compact subset $Q$ of the state space can be made invariant by a controller receiving its state information through this channel. The essence of the idea behind this concept is as follows: If the controller has $n$ bits of information available, it can distinguish at most $2^n$ different states, hence generate at most $2^n$ different control inputs. Consequently, the number of control inputs needed to achieve the control objective (on a finite time interval) is a measure for the necessary information. The definition of invariance entropy thus reads
$$h_{\mathrm{inv}}(Q) = \lim_{\tau \to \infty} \frac{1}{\tau} \log r_{\mathrm{inv}}(\tau, Q),$$
where $r_{\mathrm{inv}}(\tau, Q)$ is the minimal number of control inputs needed to achieve invariance of $Q$ on the time interval $[0, \tau]$ for arbitrary initial states in $Q$. It is relatively immediate to observe that the growth rate of $r_{\mathrm{inv}}(\tau, Q)$ is directly related to the rate of volume expansion for subsets of $Q$ under the evolution of the system. Indeed, the faster volume is expanded, the more coding regions, and hence different control inputs, are necessary to keep the whole volume inside $Q$. Since for every reasonable stabilization objective it is necessary to keep certain volumes bounded (or even shrink them to zero), the same ideas as used in the definition of invariance entropy should work universally for stabilization over discrete channels. This intuition was rigorously verified in a number of publications, including [7, 9, 11, 14, 24, 25].
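The counting idea can be made concrete on a toy example. The following sketch assumes a hypothetical scalar system $x_{t+1} = a x_t + u_t$ with $a > 1$ and the interval $[-1, 1]$ as the set to be kept invariant; neither the system nor the quantizer is taken from the paper. A uniform quantizer with $k$ cells keeps the interval invariant once $k \ge a$, which corresponds to roughly $\log_2 a$ bits per time step.

```python
import math

def keeps_invariant(a: float, k: int, x0: float, steps: int = 100) -> bool:
    """Closed loop for the toy system x_{t+1} = a*x_t + u_t on Q = [-1, 1].

    The encoder sends the index of the cell (of width 2/k) containing x_t;
    the controller steers the centre of that cell to the origin.
    """
    x = x0
    for _ in range(steps):
        i = min(int((x + 1.0) * k / 2.0), k - 1)  # cell index in {0, ..., k-1}
        centre = -1.0 + (2 * i + 1) / k           # midpoint of cell i
        x = a * x - a * centre                    # control input u_t = -a * centre
        if abs(x) > 1.0 + 1e-9:                   # the state has left Q
            return False
    return True

def min_cells(a: float) -> int:
    """Smallest number of uniform cells for which the scheme above works."""
    return max(1, math.ceil(a))

a = 2.5
k = min_cells(a)
ok_all = all(keeps_invariant(a, k, -1.0 + 2.0 * j / 200) for j in range(201))
print(k, math.log2(k), ok_all)
```

With one cell fewer, the state starting at the boundary point $-1$ already leaves the interval after one step, which illustrates why the number of distinguishable control inputs is tied to the expansion factor.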
In this paper, we demonstrate that such an approach is also applicable, by means of the machinery we develop, to stochastic systems, stochastic channels, and to stochastic stability. Our criterion for stochastic stability is asymptotic mean stationarity (AMS), introduced by Gray & Kieffer [19] and used in networked control in a number of publications [48, 50, 51]. This concept considerably weakens the notion of stationarity and is closely related to other criteria used in the literature, such as stability in probability [33], (unique) ergodicity [48], and finite-moment stability of various orders [36, 38, 44]. The AMS property is weaker than unique ergodicity, and finite-moment stability typically implies the AMS property provided that additional regularity properties are imposed. Nonetheless, the AMS property is a very versatile notion: if one assumes that the support of the asymptotic mean measure is compact, the AMS property can be related to set stability; if one assumes that this measure has a finite moment of some order for its coordinate state process, the AMS property leads to the corresponding finite-moment stability property; and ergodicity can also be imposed for certain applications through mixing properties, e.g., through the construction of a positive Harris recurrent Markov chain [49]. Barron [5] and Gray & Kieffer [19] note various other operational utilities of the AMS property.
As an auxiliary quantity to derive lower bounds on the necessary channel capacity for generating an AMS state process, we introduce a new concept of stabilization entropy inspired by both invariance entropy and measure-theoretic entropy of dynamical systems, in particular by a characterization of the latter due to Katok [23] and a generalization thereof developed in Ren et al. [43]. Roughly speaking, stabilization entropy looks at the exponential growth rate, in the sequence length, of the number of control sequences necessary to keep the state inside some set for a certain fraction of the time steps with a certain positive probability. The corresponding set, the frequency of times and the probability are parameters that can be adjusted, and the relation to channel capacity can only be established for certain choices of these parameters.
Stochastic stabilization of nonlinear systems driven by noise (especially unbounded noise) over communication channels has been studied in only a few publications, notably in [51]. With our method we are able to refine the bounds presented in [51]. The approach developed in our paper, unlike the differential-entropic methods in [51] and other publications, allows for
(i) refined stochastic stability results applicable to a more general class of system models (Theorems V.1 and VI.2), and more refined stability criteria such as the AMS property in combination with moment conditions (see Corollary V.4),
(ii) a more concise and direct derivation, building on volume growth arguments, applicable to a plethora of criteria,
(iii) more refined bounds for a large class of systems through trading off growth rates against the measures of sets under the coordinate projection of a stationary measure (see Theorem V.1),
(iv) the unification of the theory developed for deterministic systems controlled over noise-free communication channels with their stochastic counterparts, involving both stochastic nonlinear dynamical systems and noisy communication channels (see Theorem VII.1).
In the paper at hand, explicit lower bounds on the capacity in terms of characteristics of the system are derived for nonlinear volume-expanding systems with additive control and noise, and for a class of inhomogeneous semi-linear systems with nonlinear dependence on the control variable. For the first class of systems, we obtain a particularly interesting result which displays a trade-off between the volume-expansion rate of the system and the mass distribution of the probability measure coming from the AMS property. This trade-off is a specific feature of nonlinear systems: in the linear case the influence of the measure cancels out, because the Jacobian matrix with respect to the state is constant. From our results we can easily recover the well-known capacity bound for linear systems, expressed as a sum of logarithms of eigenvalue moduli over all eigenvalues of the dynamical matrix, and also previous bounds for nonlinear systems proved via information-theoretic methods.
We emphasize that for the case of noisy channels, at least for a simple class of scalar systems, we are able to derive lower bounds similar to those for noiseless channels. This is done by relating the number of control sequences needed for stabilization to a state estimation problem, and then relating the channel capacity to this state estimation problem by means of a generalization of the strong converse to the channel coding problem in information theory together with optimal transport theory. This approach, in particular, allows for replacing arguments which depend on the maximum number of possible distinct message sequences for noiseless channels with an entropy-theoretic argument. It is our hope that this novel method will also be accessible to a general readership and find further applications.
The paper is organized as follows. In Section II we provide a short literature review. The technical details of the stabilization problem are outlined in Section III. The subsequent Section IV introduces the notion of stabilization entropy. Applications to specific system models are given in Sections V and VI, and Section VII contains our result for noisy channels. Finally, the proofs of two technical lemmas are given in the Appendix.
II A brief literature review
This paper continues along the research programs developed in [24], which considers deterministic systems, and [51], which considers stochastic systems. For comprehensive literature reviews on the subject, we refer to [24, 33, 48]. Here we only provide a short review of the most relevant contributions.
For noise-free linear systems controlled over discrete noiseless channels, various authors have obtained a formula for the smallest channel capacity above which stabilization is possible, under various assumptions on the system and the admissible coders and controllers. This result is usually referred to as a data-rate theorem and asserts that the smallest capacity is given by the logarithm of the unstable determinant of the open-loop system, i.e., the log-sum of the unstable eigenvalues. The earliest works in this context are Wong & Brockett [6] and Baillieul [3]. More general versions of the data-rate theorem have been proven in Tatikonda & Mitter [46] and Hespanha et al. [21]. For noisy systems and mean-square stabilization, or more generally, moment-stabilization, analogous data-rate theorems have been proven in Nair & Evans [38] and Sahai & Mitter [44], see also [32, 34]. For extensive reviews, see [2, 17, 33, 40, 48]. A data-rate theorem for AMS stability of linear systems was established in [48, Thm. 8.5.3] (see also [51, Thm. 3.1]) and [50, Thm. 4.1 and 4.2], [22, Thm. 2.2, 3.2 and 3.5] under various variations. A recent study along a similar construction to the one introduced in [52] and [49] under fixed-rate quantization is [29].
Studies of nonlinear systems have typically considered deterministic (noise-free) systems controlled over discrete noiseless channels. In this context, Nair et al. [39] introduced the notion of topological feedback entropy (in analogy to topological entropy for dynamical systems [1]) for discrete-time systems to characterize the smallest average rate of information above which the state can be kept inside a compact controlled invariant set. They also characterized the smallest data rate for stabilization to an equilibrium point as the log-sum of the unstable eigenvalues of the linearization. Colonius & Kawan [9] introduced the notion of invariance entropy for continuous-time systems for the same stabilization objective. When adapted to the same (discrete-time) setting, the two notions are equivalent, see [11]. A comprehensive review of these concepts is provided in [24]. We also note that recently a concept of metric invariance entropy based on conditionally invariant measures was established in [8]. Further studies on control of nonlinear systems over communication channels have focused on constructive schemes (and not on converse theorems), primarily for noise-free systems and channels, see, e.g., [4, 16, 30].
We also emphasize that for nonlinear systems the problems of local stabilization (stabilization to a point), semi-global stabilization (set invariance) and global stabilization (as in the stochastic stabilization criterion considered here) are fundamentally different from each other, while for linear systems they can all be handled with similar methods, leading to the above-mentioned data-rate theorem in each case. This is related to the fact that for linear systems any local (dynamical or control-theoretic) property is a global property as well. For nonlinear systems, linearization techniques work well for local problems, for semi-global problems only under specific assumptions and for global problems almost not at all. In addition, the presence of (possibly unbounded and additive) noise requires an approach fundamentally different from the machinery utilized for local stabilization problems.
III Preliminaries and problem description
Notation
If is a finite set, we write for its cardinality. The complement of a set is denoted by . We write for the indicator function of a set . By we always denote the base--logarithm. We write for the set of nonnegative integers and put . Moreover, we use the notation for a discrete interval, i.e., for any with . By we denote the standard Euclidean norm on and by any associated operator norm. We write for , , and denote by the closure of a set . The Lebesgue measure on is denoted by . We write for the -identity matrix and for the general linear group of . By we denote the space of all linear maps between vector spaces . We use the notation for the support of a Borel probability measure . The expectation of a random variable is denoted by . The entropy of a -valued Bernoulli random variable with is denoted by , i.e., . The relative entropy of two probability mass functions and on a discrete space is defined by . We refer the reader to [12] for further information-theoretic concepts such as mutual information and channel capacity.
If are two measures on the same measurable space, we write to denote that is absolutely continuous with respect to and its density is essentially bounded.
If is the set of all sequences in some set , we write for elements of and for the left shift operator, i.e.,
[TABLE]
Moreover, we write for and for the Borel -field of a topological space .
To avoid technical problems concerning the measurability of certain sets, we make the following general assumption.
Assumption III.1
We assume that all measurable spaces in this paper are standard Borel and all random variables associated with a given control system are modeled on a common (standard Borel) probability space .
The standard Borel space assumption leads to useful universal measurability properties which are utilized in the paper. A measurable image of a Borel set is called an analytic set [15, App. 2]. We note that this is evidently equivalent to the seemingly more restrictive condition of being a continuous image of a Borel set. The following property will be utilized in our analysis: The image of a Borel set under a measurable map, and hence an analytic set, is universally measurable [15].
Throughout the paper, we consider a stochastic control system
$$x_{t+1} = f(x_t, u_t, w_t), \quad t \in \mathbb{Z}_+. \qquad (1)$$
This defines a measurable map $f: \mathbb{R}^N \times U \times W \to \mathbb{R}^N$, where $\mathbb{R}^N$ is endowed with the Borel $\sigma$-field $\mathcal{B}(\mathbb{R}^N)$, $U$ is a measurable space and $W$ a probability space. The noise is modeled by an i.i.d. sequence $(w_t)_{t \in \mathbb{Z}_+}$ of random variables on $W$ with associated probability measure $\nu$. The initial state is modeled by another random variable $x_0$ with probability measure $\pi_0$ on $\mathbb{R}^N$ and is assumed to be independent of $(w_t)_{t \in \mathbb{Z}_+}$.
We write $\varphi(t, x_0, u, w)$, $t \in \mathbb{Z}_+$, for the unique trajectory with initial value $x_0$ associated with the noise realization $w = (w_t)_{t \in \mathbb{Z}_+}$ and the control sequence $u = (u_t)_{t \in \mathbb{Z}_+}$.
We assume that an encoder, knowing the states $x_0, \ldots, x_t$ at time $t$, transmits at time $t$ a symbol $q_t$ through a noiseless discrete channel to a decoder/controller. We assume that the decoder receives the signals without delay. The finite coding alphabet is denoted by $\mathcal{M}$ and the capacity of the channel is
$$C = \log_2 |\mathcal{M}|.$$
Thus, at time $t$, the controller has the symbol string $q_0, \ldots, q_t$ available to generate the control input $u_t$. Any coding and control policy of this form is called a causal coding and control policy. A more general setup including a noisy channel will be introduced and studied in Section VII.
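The considered control objective is to make the state process $(x_t)_{t \in \mathbb{Z}_+}$ asymptotically mean stationary (AMS). Writing $\mu$ for the process measure on $(\mathbb{R}^N)^{\mathbb{Z}_+}$, i.e.,
$$\mu(B) = P\big(\{\omega \in \Omega : (x_t(\omega))_{t \in \mathbb{Z}_+} \in B\}\big) \quad \text{for all } B \in \mathcal{B}\big((\mathbb{R}^N)^{\mathbb{Z}_+}\big),$$
where $P$ is the probability measure of the common probability space and $\theta$ denotes the left shift operator, the process is AMS if there is a probability measure $Q$ on $(\mathbb{R}^N)^{\mathbb{Z}_+}$ with
$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mu(\theta^{-t}(B)) = Q(B) \quad \text{for all } B \in \mathcal{B}\big((\mathbb{R}^N)^{\mathbb{Z}_+}\big).$$
This implies that $Q$ is a stationary measure for $\theta$, i.e., $Q(\theta^{-t}(B)) = Q(B)$ for all times $t$ and Borel sets $B$.
The AMS property implies the existence of a probability measure $Q_0$ on $\mathbb{R}^N$ so that
$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} P(x_t \in B) = Q_0(B) \qquad (2)$$
for every $B \in \mathcal{B}(\mathbb{R}^N)$. This can be seen by considering sets of the form $B \times \mathbb{R}^N \times \mathbb{R}^N \times \cdots$. Then $\mu(\theta^{-t}(B \times \mathbb{R}^N \times \cdots))$ reduces to $P(x_t \in B)$ and the measure $Q_0$ is given by $Q_0(B) = Q(B \times \mathbb{R}^N \times \mathbb{R}^N \times \cdots)$.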
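The convergence of time-averaged occupation probabilities can be illustrated numerically. The sketch below assumes a hypothetical stable scalar system $x_{t+1} = a x_t + w_t$ with standard Gaussian noise and a deliberately non-stationary initial state; this system, the set $[-1, 1]$ and all function names are illustrative choices, not objects from the paper. It estimates the Cesàro average of $P(x_t \in [-1, 1])$ by Monte Carlo and compares it with the mass that the stationary Gaussian law assigns to the same set.

```python
import math
import random

def cesaro_occupation(a, horizon, paths, lo, hi, x0=5.0, seed=0):
    """Monte Carlo estimate of (1/T) * sum_{t<T} P(x_t in [lo, hi])."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(paths):
        x = x0                       # start far away from the stationary regime
        for _ in range(horizon):
            hits += lo <= x <= hi    # indicator of the event {x_t in [lo, hi]}
            x = a * x + rng.gauss(0.0, 1.0)
    return hits / (paths * horizon)

a = 0.5
estimate = cesaro_occupation(a, horizon=400, paths=500, lo=-1.0, hi=1.0)

# Stationary law of the toy system: N(0, 1 / (1 - a^2)).
sigma = math.sqrt(1.0 / (1.0 - a * a))
limit = math.erf(1.0 / (sigma * math.sqrt(2.0)))
print(round(estimate, 3), round(limit, 3))
```

The initial transient contributes only a vanishing share of the average, which is the sense in which asymptotic mean stationarity is weaker than stationarity.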
We note that it was shown in [51, Thm. 5.1] that an additive noise system can be made AMS over a finite-capacity channel under mild assumptions. Thus, searching for lower bounds on the necessary channel capacity is a meaningful problem.
IV Stabilization entropy
Definition IV.1
For any Borel set , and , a set is called -spanning if there exists a set with so that for every there is with
[TABLE]
We write for the smallest cardinality of a -spanning set (where if no finite -spanning set exists) and define the -stabilization entropy of system (1) by
[TABLE]
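To make the spanning condition more tangible, here is a minimal Monte Carlo sketch. It assumes a hypothetical scalar system $x_{t+1} = a x_t + u_t + w_t$ with uniformly distributed initial state and noise, and it phrases the two thresholds of the definition generically as `min_fraction` (required fraction of time steps spent in the set) and an estimated probability to be compared against a required level. These names and the toy system are assumptions for illustration and do not reproduce the paper's exact parametrization.

```python
import random

def fraction_in_set(a, x0, controls, noise, in_set):
    """Fraction of times t = 0, ..., n-1 at which the open-loop state lies in the set."""
    x, hits = x0, 0
    for u, w in zip(controls, noise):
        hits += in_set(x)
        x = a * x + u + w
    return hits / len(controls)

def spanning_probability(a, control_set, length, in_set, min_fraction,
                         samples=2000, seed=1):
    """Estimate the probability that SOME sequence in control_set keeps the
    state in the set for at least min_fraction of the time steps."""
    rng = random.Random(seed)
    good = 0
    for _ in range(samples):
        x0 = rng.uniform(-1.0, 1.0)
        noise = [rng.uniform(-0.1, 0.1) for _ in range(length)]
        if any(fraction_in_set(a, x0, u, noise, in_set) >= min_fraction
               for u in control_set):
            good += 1
    return good / samples

in_B = lambda x: abs(x) <= 1.0
zero_only = [tuple([0.0] * 10)]          # a single open-loop control sequence
p_stable = spanning_probability(0.5, zero_only, 10, in_B, min_fraction=0.8)
p_unstable = spanning_probability(2.0, zero_only, 10, in_B, min_fraction=0.8)
print(p_stable, p_unstable)
```

For the contracting toy system a single control sequence suffices, while for the expanding one the same singleton set covers only a small portion of the sample space, so more sequences would be needed; the growth rate of that number is what the entropy measures.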
Some remarks about this definition are in order:
(i) The control sequences in the above definition are not generated by a coding and control policy. Indeed, is an intrinsic quantity of the open-loop system.
(ii) The existence and finiteness of -spanning sets is not immediately clear from the definition. However, as we will see below, in relevant cases this is guaranteed. In general, we always have .
(iii) There are some obvious monotonicity properties of the function . Namely, if or become smaller, increases. This in particular implies the existence of corresponding limits as and (which may be infinite).
(iv) The notion of -stabilization entropy is defined in close analogy to the notion of measure-theoretic -entropy [42, 43] for dynamical systems. This quantity generalizes the classical Kolmogorov-Sinai measure-theoretic entropy, on the basis of its characterization due to Katok [23] for ergodic measures. While the original definition of measure-theoretic entropy is based on computing the Shannon entropy of “dynamical partitions”, Katok’s characterization is based on counting the minimal number of “dynamical balls” of a certain radius needed to cover a subset of the state space with measure greater than some threshold.
We now present our key lemma which relates the channel capacity necessary for stabilization to the stabilization entropy. In particular, it shows that finite -spanning sets exist for appropriate choices of , provided that the AMS property can be achieved.
Lemma IV.2
Assume that the AMS property is achieved via a causal coding and control policy over a noiseless channel of capacity . Then for every Borel set with and all sufficiently small we have
[TABLE]
If , then for all and sufficiently small we have
[TABLE]
Proof:
We fix a causal coding and control policy which achieves the AMS property over a noiseless channel of capacity . For a given set with , (2) implies
[TABLE]
Since , this can also be written as
[TABLE]
We pick and choose so that
[TABLE]
By Markov’s inequality, this implies that for every the event
[TABLE]
occurs with probability , since
[TABLE]
Observe that for every the number of ’s in satisfying is . Now for every consider the set
[TABLE]
of control sequences generated in the time interval provided that and , . Since the maximal number of different messages that can be transmitted in the time interval is , we have
[TABLE]
We claim that is -spanning. Indeed,
[TABLE]
for every we have , and the number of ’s in with is . Hence,
[TABLE]
Taking logarithms, dividing by and letting yields the assertion. The case is handled by replacing (4) with the inequality
[TABLE]
for an arbitrarily chosen , and applying the same arguments.∎
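The counting step in the proof, namely that a causal policy over a noiseless channel cannot generate more distinct control sequences than there are message strings, can be checked by brute force. The sketch below uses a hypothetical three-symbol alphabet and a hypothetical decoder rule; both are illustrative assumptions.

```python
import itertools
import math

def control_sequences(alphabet, horizon, controller):
    """All control sequences a causal controller can produce in `horizon` steps.

    `controller` maps the tuple of symbols received so far to a control value.
    """
    sequences = set()
    for symbols in itertools.product(alphabet, repeat=horizon):
        sequences.add(tuple(controller(symbols[: t + 1]) for t in range(horizon)))
    return sequences

def example_controller(received):
    # Hypothetical rule: the control depends on the last two received symbols.
    return sum(received[-2:]) - 1.0

alphabet = (0, 1, 2)
T = 5
S = control_sequences(alphabet, T, example_controller)
capacity = math.log2(len(alphabet))      # bits per channel use
rate = math.log2(len(S)) / T             # growth rate of the control-sequence count
print(len(S), rate, capacity)
```

Whatever rule is substituted for `example_controller`, the resulting rate cannot exceed the capacity, which is the mechanism behind the inequality in the lemma.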
Although Lemma IV.2 may appear technical, it has significant consequences: it allows for the application of volume-growth arguments that have been used in the literature for deterministic settings.
V Volume-expanding systems
In this section, we assume throughout that the measure of the random variable is absolutely continuous w.r.t. the Lebesgue measure on and that the associated density is essentially bounded, i.e., .
Consider a system of the form
[TABLE]
with and an injective -map satisfying (with denoting the Jacobian of at )
[TABLE]
Theorem V.1
Consider system (5) satisfying (6) and . Assume that the AMS property is achieved with an associated AMS measure via a causal coding and control policy over a noiseless channel of capacity . Then for all Borel sets with we have
[TABLE]
Proof:
The proof is subdivided into four steps.
Step 1. Fix a Borel set with and let be a finite -spanning set (if a finite spanning set does not exist for any , the estimate becomes trivial). For the associated with , define
[TABLE]
for all control and noise sequences and , respectively. Note that the (universal) measurability of follows from Assumption III.1. From the definition of -spanning sets it immediately follows that
[TABLE]
and we have (by Tonelli’s theorem)
[TABLE]
We can write as the disjoint union of the sets
[TABLE]
where ranges through all subsets of with cardinality . Then
[TABLE]
Now we prove that
[TABLE]
for a constant , independent of . First, observe that by the independence of the random variables and , is the probability measure of the joint variable . Hence,
[TABLE]
If we write for the density of with respect to and assume that , we thus find that
[TABLE]
implying that (11) holds with the constant .
Step 2. Writing , we define
[TABLE]
Then we have
[TABLE]
which immediately implies that for (using that is injective and )
[TABLE]
Let . Then an inductive argument yields
[TABLE]
Step 3. Combining (12), (8), (9), (10) and (11), we obtain
[TABLE]
In we use that the sets , , are pairwise disjoint. Because of the assumption that and hence (for each ) is injective, this implies that also the sets are pairwise disjoint. Hence, we can conclude that
[TABLE]
Step 4. We complete the proof by applying Lemma IV.2. Let us first assume that . Then Lemma IV.2 together with Step 3 yields
[TABLE]
As , the desired inequality follows. The case is trivial and the case follows by continuity.∎
Remark V.2
The preceding theorem recovers, as a special case, [51, Thm. 3.2], which shows that . However, the result there is more general with regard to the allowed class of channels.
Remark V.3
In the inequality (7) we see a trade-off between the -measure of the set and the infimal volume growth on . If some characteristics of the measure are known, one can try to optimize the lower bound by a careful choice of . Also observe that
[TABLE]
holds for all Borel sets , where the left-hand side is the expected volume expansion w.r.t. the AMS measure . Hence, it is tempting to conjecture that also the integral above is a lower bound on the capacity. Under the stronger criterion of asymptotic ergodicity, such a bound has been derived in [18].
The next corollary shows that imposing further properties on the AMS measure can lead to more concrete bounds.
Corollary V.4
Consider system (5) satisfying (6) and . Assume that the AMS property is achieved via a noiseless channel of capacity and the measure satisfies for some the moment constraint
[TABLE]
Then the channel capacity satisfies
[TABLE]
Proof:
Consider the set for a fixed . By Markov’s inequality, the moment constraint implies . Hence, Theorem V.1 implies the assertion.∎
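The trade-off behind this corollary can be explored numerically. The sketch below assumes that the capacity bound has the form (lower bound on the measure of a ball) times (smallest logarithmic expansion on that ball), with the measure bounded via Markov's inequality. The derivative `derivative`, the moment order `p = 2` and the moment bound `moment_bound = 1` are hypothetical choices made for illustration, not values from the paper.

```python
import math

def derivative(x: float) -> float:
    """Hypothetical f'(x): symmetric, decreasing in |x|, and always >= 1."""
    return 1.0 + 3.0 / (1.0 + x * x)

def lower_bound(b: float, moment_bound: float = 1.0, p: float = 2.0) -> float:
    """Markov-type mass bound for the ball of radius b times its minimal expansion."""
    mass = max(0.0, 1.0 - moment_bound / b ** p)   # Markov: Q(|x| <= b) >= 1 - M / b^p
    min_expansion = math.log2(derivative(b))       # infimum on [-b, b] is attained at |x| = b
    return mass * min_expansion

grid = [0.01 * k for k in range(101, 2001)]        # radii b in (1, 20]
b_star = max(grid, key=lower_bound)
print(b_star, lower_bound(b_star))
```

Small radii capture little mass and large radii include regions of weak expansion, so for this toy derivative the best radius is an interior point of the grid.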
Example V.5
For a linear system with , satisfying , our result implies the well-known relation (cf. [50, 48])
[TABLE]
with summation over all eigenvalues of with associated multiplicities . By a simple decoupling argument this can be refined so that only the unstable eigenvalues contribute to the sum.
The next example shows that for nonlinear systems the supremum in (13) is not necessarily attained as , i.e., the lower bound (7) indeed expresses a trade-off between the measure of and the minimal volume expansion on .
Example V.6
Consider a map with derivative
[TABLE]
and note that for all . Since is symmetric and monotonically decreasing on , we obtain
[TABLE]
Corollary V.4, applied with thus yields the capacity bound
[TABLE]
A straightforward analysis shows that this supremum is attained as a maximum at , and hence .
VI Inhomogeneous semilinear systems
In this section, we also assume throughout that . We consider systems of the form
[TABLE]
where and are control variables and is the noise variable. We assume that is a compact, connected metric space and is continuous. The product space will be equipped with the product topology (and hence becomes a compact, connected metric space as well). Obviously, the case of linear systems with additive noise is covered here, since may be chosen to be constant.
The homogeneous system associated with (14) is
[TABLE]
For a given initial state and a control sequence we write for the associated solution of (15). Here
[TABLE]
As we will see below, there always exists a finest continuous decomposition of the trivial vector bundle into invariant subbundles:
[TABLE]
Writing , , for the fibers of the subbundles, their invariance can be expressed by the identities
[TABLE]
The subbundles generalize the Lyapunov spaces of a single operator, i.e., the sums of generalized eigenspaces corresponding to eigenvalues of the same modulus.
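Before we formulate our main result, we recall some facts about additive cocycles. An additive cocycle over a continuous map $T: X \to X$ is a function $a: \mathbb{Z}_+ \times X \to \mathbb{R}$, written as $(n, x) \mapsto a_n(x)$, satisfying
$$a_{n+m}(x) = a_n(x) + a_m(T^n(x)) \quad \text{for all } x \in X,\ n, m \in \mathbb{Z}_+.$$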
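A standard way to produce additive cocycles is to sum a fixed function along orbits, and the cocycle identity $a_{n+m}(x) = a_n(x) + a_m(T^n x)$ can then be verified directly. The map `T` and the function `g` in the following sketch are arbitrary illustrative choices and are not taken from the paper.

```python
import math

def T(x: float) -> float:
    """Illustrative continuous map of the compact interval [0, 1] into itself."""
    return 3.7 * x * (1.0 - x)

def g(x: float) -> float:
    """Illustrative continuous function generating the cocycle."""
    return math.log(1.0 + x)

def iterate(x: float, n: int) -> float:
    """Return T^n(x)."""
    for _ in range(n):
        x = T(x)
    return x

def a(n: int, x: float) -> float:
    """Orbit sum a_n(x) = g(x) + g(T x) + ... + g(T^{n-1} x)."""
    total = 0.0
    for _ in range(n):
        total += g(x)
        x = T(x)
    return total

x0 = 0.2
for n, m in [(3, 5), (10, 1), (0, 7)]:
    lhs = a(n + m, x0)
    rhs = a(n, x0) + a(m, iterate(x0, n))
    print(n, m, math.isclose(lhs, rhs, abs_tol=1e-9))
```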
Lemma VI.1
Let be a continuous map on a compact metric space . Assume that is a continuous additive cocycle over . Then the following identities hold:
[TABLE]
Moreover, all infima above are attained, and the analogous identities with infima replaced by suprema hold.
A purely topological proof of this lemma can be found in [26, Cor. 2]. For a proof of a more general result using ergodic theory see, e.g., [37, App. A].
Theorem VI.2
Consider system (14). Assume that and that there exists a continuous and invariant vector bundle decomposition
[TABLE]
for the homogeneous system (15). Then, if the AMS property is achieved for (14) via a causal coding and control policy over a noiseless channel of capacity , we have
[TABLE]
Proof:
First observe that the mapping is a continuous additive cocycle over the shift . Hence, by Lemma VI.1 the limit
[TABLE]
exists and coincides with the right-hand side in (17). If this limit is , the statement becomes trivial, hence we may and will assume that it is positive.
The proof now proceeds along the following four steps.
Step 1. Let us write for the projection onto along . Observe that by the variation-of-constants formula we can write the solutions of (14) in the form
[TABLE]
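The variation-of-constants representation can be sanity-checked on a scalar toy model. The sketch assumes a recursion of the form $x_{t+1} = A(u_t) x_t + c_t$, where $c_t$ collects the control-dependent inhomogeneity and the noise; this form, the function `a_of_u` and the numerical values are assumptions made for illustration. The closed-form expression is the homogeneous solution plus the sum of the inhomogeneities transported by the remaining factors.

```python
import math

def simulate(a_of_u, controls, inhom, x0):
    """Iterate x_{t+1} = A(u_t) * x_t + c_t step by step."""
    x = x0
    for u, c in zip(controls, inhom):
        x = a_of_u(u) * x + c
    return x

def variation_of_constants(a_of_u, controls, inhom, x0):
    """Closed form: (product of all factors) * x0 + sum_s (product of factors after s) * c_s."""
    factors = [a_of_u(u) for u in controls]
    homogeneous = math.prod(factors) * x0
    forced = sum(math.prod(factors[s + 1:]) * inhom[s] for s in range(len(controls)))
    return homogeneous + forced

a_of_u = lambda u: 1.5 + math.cos(u)          # hypothetical control-dependent gain
controls = [0.1 * k for k in range(8)]
inhom = [0.3 * (-1) ** k for k in range(8)]   # stands in for control term plus noise
x_rec = simulate(a_of_u, controls, inhom, x0=0.7)
x_voc = variation_of_constants(a_of_u, controls, inhom, x0=0.7)
print(x_rec, x_voc)
```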
We let denote the rank of the subbundle (i.e., the common dimension of its fibers) and write for the -dimensional Lebesgue measure on . Observe that the invariance of and implies
[TABLE]
Moreover, since is a continuous subbundle, the map is continuous. By compactness of , the following maximum exists:
[TABLE]
Indeed, this follows from the fact that the Lebesgue measure of the image of a ball under a projection is proportional to the product of its non-vanishing singular values, which depend continuously on the projection.
Step 2. Fix and with . Assume that there exists a minimal finite -spanning set for (which later will be justified by invoking Lemma IV.2). Then there is with so that for each there is with
[TABLE]
Putting
[TABLE]
for every , we obtain
[TABLE]
Using the notation , for any we define
[TABLE]
Then, as in the proof of Theorem V.1, we obtain
[TABLE]
We define the probability measure on . Then, for any ,
[TABLE]
with . Fixing and putting , an easy computation using (19) and (20) leads to
[TABLE]
which implies (using (21))
[TABLE]
Now
[TABLE]
Putting everything together, we end up with
[TABLE]
To complete the proof, we have to find a reasonable lower bound for the first term above.
Step 3. Fix a subset with and define . Then
[TABLE]
The set on the right-hand side is contained in the closed ball
[TABLE]
As a consequence,
[TABLE]
Let be an inner product on in which and are orthogonal and write for the associated Lebesgue measure. Using compactness of , we can do this in such a way that with a constant for every Lebesgue measurable set and every . For any measurable set , a simple computation yields
[TABLE]
where denotes the -dimensional Lebesgue measure on . Using again the compactness of , we can find another constant (see Step 1) with
[TABLE]
Putting everything together, we arrive at
[TABLE]
Step 4. We combine the results of steps 2 and 3 to obtain
[TABLE]
Letting denote the number of subsets of with , using (23), we end up with
[TABLE]
with a positive constant , where the first inequality follows from , as in the proof of Theorem V.1. Applying the logarithm, dividing by and letting yields
[TABLE]
Here we use, in particular, the first technical lemma from the Appendix. Observing that , we can estimate
[TABLE]
leading to
[TABLE]
Now we use that . Let denote the limit in (18) and let . Then, for sufficiently large ,
[TABLE]
Since this holds for all with and was arbitrary, we find that
[TABLE]
Observe that this holds for arbitrary , , and . If is chosen so that , Lemma IV.2 yields
[TABLE]
If for all , we can let , which implies and thus
[TABLE]
Otherwise, we have for all sufficiently large and Lemma IV.2 yields
[TABLE]
also leading to (24). Since is a continuous additive cocycle over the shift , Lemma VI.1 guarantees that the limit and the infimum in (24) can be interchanged (replacing with or ), which completes the proof.∎
Remark VI.3
The proof of the above theorem is partly modeled on [24, Thm. 3.3]. For a more detailed explanation of the arguments used in Step 3, see [24, Lem. 3.3].
Example VI.4
Consider the special case of a linear system, i.e., . Then the vector bundle decomposition (16) can be chosen as
[TABLE]
where and are the unstable and center-stable subspaces of , respectively. This immediately implies
[TABLE]
with summation over the eigenvalues of with algebraic multiplicities .
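As an illustrative aid, the following Python sketch evaluates the classical log-sum of the unstable eigenvalues referred to in this example and in Section VIII. The function name `unstable_log_sum`, the choice of base-2 logarithms, and the convention that each eigenvalue is listed as often as its algebraic multiplicity are assumptions made for this sketch only.

```python
import math

def unstable_log_sum(eigenvalues):
    """Sum of log2|lambda| over eigenvalues with |lambda| > 1.

    `eigenvalues` is a list of real or complex numbers in which every
    eigenvalue appears as often as its algebraic multiplicity
    (an assumed input convention for this sketch).
    """
    total = 0.0
    for lam in eigenvalues:
        modulus = abs(lam)
        if modulus > 1.0:  # only unstable eigenvalues contribute
            total += math.log2(modulus)
    return total

# Hypothetical spectrum: one unstable real eigenvalue, one stable one,
# and an unstable complex-conjugate pair of modulus sqrt(2).
example_spectrum = [2.0, 0.5, 1 + 1j, 1 - 1j]
example_bound = unstable_log_sum(example_spectrum)
```

For the hypothetical spectrum above, the stable eigenvalue contributes nothing, so the value is determined by the eigenvalue 2 and the complex pair.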
In the following, we will show that there always exists a finest continuous decomposition of into invariant subbundles
[TABLE]
which is related to the dynamical behavior of the system induced by (15) on the projective bundle . This follows from a general result about linear flows on vector bundles known as Selgrade’s theorem, which reads as follows.
Proposition VI.5
Let be a finite-dimensional real vector bundle with compact metric base space . Assume that , , is a continuous discrete-time linear flow on and that the induced flow on is chain transitive. Then there exists a unique finest Morse decomposition of the induced flow on the projective bundle , and , . Every Morse set defines a -invariant subbundle of via
[TABLE]
and the following decomposition into a Whitney sum holds:
[TABLE]
For an introduction to the concepts of chain transitivity and Morse decompositions used in this proposition we refer to [10, 41]. A continuous-time version of the proposition can also be found in [10]. The discrete-time version follows from a more general result, see [41, Thm. 6.2 and Thm. 7.5].
The next proposition shows that Selgrade’s theorem can be applied to the linear flow generated by equation (15) on the trivial vector bundle .
Proposition VI.6
The solutions of the homogeneous equation (15) define a continuous discrete-time linear flow on the trivial vector bundle with compact metric base space . This flow is given by , . Moreover, the shift map is chain transitive.
Proof:
We know that , equipped with the product topology, is a compact and connected metric space. The flow properties ( and ) are easy to see. Continuity and (fiber-wise) linearity of are clear. From the fact that the periodic points of (which are precisely the periodic sequences) are dense in , it follows that every point in is chain recurrent. It is well-known that a homeomorphism is chain transitive on any closed set which is connected and consists of chain recurrent points.∎
Combining Selgrade’s theorem with Theorem VI.2, we obtain the following corollary.
Corollary VI.7
Consider system (14) and the Selgrade decomposition (25) associated with the homogeneous system (15). Assume that the subbundles are ordered such that
[TABLE]
for , where is the maximal number with this property. Then, if and the AMS property is achieved over a noiseless channel of capacity ,
[TABLE]
where the right-hand side is defined as zero if .
Proof:
Define , . Then . Since is, up to some multiplicative constant, the product of the numbers , , it follows that
[TABLE]
where we use Lemma VI.1 twice. This implies the result.∎
Example VI.8
In the special case when (only one Selgrade bundle) and the system is asymptotically volume-expanding, i.e.,
[TABLE]
the lower bound of Corollary VI.7 reduces to . Indeed, it is easy to see that the infimum over is then attained at the constant sequence with value .
For the general case, one can use numerical methods to approximate the Lyapunov exponents, and hence, the associated volume growth rates, for the homogeneous semilinear system (15). For continuous-time bilinear control systems, methods for the computation of Lyapunov exponents based on algorithms for solving discounted optimal control problems have been developed in [20] (see also [10, App. D]). In general, these methods also work for discrete-time systems.
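A minimal numerical sketch of a volume growth rate follows. It assumes that the homogeneous system is driven by a control-dependent matrix, so that the solution along a control sequence is a product of such matrices; since the determinant is multiplicative, the logarithmic volume growth of the product is the sum of the logarithms of the individual determinants. The matrix family `matrix_for_control` is hypothetical and chosen only for illustration. The sketch evaluates the growth rate on the whole space along one finite control sequence and does not compute the subbundle-wise quantities or the infimum appearing in Corollary VI.7.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return a * d - b * c

def matrix_for_control(u):
    # Hypothetical control-dependent matrix, used only for illustration.
    return ((2.0, u), (0.0, 0.5 + u))

def volume_growth_rate(controls):
    """Average of log2|det| along a finite control sequence.

    By multiplicativity of the determinant, this equals (1/n) log2 of
    the absolute determinant of the product of the n matrices.
    """
    logs = [math.log2(abs(det2(matrix_for_control(u)))) for u in controls]
    return sum(logs) / len(logs)

# One period of a hypothetical periodic control sequence.
rate = volume_growth_rate([0.5, 1.5])
```

For a periodic control sequence, averaging over one period already gives the asymptotic rate, which is why a finite list suffices in this toy setting.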
VII The noisy channel case
For discrete noiseless channels, the key idea combining the volume-growth based approaches for deterministic models with the stochastic system setup was the observation that the number of control sequences is bounded from above by the total number of received messages. This approach clearly does not directly apply to a noisy channel setup, for there can be an arbitrarily large number of possibly distinct received channel outputs, but these may not carry reliable information. In the following, we develop a new method to address this for a discrete memoryless channel (DMC). For a review of channel capacity with feedback see [13], [48, Sec. 5.3.4].
Figure 1 shows the control loop, using a DMC with feedback for data transmission from the encoder to the controller. The channel has a finite input alphabet and a finite output alphabet . The channel input at time is generated by a function so that . The channel maps to in a stochastic fashion so that is a conditional probability measure on for all , for every realization . The controller, upon receiving the information from the channel, generates its decision at time , also causally: .
Consider a DMC with channel capacity (we note that for DMCs, it is a well-known result that feedback cannot increase the capacity). Then the following property, known as the strong converse, holds, see [28], [13, Problem 10.17]: For any , under any coding policy:
[TABLE]
where is the average probability of error among equally likely messages after the channel is used times under coding and decoding policies admissible according to the standard information-theoretic formulation of communication with noiseless feedback, cf. [45].
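To make the notion of capacity concrete, the following sketch computes the capacity of one particular DMC, the binary symmetric channel with crossover probability p, for which the capacity is 1 - h(p) bits per channel use, with h the binary entropy function. The binary symmetric channel is not singled out in the text; it and the function names are assumptions made for illustration.

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) in bits, with the convention 0 * log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_capacity(crossover):
    """Capacity (bits per channel use) of a binary symmetric channel."""
    return 1.0 - binary_entropy(crossover)

# Capacities for a few hypothetical crossover probabilities.
capacities = {p: bsc_capacity(p) for p in (0.0, 0.11, 0.5)}
```

By the strong converse, any scheme whose rate exceeds such a capacity has an average error probability tending to one, which is the property used in the proof of Theorem VII.1.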
Now we consider a scalar system of the form
[TABLE]
with a -function satisfying
[TABLE]
Our main result reads as follows.
Theorem VII.1
Consider system (27) satisfying (28). Assume that with denoting the density with respect to , that is a compact interval and
[TABLE]
Then, if the AMS property is achieved via a causal coding and control strategy over a DMC of capacity , we have
[TABLE]
Before the proof, it may be instructive to explain the approach. It builds on the construction of an auxiliary coding problem that relates the number of distinct control actions per time stage (in a spirit similar to the one underlying the definition of stabilization entropy) to an information transmission problem and, in turn, to an analysis of channel capacity with feedback. The key observation is that the number of informative messages per time stage that must be transmitted about the initial state cannot be less than the desired bound. The coding problem is related to a channel coding theorem via optimal transport inequalities.
Proof:
Throughout the proof, we use the following notation: Observing that we have three sources of stochasticity – the initial state , the noise sequence and the channel noise – every time we make a statement about the probability of an event , we will add subscripts to the letter , indicating which probability measures are involved in computing this probability: The subscript “” is used for the initial state, subscript “” for the noise and subscript “” for the channel.
Let . Without loss of generality, we can assume that . We prove the theorem by contradiction, assuming that . First, we fix a sufficiently small so that
[TABLE]
Since the AMS measure is a probability measure, we can choose for every sufficiently small a with
[TABLE]
Later we will consider an auxiliary coding scheme, where the initial state is to be estimated at each time stage through the knowledge of the control sequence , applied by the controller in . Given a noise realization (that we will fix later), as an estimate for at time we use the center of the compact set
[TABLE]
i.e., the midpoint of . To derive an estimate for the diameter of , let be chosen arbitrarily. We claim that there exists a time with such that
[TABLE]
Indeed, if this was not the case, then the number of ’s in the interval with for each can be at most half of the cardinality of this interval, implying that the total number of ’s in such that is bounded by
[TABLE]
for large enough, a contradiction. We thus obtain
[TABLE]
implying
[TABLE]
Now the AMS property together with (30) implies
[TABLE]
Indeed, this follows by an application of Markov’s inequality:
[TABLE]
From (31) and the definition of we conclude that
[TABLE]
and the left-hand side is smaller than for large by (32). Our aim is to show that
[TABLE]
leading to a contradiction with (32).
To this end, we will distinguish between two complementary cases. To classify these cases, we introduce the notion of a control rate as follows.
For each , let be the set of all possible control sequences in the controller can generate under the given coding and control policy, i.e.,
[TABLE]
We define the control rate by
[TABLE]
We now treat the two possible cases and separately.
Case 1: We fix a noise realization and prove (33) for the conditional probability of the corresponding event given . To simplify notation, we write and instead of and , respectively.
Assume that and pick so that . Put and note that for all sufficiently large . From (31) it follows that
[TABLE]
Since , it follows that as well and thus
[TABLE]
The inequality above holds, since implies the existence of some with . Thus, (33) holds, since
[TABLE]
independently of the noise realization .
Case 2: Assume that the control rate satisfies and
[TABLE]
contrary to (33). Fix a noise realization so that
[TABLE]
and drop the realization in the notation, as in Case 1. Furthermore, write for the conditional probability above.
The rest of Case 2 is subdivided into five steps.
Step 1 (Construction of sets of bins): For every , we define and enumerate the elements of so that
[TABLE]
where
[TABLE]
We define the following collection of bins:
[TABLE]
for , which are not necessarily disjoint. Each has the same Lebesgue measure, which we denote by . From (34) it follows that
[TABLE]
for which it must be, by the analysis in Case 1, that
[TABLE]
We want to concentrate on the bins that are completely contained in . Since we assume that is an interval, the bins that are only partially contained in can contribute only very little measure as becomes large (their union can have at most twice the Lebesgue measure of a single bin), hence we can ignore them. Now assume that the number of bins that are completely outside of is , and for simplicity assume that these bins are always the last bins in the enumeration . For large , this implies
[TABLE]
Hence, must grow at an exponential rate of at least , just as . We will thus, in the rest of the proof, assume w.l.o.g. that all bins are completely contained in .
Now, from we extract a subcollection of disjoint bins via the construction in Lemma .2 (see Figure 2 for an example representation). In particular, we assume that the bins are ordered according to the natural (non-decreasing) order of their left endpoints. This implies
[TABLE]
Furthermore, it must be that
[TABLE]
for otherwise, by the analysis in Case 1, in contradiction to (37) and (38). Now, using the definition (51) of the leftover set, we define a collection of sets
[TABLE]
Hence, , where . The sets are thus pairwise disjoint. Also observe that
[TABLE]
since the leftover set has at most the Lebesgue measure of one bin. Finally, for a fixed , group each collection of successive bins as
[TABLE]
(In the definition of the last bin , we add some empty sets to the collection ). From (39) it follows that the number of these bins also satisfies
[TABLE]
Also observe that
[TABLE]
Let
[TABLE]
and observe that
[TABLE]
Step 2 (The auxiliary coding scheme): We now construct an auxiliary coding scheme (in a traditional information-theoretic sense) as follows: We use the received channel output/control sequence to reconstruct the index of the bin containing by looking at the points . With denoting the estimate of at the decoder, in the following we study . By construction of the bins, if
[TABLE]
there is no ambiguity, hence can be reconstructed and (no error).
On the other hand, if , we have the following analysis: For every , there is so that and hence, given the event , , the correct bin could be either or . So, we can randomly and independently assign the channel output/control to either or . The associated error probability is at most when the events and hold, i.e.,
[TABLE]
Altogether, the error probability in our coding scheme can be estimated as follows:
[TABLE]
From (34) it follows that for all large enough :
[TABLE]
Combining (40) and (42), we obtain
[TABLE]
Since , the union in the definition of is a disjoint union and the union of all equals , we have
[TABLE]
Together with (44), we thus obtain
[TABLE]
Step 3 (Introduction of an auxiliary source variable with uniform distribution): From (43) it follows that
[TABLE]
Since clearly , we obtain
[TABLE]
Combining this with (46) leads to
[TABLE]
Let be an auxiliary random variable on with uniform distribution. Then we have
[TABLE]
Considering the complementary events, we obtain
[TABLE]
Combining this with (47) leads to
[TABLE]
Step 4 (Application of optimal transport theory and coupling of the uniform source with the distribution of ): The information-theoretic formulation of information transmission assumes that the messages to be transmitted are uniformly distributed. In the final step of our analysis, we relate the messages represented by the indices of the ’s with their induced distribution under to a uniformly distributed set of messages: Let be the distribution of the indices of the ’s under and the uniform distribution of , with the same cardinality as the set of ’s. There exists a coupling between and so that the expected error is lower bounded by the total variation distance between and ; by finding a coupling (cf. [47, Eq. (6.11)]), we can achieve that
[TABLE]
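The coupling step rests on a standard fact: for two distributions on the same finite set, a maximal coupling makes the two random variables differ with probability exactly equal to their total variation distance. The sketch below computes this distance between a given index distribution and the uniform distribution, together with the overlap mass on which a maximal coupling lets the two variables coincide. The function names and the sample distribution are assumptions for illustration, and the displayed inequality of the proof is not reproduced here.

```python
def total_variation(p, q):
    """Total variation distance between two distributions on {0,...,k-1}."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def overlap_mass(p, q):
    """Mass sum_i min(p_i, q_i); a maximal coupling has P(X = Y) equal to it."""
    return sum(min(a, b) for a, b in zip(p, q))

def uniform(k):
    """Uniform distribution on k points."""
    return [1.0 / k] * k

# Hypothetical distribution of bin indices compared with the uniform one.
index_distribution = [0.5, 0.25, 0.25]
tv = total_variation(index_distribution, uniform(3))
agree = overlap_mass(index_distribution, uniform(3))
```

The two quantities always sum to one, so a small total variation distance means that the uniform auxiliary message and the actual bin index can be made to agree with high probability.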
Let us estimate . For sufficiently large , we have
[TABLE]
Since , this implies
[TABLE]
Step 5 (Application of the strong converse): In view of all of the above steps, the proposed coding scheme can be used to encode an auxiliary equi-distributed random variable with an asymptotic average probability of error upper bounded by
[TABLE]
This error bound can be made strictly smaller than , when is chosen sufficiently large and sufficiently small. Thus, we arrive at a contradiction with the strong converse (26), because the rate of our coding scheme satisfies
[TABLE]
The proof is complete.∎
We note the following variation, in which the initial measure may have non-compact support, together with a proof sketch.
Theorem VII.2
Consider system (27) satisfying (28). Assume that with denoting the density with respect to , and that for every , there exists a compact interval such that, and with
[TABLE]
the following assumption holds:
[TABLE]
Then, if the AMS property is achieved via a causal coding and control strategy over a DMC of capacity , we have
[TABLE]
Remark VII.3
A sufficient condition for (49) is that is differentiable, positive everywhere and monotone decreasing in either direction as increases for sufficiently large values of , and . This follows from an application of L’Hospital’s theorem to the expression
[TABLE]
Probability densities which decay faster than an exponential (such as the Gaussian) satisfy this condition. An exponential density (if one-sided, the denominator will just be ) keeps this ratio constant as increases, and densities with a heavier tail than an exponential do not satisfy this condition.
Proof:
The proof is almost identical to that of Theorem VII.1: Case 1 carries over unchanged. For Case 2, in the following, fix a sufficiently small and a corresponding . If (33) does not hold, then, instead of (34), we can consider
[TABLE]
We will construct the auxiliary coding scheme by embedding the bins inside . We will thus focus on the sub-probability measure defined by the restriction of to , defined formally as for every Borel , and thus we replace (37) with
[TABLE]
The analysis then goes through unchanged up to Step 5, where the following term needs to be made less than 1:
[TABLE]
The only additional term, when compared with Step 5 of the proof of Theorem VII.1, is the expression . Since is uniformly bounded under the given assumptions, condition (49) ensures that this term can be made arbitrarily small as is made small.∎
VIII Discussion and concluding remarks
In this paper, we considered a stochastic stabilization problem for a general controlled stochastic system over a communication channel. For this problem, we developed a new approach to derive fundamental lower bounds on the information transmission requirements for control over communication channels. These lower bounds are consistent with the bounds obtained earlier via information-theoretic methods and with those obtained for more restrictive models (including linear systems). Moreover, the new proofs are more direct and concise, and they yield finer lower bounds for a large class of systems. The lower bounds obtained for the AMS property are expressed in terms of the determinant of the Jacobian of the nonlinear system model, and they recover the existing results for the linear system setup as a special case. For noisy channels, our approach has been to develop a method to relate stabilization entropy and channel capacity through a generalization of the strong converse of information theory.
Achievability results have been obtained for linear systems in [50, 48] and for nonlinear systems in [51]. In particular, [50, Thm. 4.2] shows that for a linear system with a diagonalizable matrix , controlled over a DMC, the AMS property can be achieved whenever the channel capacity exceeds the log-sum of the unstable eigenvalues. Hence, in this case the lower bounds following from the results in this paper match the upper bound. For nonlinear systems of the form
[TABLE]
with invertible and for every and an i.i.d. sequence of zero-mean Gaussian variables, it is shown in [51, Thm. 5.1] that ergodicity (and thus AMS) can be achieved over a discrete noiseless channel under the following assumption: There exist a function with and a constant such that for all . In this case, the minimal required channel capacity satisfies .
Finally, we want to mention that local exponential orbit complexity of the open-loop system (as opposed to the global unstable behavior imposed in the system models studied in Section V), in general, does not lead to a positive bound on the channel capacity. For instance, if a system of the form
[TABLE]
admits a compact uniformly hyperbolic set for the associated deterministic system and the noise amplitude is sufficiently small, it is well-known that the uncontrolled noisy system admits a random hyperbolic set supporting a stationary measure under mild assumptions, cf. [31] (see also the relevant classical theory of positive Harris recurrence [35, 49]). Hence, for an appropriate initial measure , the uncontrolled system is already AMS, implying that no information transmission at all is necessary.
Lemma .1
Let with . Then
[TABLE]
As a consequence,
[TABLE]
Proof:
Let be an i.i.d. sequence of -valued Bernoulli random variables with associated probability distribution , . Then
[TABLE]
Sanov’s theorem (see [12, Thm. 11.4.1]) yields
[TABLE]
where is the information projection of onto , i.e., the distribution that minimizes
[TABLE]
under the constraint . To determine the solution to this minimization problem, we define the function
[TABLE]
whose derivative vanishes if and only if . Computing the second derivative , we see that , hence has a minimum at . Due to the constraint , this is only relevant if . In this case, the minimizing distribution is . Otherwise, the minimum is attained at (by monotonicity) and . This implies the first assertion of the lemma. The identity (50) follows by considering .∎
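The Sanov-type exponent used in this proof can be checked numerically. The sketch below compares the exact exponential decay rate of a binomial tail probability with the Kullback-Leibler divergence between the corresponding Bernoulli distributions. The parameter values, the function names, and the use of base-2 logarithms are assumptions made for illustration; the statement of the lemma itself is not reproduced.

```python
import math

def kl_bernoulli(q, p):
    """KL divergence D(q || p) in bits between Bernoulli(q) and Bernoulli(p)."""
    total = 0.0
    for a, b in ((q, p), (1.0 - q, 1.0 - p)):
        if a > 0.0:  # convention 0 * log 0 = 0
            total += a * math.log2(a / b)
    return total

def binomial_tail_rate(n, p, q):
    """Return -(1/n) log2 P(S_n >= q n) for S_n ~ Binomial(n, p)."""
    k_min = math.ceil(q * n)
    # Natural-log probabilities of the individual outcomes k = k_min..n.
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1.0 - p)
        for k in range(k_min, n + 1)
    ]
    # Log-sum-exp keeps the summation numerically stable.
    top = max(log_terms)
    log_tail = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return -log_tail / (n * math.log(2.0))

# Fair coin flips: fraction of ones at least 3/4 among n = 400 trials.
rate = binomial_tail_rate(400, 0.5, 0.75)
exponent = kl_bernoulli(0.75, 0.5)
```

For fair coin flips, multiplying the tail probability by the total number of binary strings of length n gives the number of subsets of an n-element set whose cardinality is at least the prescribed fraction of n, which is how a probabilistic exponent of this kind turns into a counting estimate.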
Lemma .2
Let be a finite collection of compact intervals, each of equal length . Then there exists a pairwise disjoint subcollection satisfying
[TABLE]
Proof:
We may assume that the intervals are ordered so that their left endpoints form a non-decreasing sequence. Then the indexes are determined as follows: Put . Then take the next interval in , which does not intersect and call it . Let . The leftover space between and is
[TABLE]
and has Lebesgue measure , for otherwise would not be the first interval not intersecting . Continuing in this way, we find the desired collection of pairwise disjoint intervals and it follows that , while
[TABLE]
implying $2\,m\bigl(\bigcup_{j=1}^{k} I_{i_j}\bigr) = 2kl \geq m\bigl(\bigcup_{i=1}^{r} I_i\bigr)$. ∎
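The greedy selection in this proof can be sketched in a few lines. The code below represents each compact interval by its left endpoint (all intervals share the same length), picks the disjoint subcollection exactly as described, and computes the Lebesgue measure of a union so that the factor-two estimate can be checked on examples. The function names and the sample data are assumptions made for illustration.

```python
def greedy_disjoint(left_endpoints, length):
    """Scan left to right and keep each interval that does not
    intersect the interval kept last."""
    chosen = []
    for a in sorted(left_endpoints):
        # Compact intervals [x, x + length] and [a, a + length] with
        # x <= a intersect if and only if a <= x + length.
        if not chosen or a > chosen[-1] + length:
            chosen.append(a)
    return chosen

def union_measure(left_endpoints, length):
    """Lebesgue measure of the union of the intervals [a, a + length]."""
    total, right = 0.0, float("-inf")
    for a in sorted(left_endpoints):
        b = a + length
        if a > right:
            total += length      # new connected component
        elif b > right:
            total += b - right   # part sticking out to the right
        right = max(right, b)
    return total

# Hypothetical collection of five intervals of length 1.
lefts = [0.0, 0.5, 1.0, 1.5, 3.0]
selected = greedy_disjoint(lefts, 1.0)
```

In this example the selected intervals start at 0.0, 1.5 and 3.0, so twice their total measure is 6, while the union of all five intervals has measure 3.5.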
References
- [1] R. L. Adler, A. G. Konheim, M. H. McAndrew. Topological entropy. Trans. Amer. Math. Soc. 114 (1965), 309–319.
- [2] B. R. Andrievskii, A. S. Matveev, A. L. Fradkov. Control and estimation under information constraints: toward a unified theory of control, computation, and communications. (Russian) Avtomat. i Telemekh. 2010, no. 4, 34–99; translation in Autom. Remote Control 71 (2010), no. 4, 572–633.
- [3] J. Baillieul. Feedback designs for controlling device arrays with communication channel bandwidth constraints. In 4th ARO Workshop on Smart Structures, State College, PA, August 1999.
- [4] J. Baillieul. Data-rate requirements for nonlinear feedback control. In Proc. 6th IFAC Symp. Nonlinear Control Syst., Stuttgart, Germany, 2004, 1277–1282.
- [5] A. R. Barron. The strong ergodic theorem for densities: generalized Shannon-McMillan-Breiman theorem. Ann. Probab. 13 (1985), no. 4, 1292–1303.
- [6] W. S. Wong, R. W. Brockett. Systems with finite communication bandwidth constraints. II. Stabilization with limited information feedback. IEEE Trans. Automat. Control 44 (1999), no. 5, 1049–1053.
- [7] F. Colonius. Minimal bit rates and entropy for stabilization. SIAM J. Control Optim. 50 (2012), 2988–3010.
- [8] F. Colonius. Metric invariance entropy and conditionally invariant measures. Ergodic Theory Dynam. Systems 38 (2018), no. 3, 921–939.
