Entropy and dimension of disintegrations of stationary measures
Pablo Lessa

TL;DR
This paper generalizes results on the dimensionality of stationary measures from SL_2(R) to GL(R^d), linking measure dimensions to Lyapunov exponents and entropy, and exploring their relation to spectral simplicity.
Contribution
It extends the understanding of stationary measure disintegrations to higher dimensions, connecting entropy, Lyapunov exponents, and spectral properties.
Findings
Dimensions of conditional measures relate to Lyapunov exponent gaps.
Entropy measures are connected to Lyapunov spectrum simplicity.
Results generalize previous SL_2(R) findings to GL(R^d).
Abstract
We extend a result of Ledrappier, Hochman, and Solomyak on exact dimensionality of stationary measures for $SL_2(\mathbb{R})$ to disintegrations of stationary measures for $GL(\mathbb{R}^d)$ onto the one dimensional foliations of the space of flags obtained by forgetting a single subspace. The dimensions of these conditional measures are expressed in terms of the gap between consecutive Lyapunov exponents, and a certain entropy associated to the group action on the one dimensional foliation they are defined on. It is shown that the entropies thus defined are also related to simplicity of the Lyapunov spectrum for the given measure on $GL(\mathbb{R}^d)$.
Pablo Lessa, IMERL, Facultad de Ingeniería, Julio Herrera y Reissig 565, 11300 Montevideo, Uruguay.
1 Introduction
It was shown by Ledrappier [Led84] and by Hochman and Solomyak [HS17] that if $\nu$ is a probability on the projective space of $\mathbb{R}^2$ which is stationary with respect to a probability $\mu$ on $SL_2(\mathbb{R})$ with finite Lyapunov exponents, then $\nu$ is exact dimensional and its dimension is $\kappa/(2\chi)$, where $\kappa$ is the Furstenberg entropy and $\chi$ is the largest Lyapunov exponent (hence $2\chi$ is the gap between the two Lyapunov exponents).
Suppose now that $\mu$ is a probability on $GL(\mathbb{R}^3)$ and $\nu$ is a $\mu$-stationary probability on the space of flags in $\mathbb{R}^3$ (i.e. pairs $(V_1, V_2)$ where $V_1 \subset V_2$, $V_1$ is a one dimensional subspace, and $V_2$ is a two dimensional subspace), which is a three-dimensional manifold.
We consider here the two foliations of the space of flags obtained by partitioning into sets of flags sharing the same one dimensional subspace on the one hand, and flags sharing the same two dimensional subspace on the other. These are foliations by circles, and furthermore the action of any invertible linear self mapping of $\mathbb{R}^3$ preserves both foliations.
In this context we show that the conditional measures obtained by disintegrating the stationary measure with respect to these two foliations are exact dimensional. Furthermore we express the dimension of these disintegrations in terms of the gap between consecutive Lyapunov exponents as well as two entropies. Before establishing the dimension formula we show that the entropies bound the gaps between exponents from below and therefore, in principle, yield a criterion for simplicity of the Lyapunov spectrum.
We prove our results in a slightly more general context, that of actions of $GL(\mathbb{R}^d)$ on the space of complete flags in $\mathbb{R}^d$. In this context there are $d-1$ associated one dimensional foliations which correspond to “forgetting” the $k$-dimensional subspace of all flags for some $k \in \{1, \dots, d-1\}$.
1.1 Preliminaries
Let denote the singular values of an element with respect to the standard inner product.
We denote by the space of complete flags in , an element is of the form where is an -dimensional subspace of for each and for .
Let denote the space of flags missing their -dimensional subspace. For a given complete flag we denote by its projection to (i.e. the sequence obtained by removing from ).
We use the notation for equality in distribution between random elements and . And to mean that the probability is absolutely continuous with respect to .
If and are random elements taking values in complete separable metric spaces (a version of) the conditional distribution of given is a -measurable random probability on the range of such that
[TABLE]
for all continuous bounded real functions (here the right-hand side is the conditional expectation of with respect to the -algebra generated by ). Such a conditional distribution is well defined up to sets of zero measure but we will abuse notation slightly referring to ‘the conditional distribution’.
It is always the case that there exists a Borel mapping from the range of to the space of probabilities on the range of such that is a version of the conditional distribution of given . Fixing such a mapping one may speak of for non-random in the range of .
The lower local dimension of a probability measure $\nu$ on a metric space at a point $x$ is defined by
$$\underline{\dim}_x(\nu) = \liminf_{r \to 0} \frac{\log \nu(B_r(x))}{\log r},$$
while the upper local dimension is defined by
$$\overline{\dim}_x(\nu) = \limsup_{r \to 0} \frac{\log \nu(B_r(x))}{\log r},$$
where $B_r(x)$ is the ball of radius $r$ centered at $x$.
If the lower and upper dimensions of $\nu$ are equal to the same constant $\nu$-almost everywhere then we say that $\nu$ is exact dimensional and define its global dimension as the given constant.
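As a rough illustration of the definition of local dimension, the following sketch computes the ratio of the logarithm of the mass of a ball to the logarithm of its radius for a finite approximation of the middle-thirds Cantor measure. The Cantor measure is not discussed in the paper; it is used here only as a familiar example, and the function names are hypothetical.

```python
import math
from itertools import product

def cantor_atoms(level):
    """Left endpoints of the 2**level intervals of the middle-thirds Cantor construction."""
    return [sum(d * 3.0 ** -(i + 1) for i, d in enumerate(digits))
            for digits in product((0, 2), repeat=level)]

def ball_mass(atoms, x, r):
    """Mass of the closed ball of radius r around x for the uniform probability on `atoms`."""
    return sum(1 for a in atoms if abs(a - x) <= r) / len(atoms)

def local_dimension_ratio(atoms, x, r):
    """The ratio log(mass of ball) / log(radius); its limit as r -> 0 is the local dimension."""
    return math.log(ball_mass(atoms, x, r)) / math.log(r)

# Finite approximation (an assumption of this sketch): 2**12 equally weighted atoms.
atoms = cantor_atoms(12)
ratios = [local_dimension_ratio(atoms, 0.0, 3.0 ** -n) for n in range(2, 9)]
print(ratios)
print(math.log(2) / math.log(3))
```

At the point 0 and radii of the form 3^(-n) the ratios agree with log 2 / log 3, the dimension of the Cantor measure, as long as n stays below the approximation level.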
1.2 Statement of main results
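Suppose that $g$ is a random element of $GL(\mathbb{R}^d)$ with distribution $\mu$ such that
$$\mathbb{E}\left(|\log \sigma_i(g)|\right) < +\infty \quad \text{for } i = 1, \dots, d,$$
and let $F$ be a random element of the space of complete flags with distribution $\nu$ which is independent from $g$ and such that
$$gF \overset{d}{=} F.$$
The existence of such a pair is equivalent to the fact that $\nu$ is a $\mu$-stationary probability, as first defined in [Fur63].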
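The stationarity condition can be illustrated in a much simpler setting. The sketch below replaces the flag space by a finite set and the linear group by arbitrary self-maps of that set (both are assumptions made purely for illustration), and approximates a stationary measure by averaging the iterates of the uniform measure. All names are hypothetical.

```python
def push_forward(mu, nu):
    """Distribution of g(x) when g ~ mu and x ~ nu are independent.

    mu: list of (probability, map) pairs, each map a tuple with map[i] = image of point i.
    nu: list of probabilities on the points 0..n-1.
    """
    out = [0.0] * len(nu)
    for weight, g in mu:
        for x, mass in enumerate(nu):
            out[g[x]] += weight * mass
    return out

def stationary_measure(mu, n_points, iterations=2000):
    """Cesaro average of the iterates of the uniform measure under push_forward."""
    nu = [1.0 / n_points] * n_points
    avg = [0.0] * n_points
    for _ in range(iterations):
        avg = [a + p / iterations for a, p in zip(avg, nu)]
        nu = push_forward(mu, nu)
    return avg

# Two maps of {0, 1, 2}, each chosen with probability 1/2.
mu = [(0.5, (1, 2, 0)), (0.5, (0, 0, 1))]
nu = stationary_measure(mu, 3)
print(nu)
print(push_forward(mu, nu))
```

The two printed vectors nearly coincide, which is the finite analogue of the statement that applying an independent random map to a point distributed according to the stationary measure does not change its distribution.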
The Lyapunov exponents of relative to are defined by the equations
[TABLE]
where is the Jacobian of the restriction of to the subspace (where the volume measure induced by standard inner product is used on and its image). In the degenerate case where one has , and if is one dimensional one has .
The Lyapunov exponents given by the multiplicative ergodic theorem of [Ose68] for a product of i.i.d. random matrices of distribution are obtained by maximizing the sums over all stationary probabilities as shown in [FK83].
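For 2×2 matrices the link between exponents and Jacobians on subspaces can be explored numerically. The following is a minimal sketch, not the paper's construction: it estimates the top exponent from the growth of a vector (which gives the largest exponent only for a typical starting direction) and the sum of both exponents from the average of the logarithm of the determinant. The function names and the starting vector are assumptions.

```python
import math
import random

def apply(m, v):
    """Apply the 2x2 matrix m (a pair of rows) to the vector v."""
    (a, b), (c, d) = m
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

def lyapunov_exponents_2d(sample_matrix, n_steps, rng):
    """Estimate (chi_1, chi_2) for an i.i.d. product of invertible 2x2 matrices."""
    v = (1.0, 0.3)  # arbitrary starting direction (an assumption of this sketch)
    log_norm = 0.0
    log_det = 0.0
    for _ in range(n_steps):
        m = sample_matrix(rng)
        w = apply(m, v)
        norm = math.hypot(w[0], w[1])
        # growth of the vector at this step, then renormalise to avoid overflow
        log_norm += math.log(norm / math.hypot(v[0], v[1]))
        v = (w[0] / norm, w[1] / norm)
        (a, b), (c, d) = m
        log_det += math.log(abs(a * d - b * c))
    chi_1 = log_norm / n_steps
    return chi_1, log_det / n_steps - chi_1

# Deterministic sanity check: a constant diagonal matrix.
constant = lambda rng: ((2.0, 0.0), (0.0, 0.5))
print(lyapunov_exponents_2d(constant, 1000, random.Random(0)))
```

For the constant diagonal matrix the estimates approach log 2 and -log 2; replacing `constant` by a genuinely random sampler gives a Monte Carlo estimate whose accuracy depends on the number of steps.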
Fix , let be the projection of to , and let be the conditional distribution of given .
Theorem 1 (Inequality between entropy and gap between exponents).
If is the unique stationary probability on which projects to then almost surely,
[TABLE]
and if and only if almost surely.
Theorem 2 (Dimension of conditional measures).
If is ergodic, is the unique stationary probability on which projects to , and , then almost surely is exact dimensional and
[TABLE]
In the case $d = 2$ both theorems above are known. A proof of Theorem 1 in this case was first given in [Led84]. In the same work the formula for dimension in Theorem 2 is shown to hold for a slightly different notion of dimension. The exact dimensionality of stationary measures when $d = 2$ was first proved in [HS17] and this implies the formula above for the same notion of dimension we use here.
Theorem 1 implies that the Lyapunov spectrum is simple (i.e. all exponents are different) if there does not exist a family of conditional probabilities satisfying for almost every . This suggests a connection to criteria for simplicity dating back to [GdM89] and [GR89] though we do not explore this issue further here.
1.3 Acknowledgment
I am grateful to François Ledrappier for many helpful discussions. I would also like to thank an anonymous referee for pointing out an error in a previous version of the proof of Theorem 1, and for helping improve the general quality of the article.
Part I Entropy, Mutual information, and Lyapunov exponent gaps
2 Entropy and mutual information
We will define below the conditional mutual information between and given . This is a non-negative -measurable random variable which may take the value .
The purpose of this section is to prove that:
Lemma 1 (Entropy and mutual information).
If almost surely then almost surely and .
Conversely, if almost surely then whether is finite or not.
This result reduces the problem of showing that almost surely and that to that of bounding the conditional mutual information between and given .
A general reference covering mutual information including Dobrushin’s theorem and the Gelfand-Yaglom-Perez theorem is [Pin64].
2.1 Conditional mutual information
2.1.1 Mutual information
Let and be random elements of two Polish spaces and , and denote the distribution of , , and respectively.
The mutual information between and is defined by
[TABLE]
where the supremum is over all finite partitions of into Borel sets.
Directly from the definition one sees that .
By Jensen’s inequality the mutual information is non-negative, with equality to $0$ if and only if the two random elements are independent. If takes countably many values and has finite entropy in the sense of [Sha48] one has .
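For random variables taking finitely many values the supremum over partitions is attained by the partition into single points, so the mutual information reduces to a finite sum. The sketch below (hypothetical names, natural logarithms) computes it and checks the two extreme cases: independent variables and a variable paired with itself.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a finite joint distribution.

    joint: dict mapping (x, y) to the probability of the pair (x, y).
    """
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Two independent fair bits, and a fair bit paired with itself.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
identical = {(0, 0): 0.5, (1, 1): 0.5}
print(mutual_information(independent))
print(mutual_information(identical), math.log(2))
```

In the second case the mutual information equals the Shannon entropy of a fair bit, log 2.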
It was shown in [Dob59] that is the supremum over any sequence of partitions which generate the Borel -algebra in (see also [Gra11, Lemma 7.3]). This has the following important corollary:
Proposition 1 (Semi-continuity of mutual information).
If in the sense of distributions then .
It was shown in [GfY59] and [Per59] that if then and
[TABLE]
Conversely, if then
[TABLE]
whether the right hand side is finite or not.
These results are usually called the Gelfand-Yaglom-Perez Theorem.
In our context, when , this yields the following result:
Proposition 2.
If and then almost surely and .
Conversely, if almost surely then whether is finite or not.
Proof.
The marginal distributions of are and respectively. However the conditional distribution of given is .
Therefore letting be the joint distribution of one has
[TABLE]
for all measurable functions .
If almost surely then
[TABLE]
so that at -almost every point .
On the other hand if then setting one has
[TABLE]
for all measurable functions .
Letting where is the indicator of an arbitrary subset of , and is continuous on the compact space , one obtains that
[TABLE]
for -almost every . Intersecting the -full measure sets where this holds over a countable dense set of functions , one obtains a full measure set for where .
Hence, the distribution of is absolutely continuous with respect to if and only if almost surely and in this case the Radon-Nikodym derivative between the two at is given by . ∎
2.1.2 Conditional mutual information
Let be a -algebra of measurable sets in the probability space on which the random elements and are defined.
The mutual information between and conditioned on is the unique up to modifications on null sets random variable obtained as above but using the conditional distribution of conditioned on . In the case we use the notation .
One still has almost surely. Almost sure equality to zero occurs if and only if and are conditionally independent given .
In general there is no relation between and or even .
To see this suppose for example that are i.i.d. taking the values with probability and , then one has while almost surely.
On the other hand for any Markov chain one has almost surely, and one may construct examples with . For example, setting and where the are i.i.d. with suffices.
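The lack of a general relation between mutual information and its conditional version can be checked on a small example. The variables below are an assumption chosen for concreteness (two independent fair bits and their exclusive or) and need not coincide with the example intended in the text; the function names are hypothetical.

```python
import math
from itertools import product

def mutual_information(joint):
    """Mutual information (in nats) of a finite joint distribution {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def conditional_mutual_information(joint_xyz):
    """Return {z: mutual information of (X, Y) given Z = z} for a finite joint law of (X, Y, Z)."""
    pz = {}
    for (x, y, z), p in joint_xyz.items():
        pz[z] = pz.get(z, 0.0) + p
    result = {}
    for z0, mass in pz.items():
        cond = {(x, y): p / mass for (x, y, z), p in joint_xyz.items() if z == z0}
        result[z0] = mutual_information(cond)
    return result

# X, Y independent fair bits and Z = X xor Y (illustrative choice).
joint_xyz = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}
joint_xy = {}
for (x, y, z), p in joint_xyz.items():
    joint_xy[(x, y)] = joint_xy.get((x, y), 0.0) + p

print(mutual_information(joint_xy))
print(conditional_mutual_information(joint_xyz))
```

Here the unconditional mutual information vanishes while the conditional one equals log 2 for both values of the conditioning variable, since knowing the exclusive or makes each bit a function of the other.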
The following semi-continuity property holds:
Proposition 3 (Semi-continuity of conditional mutual information).
If the conditional distribution of given converges almost surely to the conditional distribution of given then almost surely.
Proof.
This is a direct consequence of Proposition 1. ∎
The following monotonicity property follows immediately from the definition of mutual information
[TABLE]
A more precise version of monotonicity is the following:
Proposition 4 (Chain rule for conditional mutual information).
If are random elements and a -algebra of events of the probability space on which they are defined, then
[TABLE]
Proof.
When is trivial this is [Gra11, Corollary 7.14] (notice that what said reference denotes by is in our notation). The general case follows by applying this to the conditional distributions given . ∎
2.2 Proof of Lemma 1
We will calculate the marginal distributions and the joint distribution of conditioned on and apply the Gelfand-Yaglom-Perez Theorem as in Proposition 2.
To begin we simply let be the conditional distribution of given .
By stationarity of the conditional distribution of given is .
For the joint distribution notice that the distribution of conditioned on is the same as conditioned on and therefore it is .
Hence the joint conditional distribution of given satisfies (and is determined by the equation)
[TABLE]
for all continuous bounded .
By the Gelfand-Yaglom-Perez Theorem if almost surely then almost surely and
[TABLE]
almost surely.
And conversely, if almost surely one has
[TABLE]
The result now follows by taking expectation.
3 Proof of Theorem 1
In this section we will prove Theorem 1.
The strategy is to approximate by pairs with the property that the conditional distributions are absolutely continuous with respect to the natural geometric measure on their domain of definition.
For the approximating pairs there is a direct relation between the distortion of the conditional measures by a linear mapping and its determinants on certain subspaces. This argument establishes equality between the entropy and the Lyapunov exponent gap for the approximating pairs.
The result is then obtained by passing to the limit using the properties of conditional mutual information discussed in the previous section. At this step equality is lost, and one obtains only an inequality between entropy and the Lyapunov exponent gap.
An important technical issue is that one must maintain the same conditioning -algebra for the approximating pairs and the limit pair in order to apply Proposition 3.
The idea of approximating a probability by one whose stationary probability is absolutely continuous with respect to the natural geometric measure is already present in [Fur63, Theorem 8.6].
3.1 Jacobians of linear actions on flags
We will now briefly, for the duration of this subsection, abandon the context where and are random satisfying in order to discuss a result for a deterministic transformation and flag .
Denote the mapping which removes from each flag in its -dimensional subspace by , and notice that the fibers are 1-dimensional. We consider on each fiber the unique probability measure which is invariant under the action of orthogonal transformations which fix .
Notice that any element leaves the family of measures quasi-invariant. We will need the explicit Jacobian of the action of on this family of measures.
Lemma 2.
If , , and , then
[TABLE]
Proof.
We begin by proving the case (this case is included in the statement of [Fur63, Lemma 8.8] though the proof is omitted there).
In this case and the only non-trivial subspace is which has dimension in . Therefore, we are looking to calculate the Jacobian of the action of on the projective space of lines in at the line with respect to the unique rotationally invariant probability .
For this purpose consider a unit length vector and an orthogonal vector of length . Let be the rectangle .
Since we are considering the action of on projective space, it is equivalent to consider the transformation so that has length one.
Notice that is a parallelogram with a side in of length , and area which is the length of the orthogonal projection of onto the subspace orthogonal to . Calculating the determinant of one obtains explicitly
[TABLE]
Taking the limit as we obtain that the derivative of the action of on projective space at the point is from which it follows that
[TABLE]
as claimed.
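The two dimensional computation can be sanity-checked numerically. The sketch below assumes the standard fact that the derivative of the action of an invertible 2×2 matrix on the projective line, measured in the angle coordinate at the line spanned by a unit vector, is the absolute value of the determinant divided by the squared norm of the image of that vector. It compares this expression with a central finite difference; all names are hypothetical.

```python
import math

def image_vector(g, theta):
    """Image under g of the unit vector at angle theta."""
    (a, b), (c, d) = g
    return (a * math.cos(theta) + b * math.sin(theta),
            c * math.cos(theta) + d * math.sin(theta))

def derivative_formula(g, theta):
    """|det g| divided by the squared norm of g applied to the unit vector at angle theta."""
    (a, b), (c, d) = g
    x, y = image_vector(g, theta)
    return abs(a * d - b * c) / (x * x + y * y)

def numerical_derivative(g, theta, h=1e-6):
    """Central finite difference of the angle of the image line."""
    x1, y1 = image_vector(g, theta + h)
    x0, y0 = image_vector(g, theta - h)
    diff = math.atan2(y1, x1) - math.atan2(y0, x0)
    # reduce modulo pi so that a branch cut of atan2 does not matter
    diff = (diff + math.pi / 2) % math.pi - math.pi / 2
    return abs(diff) / (2 * h)

g = ((2.0, 1.0), (0.5, 3.0))
for theta in (0.0, 0.7, 1.5, 2.9):
    print(theta, derivative_formula(g, theta), numerical_derivative(g, theta))
```

The two columns agree up to the accuracy of the finite difference for each sampled angle.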
We will now show that the general case may be reduced to the two dimensional case.
For this purpose suppose now that , , and .
Notice that the quotient space is two dimensional and inherits an inner product from which makes it isometric to the orthogonal complement of within . The same is true for .
Therefore, letting be the linear map induced by one has
[TABLE]
where on the right hand side the space is considered as a one-dimensional subspace of .
The result follows from the observation that and . ∎
3.2 Proof of Theorem 1
We return now to the notation and context of the statement of Theorem 1. In particular and are independent random elements with distribution and respectively and such that . Recall that is the projection of onto and is the conditional distribution of given .
3.2.1 Representation
Since the statement of the theorem only depends on the joint distribution of we are at liberty to change to any other pair with the same distribution.
For this purpose fix a Borel mapping where , is a Borel probability on , and , such that if is a uniformly distributed random variable on then has distribution .
Assume furthermore for any convergent sequence of probabilities one has almost surely. Such a representation exists by the main result of [BD83].
In the same way fix a representation into , and representation into .
Let be the distribution of the incomplete flag , and the conditional distribution of given .
Setting where are i.i.d. uniform in , one has that has distribution which is the joint distribution of .
To simplify notation we assume from now on .
3.2.2 Perturbation
Let be defined so that conditioned on it is a Brownian motion starting at the identity on the group of orthogonal transformations which fix . To clarify dependence on the other random elements we assume is -measurable where is uniform on and independent from .
Now for each let and notice that almost surely and when almost surely.
We denote by the space of real valued continuous functions on with the topology of uniform convergence, and consider for each the operator defined by
[TABLE]
Notice that and if then . Therefore there is an associated action of on the space of probability measures on defined by
[TABLE]
Lemma 3.
For each there is a -invariant probability measure on whose projection onto is .
Furthermore picking for each a measure as above one has , and letting be the disintegration of with respect to the projection to the following properties hold:
1. Almost surely is absolutely continuous with respect to .
2. There is a compact subinterval such that takes values in almost surely.
Proof.
Let be the canonical projection.
Let where , and is the distribution of the time of of Brownian motion starting at the identity on the group of orthogonal transformations fixing .
Notice that
[TABLE]
Since preserves the set of functions of the form one obtains that for all probabilities .
In particular preserves the space of probabilities which project onto . By the Markov-Kakutani fixed point theorem, this implies that there is at least one fixed point for in this space.
Because has a continuous positive density with respect to the invariant measure on the group of orthogonal transformations stabilizing it follows that, for any probability on the measure satisfies properties 1 and 2 in the statement above.
In particular for any -invariant probability with one has , and therefore satisfies properties 1 and 2.
Finally, let be any continuous function and suppose where . Using the notation for the integral of with respect to the measure , we have
[TABLE]
where we have used that so decreases the norm.
Notice that converges to the point mass at the identity when . The convergence is uniform in the sense that given and letting be the ball of radius centered at the identity in the full orthogonal group, for each there exists such that for all and all . It follows that
[TABLE]
for all and the convergence is uniform.
Since goes to zero we conclude that . Since this holds for all one has that . By hypothesis is the unique measure with this property with projection , therefore .
We have shown that is the only limit point of when . The space of probabilities on is compact and metrizable, and therefore this implies as claimed. ∎
3.2.3 Conclusion of the proof
We will fix from now on a sequence given by the following claim (c.f. [CLP19, section 6.1.6]):
Claim 1.
There exists a sequence of positive numbers with such that, letting and be given by lemma 3 one has
[TABLE]
almost surely.
Proof.
To begin fix any sequence of positive numbers with .
Let be a dense sequence of continuous functions on .
Notice that is a bounded sequence in .
By Komlos’ theorem (see [Kom67]) there exists a subsequence such that
[TABLE]
exists for -almost every , and any further subsequence has the same property.
For each , using Komlos’ theorem as above, we may define a subsequence of such that
[TABLE]
exists for -almost every , and any further subsequence has the same property.
Letting we have that
[TABLE]
exists for -almost every and all .
For each the restriction of to is dense. Since the space of probabilities on is compact, this implies that there exist probabilities such that
[TABLE]
for -almost every .
By lemma 3 one has . Therefore, for any continuous by dominated convergence one has
[TABLE]
from where and for -almost every . ∎
For each let be uniform and independent from , and let and .
Claim 2.
One has that .
Proof.
For any continuous function one has
[TABLE]
∎
Since converges in distribution to $0$ there exists a subsequence such that almost surely. We fix such a subsequence from now on.
Claim 3.
The conditional distribution of given converges almost surely to the conditional distribution of given .
Proof.
It suffices to show that for all bounded and uniformly continuous one has
[TABLE]
almost surely.
The distance between and goes to $0$ almost surely. Therefore, since is uniformly continuous, one has
[TABLE]
almost surely.
Because is bounded, by dominated convergence, the limit above also holds in the sense, and therefore
[TABLE]
almost surely.
Noticing that , we now calculate
[TABLE]
where we have used the almost sure convergence of to and boundedness of to move the limit inside the expected value in the third to last step. ∎
In view of the above claim, by Lemma 1, Proposition 1, and Fatou’s lemma we have
[TABLE]
By monotonicity and the chain rule (Proposition 4) we continue
[TABLE]
Because is independent from and we have
[TABLE]
And, finally, by monotonicity of mutual information
[TABLE]
We conclude the proof by establishing the following:
Claim 4.
In the above context one has:
[TABLE]
Proof.
We first claim that the conditional distribution of given is .
Since has distribution and is independent from it suffices to prove that the conditional distribution of given coincides with that of given .
Since and , we only need to verify that the conditional distribution of given (which is ) coincides with that of given . This follows immediately since is -invariant.
Now notice that the conditional distribution of given is also . This implies that the conditional distribution of given is .
Let denote the conditional distribution of given .
We have shown that the joint distribution of given has projections and , while its disintegration onto the factor has conditional measures .
Applying the Gelfand-Yaglom-Perez theorem this yields
[TABLE]
Let be the -dimensional subspace of and . Using lemma 2 we obtain
[TABLE]
where for the last equality we have used that has the same distribution as .
Since is an orthogonal transformation the determinants of and coincide on all subspaces. Therefore the right hand side above is equal to
[TABLE]
Since converges in distribution to we have that converges in distribution to the -dimensional subspace of . Because the logarithm of the determinant of on any subspace is bounded between constant multiples of and both of which are integrable, we can pass to the limit (e.g. using dominated convergence after replacing by a sequence with the same individual distributions but which converges almost surely, see [Bil99, Theorem 6.7]) obtaining
[TABLE]
Finally since for , one obtains
[TABLE]
which concludes the proof. ∎
Part II Exact dimensionality and dimension of conditional probabilities
In this part of the article we will prove Theorem 2. We now specify notation and context that will be used throughout.
Recall that is a probability on with respect to which the logarithm of all singular values are integrable and is a -stationary probability on .
A dimension is fixed throughout, is the projection of on the space of incomplete flags missing their -dimensional subspace. It is assumed that is the unique stationary probability with projection .
A disintegration of with respect to is fixed (so ).
We consider an i.i.d. sequence with common distribution and a stationary sequence of random flags with common distribution such that
[TABLE]
for all and . We will use for the -dimensional subspace of the flag and as before for the incomplete flag obtained by removing the subspace .
By hypothesis is ergodic (i.e. extremal among stationary probabilities); this implies that the stationary sequence is ergodic.
As before, Lyapunov exponents are defined by the equations
[TABLE]
By Theorem 1 one has almost surely and
[TABLE]
We assume from now on that .
4 Non-atomicity of conditional measures
Our first step in the proof of Theorem 2 is that is almost surely non-atomic (i.e. all points have measure zero).
Lemma 4.
Almost surely is non-atomic for all .
Proof.
By ergodicity and one has
[TABLE]
almost surely.
Suppose for the sake of contradiction that . Conditioning on this event the equation above becomes
[TABLE]
However, by Poincaré recurrence is recurrent almost surely (i.e. almost surely there exists a subsequence such that ). This implies that which contradicts the hypothesis that . Hence, almost surely, as claimed. ∎
5 The multiplicative ergodic theorem
From Theorem 1 and the hypothesis that one obtains that . We will now apply the multiplicative ergodic theorem of [Ose68] to the mappings induced by the sequence between the quotient spaces to obtain the following result:
Lemma 5.
Almost surely for each one has
[TABLE]
and there exists a unique -dimensional subspace containing and contained in such that
[TABLE]
Furthermore, and are conditionally independent given , and almost surely.
Finally, the logarithm of the angle between the projections of and to is when .
Proof.
For each consider the quotient space with the induced inner product coming from , let be the one-dimensional subspace in which is the projection of , and let be mapping induced by .
Notice that almost surely each is isometric to with the usual inner product. Furthermore the random sequence
[TABLE]
is stationary and ergodic.
One has
[TABLE]
which implies by Birkhoff’s theorem that almost surely
[TABLE]
and
[TABLE]
for all .
On the other hand
[TABLE]
which implies that almost surely
[TABLE]
By hypothesis which implies by Theorem 1 that . Hence, one obtains from the multiplicative ergodic theorem of [Ose68] that almost surely
[TABLE]
and
[TABLE]
are complementary one-dimensional subspaces, and the angle between them is .
From the equations above it follows that is -measurable, while is -measurable. Since is -measurable one has that and are conditionally independent given . In particular, conditioned on one has that and are independent.
Setting to be the subspace in which projects to in one obtains the desired result. ∎
6 Proof of Theorem 2
6.1 Random circle diffeomorphisms
We fix from now on a Borel measurable projection from to which consists of mapping to isometrically (where denotes the -dimensional subspace of the flag). Furthermore we fix an isometry between the unit circle with the usual arc-length distance scaled by one half , and the space of one-dimensional subspaces of with the distance given by the angle. The composition of these mappings will be used to identify each fiber of the projection from to with the unit circle. Equivalently, given an incomplete flag we have chosen an isometry from the projective space of to the unit circle, and therefore each -dimensional subspace between and corresponds to a point on the unit circle.
With these identifications let , be the projection of to , be the projection of to , be the projection of (given by Lemma 5) to , the diffeomorphism of obtained by projecting the action of between and , and for convenience let and . Finally, we let be the rotationally invariant probability on the unit circle.
The proof of Theorem 2 will proceed as follows: We will construct a sequence of random intervals containing and such that is roughly of size . We will then show that is roughly . These two facts will yield that the local dimension of at is almost surely so that in particular this measure is exact dimensional.
A few technical issues arise which we have concealed with the word ‘roughly’ in the previous paragraph. For example, the estimates for the measure of the intervals will hold only for some values of , but these values are sufficiently dense to imply the needed dimension estimates.
We begin with a simple consequence of lemma 5.
Proposition 5.
Let and be fixed and let .
Then the length of converges to $0$ exponentially quickly when .
Proof.
The interval corresponds to a cone of one dimensional subspaces in whose angle (with respect to the standard inner product inherited from ) with the projection of is larger than times the angle between and the projection of .
By lemma 5, under the action the linear mapping corresponding to the norm of vectors in are multiplied by a factor of while those in are multiplied by a factor of .
We fix on the domain of the inner product for which the norm on coincides with the standard one, but for which these subspaces are orthogonal.
Similarly on the range of we pick the inner product where are orthogonal and the restriction of the norm on both subspaces coincides with the usual one.
With respect to these inner products the angle between any two subspaces in decreases by a factor of under .
However, once again by lemma 5, the angle between is for the standard inner product. This implies that, measured with the standard inner product the angle between any two subspaces of decreases by the same factor up to a multiplicative . ∎
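The contraction mechanism in this proof can be illustrated in a deterministic model. The sketch below is an assumption-laden stand-in for the random setting: it iterates a fixed diagonal matrix with entries e^a and e^b, where a > b, and measures the exponential rate at which the angle between the images of two lines decreases. Both lines are chosen away from the contracting axis; the names and default vectors are hypothetical.

```python
import math

def angle_between_lines(u, w):
    """Angle in [0, pi/2] between the lines spanned by the plane vectors u and w."""
    dot = abs(u[0] * w[0] + u[1] * w[1])
    cross = abs(u[0] * w[1] - u[1] * w[0])
    return math.atan2(cross, dot)

def contraction_rate(a, b, n, u=(1.0, 0.5), w=(1.0, -0.8)):
    """(1/n) times the log of the angle between the images of u and w under diag(e^a, e^b)^n."""
    gu = (math.exp(a * n) * u[0], math.exp(b * n) * u[1])
    gw = (math.exp(a * n) * w[0], math.exp(b * n) * w[1])
    return math.log(angle_between_lines(gu, gw)) / n

print(contraction_rate(1.0, 0.2, 40))
```

For a = 1.0 and b = 0.2 the printed rate is close to -(a - b) = -0.8, mirroring the statement that angles decrease at an exponential rate given by the gap between the two exponents, up to a subexponential correction.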
6.2 Stationary intervals
We now construct the sequence of intervals that will be used in our argument. The key points for what follows are that: the construction is stationary, the intervals contain but not , their size is controlled by , and frequently is not close to zero.
Lemma 6 (Stationary intervals).
Setting
[TABLE]
one has for all .
Proof.
Since almost surely is non-atomic there is a smallest positive radius such that .
By lemma 5, conditioned on one has that has distribution and is independent from and . Therefore and taking expected value .
In the event that one has that and therefore that . This proves the claim. ∎
What remains is to estimate the size and probability of the sequence .
6.3 Length of distinguished intervals
The point of what follows is that the intervals contain and are roughly of size .
We will use the following result which is essentially Maker’s theorem [Mak40, Theorem 1] or [Bre57, Theorem 1].
Theorem (Maker’s theorem).
Let be a family of random variables which is stationary in the sense that its distribution equals that of where .
Suppose that the limit exists almost surely and that for all (or equivalently, due to stationarity, for some) .
Then almost surely.
Proof.
By Birkhoff’s ergodic theorem
[TABLE]
exists almost surely and is finite.
Following [Bre57, Theorem 1] we write
[TABLE]
The first term converges to almost surely. Letting be the second term notice that for any fixed we have
[TABLE]
where the limit defining exists almost surely and satisfies by Birkhoff’s ergodic theorem.
Since decreases monotonically to $0$ we obtain
[TABLE]
so that almost surely. ∎
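The mechanism behind Maker’s theorem can be illustrated numerically. The sketch below is not taken from the paper: it assumes a hypothetical toy family X_{k,m} = U_k + V_k/(m+1), with U_k and V_k independent and uniform on [0, 1], and the indexing X_{k,n-k} inside the average is also an assumption made for illustration. The family is stationary in k, converges to U_k as m grows, and is dominated by the constant 2, so the varying-index averages should approach the mean 1/2 of U_k.

```python
import random


def maker_average(n, seed=0):
    """Return (1/n) * sum_{k<n} X_{k, n-k} for a toy stationary family.

    Hypothetical model (an assumption, not the paper's setting):
    X_k = U_k with U_k i.i.d. uniform on [0, 1], and
    X_{k,m} = U_k + V_k / (m + 1) with V_k i.i.d. uniform on [0, 1].
    Then X_{k,m} -> X_k as m -> infinity and sup_m |X_{k,m}| <= 2.
    """
    rng = random.Random(seed)
    total = 0.0
    for k in range(n):
        u = rng.random()
        v = rng.random()
        m = n - k  # the approximation index shrinks as k approaches n
        total += u + v / (m + 1)
    return total / n


if __name__ == "__main__":
    # The averages should settle near E[U_0] = 0.5 as n grows.
    for n in (100, 10_000, 200_000):
        print(n, maker_average(n))
```

The terms with k close to n use poor approximations, but there are too few of them to affect the limit; this is the role played by the integrable domination hypothesis.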
Lemma 7 (Length of distinguished intervals).
For all almost surely one has
[TABLE]
for all large enough, where and .
Proof.
Recall that denotes the rotationally invariant probability on the unit circle .
By Lemma 2 one has
[TABLE]
for all .
For each let be the connected component of which is counter-clockwise from and define
[TABLE]
and
[TABLE]
By Lemma 5 one has that contains a ball of radius centered at . In view of Proposition 5 this implies that for all almost surely eventually . This shows that almost surely the length of goes to zero when and therefore one has almost surely for all .
Notice that for each one has, again by lemma 2, that
[TABLE]
for some -dimensional subspace between and .
In particular this implies that and have finite expectation since they are controlled by the logarithms of singular values of .
This yields that has finite expectation for all .
Applying Maker’s theorem we obtain
[TABLE]
Finally, since when by Lemma 5 one obtains:
[TABLE]
The same argument shows that which establishes the claims.
∎
6.4 Probability of distinguished intervals
We will now essentially repeat the argument of the previous subsection replacing the rotationally invariant probability measure (which is equivalent to length up to a factor) with the random probabilities .
In this case one wishes to replace (in the ergodic averages) the terms of the form with approximating terms calculated using the intervals . Almost sure convergence of the approximating terms boils down to the theorem on differentiation of measures. However, the integrability of the supremum of the approximating terms is more subtle.
The issue is that the singular values of do not directly control the maximum and minimum of on the circle. In fact, this density may be unbounded with positive probability. Instead, control of the approximation comes from the -integrability of the density with respect to which follows from the fact that (that is Theorem 1).
6.4.1 Orlicz regularity and a maximal inequality
For each let and notice that it is -measurable.
Notice that and are independent conditioned on . Since the conditional distribution of given is one obtains that the distribution of conditioned on has density with respect to . Since this conditional distribution is -measurable and one obtains that the conditional distribution of given has density with respect to . Therefore,
[TABLE]
In particular is almost surely integrable with respect to . In other words, almost surely belongs to an Orlicz space which is slightly smaller than and the expected value of the corresponding Orlicz norm is finite. This fact, which follows from the finiteness of given by Theorem 1, will allow us to control the maximal function of .
We define the maximal function of a function with respect to a probability as
[TABLE]
where the supremum is over all intervals containing .
We will need the following maximal inequality the proof of which is adapted from the proof of [Ste70, Theorem 1].
Lemma 8 (Maximal inequality).
There exists a constant such that for any probability on and any -integrable function one has
[TABLE]
for all .
Proof.
Given ,, and consider a compact set such that
[TABLE]
By definition, each point in belongs to an interval such that
[TABLE]
Since is compact one may cover it with finitely many such intervals.
Applying the Besicovitch covering lemma (e.g. see [dG75, Theorem 1.1]) there exists a constant (which does not depend on nor ) such that a subcover may be found so that no more than intervals intersect simultaneously.
Summing over such a subcover one has
[TABLE]
This inequality has been established for all -integrable and all . Applying it to one obtains (observing that ) that
[TABLE]
which establishes the claim. ∎
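The covering argument can be checked on a discrete toy model. The sketch below is an illustration under simplifying assumptions, not the paper's setting: the circle is replaced by finitely many ordered points on a line carrying positive weights `mu`, intervals become index ranges, and the function names are hypothetical. On the line any finite family of intervals admits a subfamily with the same union in which no point lies in more than two intervals, so the weak-type bound mu{Mf > t} <= (2/t) ∫|f| dmu holds with constant 2; this is the intermediate inequality of the proof, before the truncation step.

```python
def maximal_function(f, mu):
    """Uncentred maximal function on points 0..n-1 with positive weights mu.

    M f(i) = max over index ranges [a, b] containing i of
             (sum_{a<=j<=b} |f_j| mu_j) / (sum_{a<=j<=b} mu_j).
    """
    n = len(f)
    # Prefix sums give the weight and the integral of any range in O(1).
    mass = [0.0]
    integral = [0.0]
    for fj, mj in zip(f, mu):
        mass.append(mass[-1] + mj)
        integral.append(integral[-1] + abs(fj) * mj)
    best = [0.0] * n
    for a in range(n):
        for b in range(a, n):
            avg = (integral[b + 1] - integral[a]) / (mass[b + 1] - mass[a])
            for i in range(a, b + 1):
                if avg > best[i]:
                    best[i] = avg
    return best


def weak_type_holds(f, mu, t, c=2.0):
    """Check mu{M f > t} <= (c / t) * integral of |f| d mu."""
    mf = maximal_function(f, mu)
    lhs = sum(m for m, v in zip(mu, mf) if v > t)
    rhs = (c / t) * sum(abs(x) * m for x, m in zip(f, mu))
    return lhs <= rhs + 1e-12  # small slack for floating point rounding


if __name__ == "__main__":
    mu = [0.1, 0.4, 0.05, 0.25, 0.2]
    f = [0.0, 5.0, 0.0, 1.0, 0.0]
    print(maximal_function(f, mu))
    print(all(weak_type_holds(f, mu, t) for t in (0.5, 1.0, 2.0, 4.0)))
```

The brute-force search over all index ranges is cubic in the number of points, which is acceptable for a sanity check but not for large inputs.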
We now use Lemma 8 to control the typical maximal function of . The argument is adapted from [Nev75, Proposition IV-2-10], see the appendix of said work for discussion of this type of results in the context of general Orlicz spaces.
Lemma 9 (Average maximal function).
In the context above one has
[TABLE]
Proof.
As observed at the beginning of section 6.4.1 the conditional distribution of given has density with respect to . Therefore,
[TABLE]
The lower bound which holds -almost everywhere reduces the problem to showing that the expected value on the right is not .
Applying the inequality (valid for ) one obtains
[TABLE]
We now conclude by using Lemma 8 as follows
[TABLE]
∎
6.4.2 Domination of approximating terms
We will now establish the main estimate needed to apply Maker’s theorem as in Lemma 7. For the needed upper bound Lemma 9 suffices. For the lower bound we mimic the argument of [Chu61].
Lemma 10.
For each let and
[TABLE]
Then .
Proof.
Notice first that
[TABLE]
In view of Lemma 9 this bounds from above by an integrable random variable.
For the lower bound consider the event that and notice that this implies
[TABLE]
Given and define the bad set as the set of points in the circle belonging to an interval such that
[TABLE]
Following the proof of Lemma 8 consider a compact set with
[TABLE]
By considering a finite covering of by intervals satisfying equation 1 and summing over a Besicovitch subcover where no more than intervals overlap (here the constant does not depend on nor ) we obtain:
[TABLE]
Using that the conditional distribution of given and is we obtain
[TABLE]
which shows that is integrable as claimed. ∎
6.4.3 Probability estimates
Having solved the main technical issues we now repeat the argument of Lemma 7, replacing the uniform measure with the random measure, to obtain the desired estimate on the -measure of a sequence of intervals shrinking to .
Lemma 11 (Probability of distinguished intervals).
Almost surely one has
[TABLE]
Proof.
For each and let ,
[TABLE]
and
[TABLE]
Notice that for each the sequence is stationary and almost surely .
Furthermore is integrable by Lemma 10.
Applying Maker’s theorem as in lemma 7, almost surely one has
[TABLE]
as claimed. ∎
A technical issue in what follows is that the asymptotic lower bound for just obtained is bad when is small. However, in view of Lemma 6, ‘half of the time’, and this suffices for our needs.
6.5 Proof of Theorem 2
Let be the (random) sequence of values of for which . By Lemma 6 this occurs with probability at least for each fixed . Hence, by the ergodic theorem, taking a subsequence we may assume that almost surely.
For each let .
Fix and let and .
Choose two integer valued functions such that
[TABLE]
and
[TABLE]
as .
Notice that eventually one has and therefore by Lemma 7 almost surely
[TABLE]
for all small enough.
Combining these facts one obtains the bounds
[TABLE]
By Lemma 11 almost surely
[TABLE]
when .
This implies that almost surely
[TABLE]
By intersecting over the corresponding full measure sets for a countable sequence one obtains that almost surely is exact dimensional with dimension as claimed.
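For reference, the standard notion of exact dimensionality used in this conclusion can be written as follows. The symbols $\nu$, $\delta$ and $B(x,r)$ are placeholders chosen here for illustration (a probability measure, its dimension, and the ball of radius $r$ around $x$); they are not necessarily the paper's notation.

```latex
\[
  \lim_{r \to 0} \frac{\log \nu\bigl(B(x,r)\bigr)}{\log r} = \delta
  \qquad \text{for } \nu\text{-almost every } x .
\]
```

In the argument above the radii are only controlled along the distinguished intervals, which is presumably why the two-sided bounds from Lemma 7 and Lemma 11 are combined before passing to the limit.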
- [BD83] David Blackwell and Lester E. Dubins. An extension of Skorohod’s almost sure representation theorem. Proc. Amer. Math. Soc., 89(4):691–692, 1983.
- [Bil99] Patrick Billingsley. Convergence of probability measures. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons Inc., New York, second edition, 1999. A Wiley-Interscience Publication.
- [Bre57] Leo Breiman. The individual ergodic theorem of information theory. Ann. Math. Statist., 28:809–811, 1957.
- [Chu61] K. L. Chung. A note on the ergodic theorem of information theory. Ann. Math. Statist., 32:612–614, 1961.
- [CLP19] Matias Carrasco, Pablo Lessa, and Elliot Paquette. On the speed of distance stationary sequences. arXiv e-prints, page arXiv:1912.12523, December 2019.
- [dG75] Miguel de Guzmán. Differentiation of integrals in $R^{n}$. Lecture Notes in Mathematics, Vol. 481. Springer-Verlag, Berlin-New York, 1975. With appendices by Antonio Córdoba, and Robert Fefferman, and two by Roberto Moriyón.
- [Dob59] R. L. Dobrušin. A general formulation of the fundamental theorem of Shannon in the theory of information. Uspehi Mat. Nauk, 14(6 (90)):3–104, 1959.
- [FK83] H. Furstenberg and Y. Kifer. Random matrix products and measures on projective spaces. Israel J. Math., 46(1-2):12–32, 1983.
