Two Examples of COM Bounds using Spectral Gaps: Length of the LIS in a Random Permutation and Lipschitz Functions of 1d Markov Chains
Michael Froehlich, Shannon Starr

Michael A. Fröhlich¹ and Shannon Starr²
¹ Department of Anesthesiology and Perioperative Medicine, School of Medicine, University of Alabama at Birmingham (UAB)
² Department of Applied Mathematics, UAB, Birmingham, AL 35294–1170
(January 4, 2018)
Abstract
We consider two examples for a well-known method for obtaining concentration of measure (COM) bounds for a given observable in a given measure. The method is to consider an auxiliary Markov chain for which the invariant distribution is the measure of interest. Then one obtains COM bounds involving two quantities. The first is the spectral gap of the Markov transition matrix. The second is an appropriate Lipschitz constant for the observable of interest with respect to 1 step of the Markov chain.
We consider two examples of the basic method. The first is to obtain rough COM bounds for the length of the longest increasing subsequence (LIS) in a uniform random permutation. The bounds are similar to well-known bounds of Talagrand using his isoperimetric inequality.
The second example is to consider a 1d Markov chain. We assume the invariant measure for the chain is reversible, and we let the initial distribution of the chain be this invariant measure. Then the observable of interest is any function of the chain which is Lipschitz with respect to replacement of single variables. One case of this is “target frequency analysis,” which is of interest in biostatistics. The auxiliary Markov chain is Glauber dynamics, which is gapped in 1d.
1 Statement of the General Method
This article is about obtaining concentration of measure (COM) bounds for certain observables in given measures.
For the present article, suppose that is a fixed, finite set. Suppose we are interested in COM bounds for an observable, by which we mean a real-valued function on . And we are interested in COM bounds for the observable relative to a non-degenerate measure on . Let denote the set of all probability measures on . (So we have that and we also have .) Let denote the set of all real-valued functions .
Given a pair from , let us define the positive and negative fluctuations as
[TABLE]
for . We are interested in concentration of measure bounds which are bounds on for a particular pair . In this article, we will sometimes derive different types of bounds for than for . But if we have sufficiently good bounds for one, then that leads to (possibly weaker) bounds for the other one by Markov’s inequality, using the fact that .
In this setting, Aida and Stroock described a useful elementary method for obtaining such bounds, using Chebyshev’s inequality [1]. They did this along the way to proving even more sophisticated bounds, but for the present article we focus on their first, elementary method.
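Suppose one has a Markov chain given by transition matrix $P$, having the property that $\mu$ is stationary for $P$. So
$$\sum_{x\in\Omega}\mu(x)P(x,y)=\mu(y)\quad\text{for all } y\in\Omega. \tag{2}$$
Also, suppose that moreover $\mu$ is reversible for $P$, meaning
$$\mu(x)P(x,y)=\mu(y)P(y,x)\quad\text{for all } x,y\in\Omega. \tag{3}$$
(The condition (3) implies (2), of course, since $P$ satisfies $P(x,y)\geq 0$ for all $x,y\in\Omega$ and $\sum_{y\in\Omega}P(x,y)=1$ for all $x\in\Omega$.) Then Aida and Stroock used the Markov chain to derive COM bounds in the measure $\mu$.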
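As a small numerical illustration of the remark that reversibility implies stationarity but not conversely, the following Python sketch checks both conditions for two three-state chains. The function names and both example matrices are illustrative assumptions and do not come from the article; exact rational arithmetic is used so that the equalities can be tested without rounding.

```python
from fractions import Fraction as F

def is_stationary(P, mu):
    """Check sum_x mu(x) P(x,y) == mu(y) for every state y."""
    n = len(mu)
    return all(sum(mu[x] * P[x][y] for x in range(n)) == mu[y] for y in range(n))

def is_reversible(P, mu):
    """Check detailed balance mu(x) P(x,y) == mu(y) P(y,x) for every pair."""
    n = len(mu)
    return all(mu[x] * P[x][y] == mu[y] * P[y][x]
               for x in range(n) for y in range(n))

# A lazy birth-death chain on three states (hypothetical example).
P_bd = [[F(1, 2), F(1, 2), F(0)],
        [F(1, 4), F(1, 2), F(1, 4)],
        [F(0), F(1, 2), F(1, 2)]]
mu_bd = [F(1, 4), F(1, 2), F(1, 4)]

# A deterministic rotation on three states: the uniform measure is
# stationary, but detailed balance fails.
P_rot = [[F(0), F(1), F(0)],
         [F(0), F(0), F(1)],
         [F(1), F(0), F(0)]]
mu_rot = [F(1, 3)] * 3

print(is_reversible(P_bd, mu_bd), is_stationary(P_bd, mu_bd))
print(is_reversible(P_rot, mu_rot), is_stationary(P_rot, mu_rot))
```

The rotation chain shows why reversibility is stated as an additional assumption: it is the property that makes the Dirichlet form symmetric.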
Denote the variance in the measure $\mu$ as $\operatorname{Var}_{\mu}(f)=\sum_{x\in\Omega}\big(f(x)-\mathbb{E}_{\mu}[f]\big)^{2}\mu(x)$, as usual. The Dirichlet form for $P$, relative to $\mu$, is defined as
[TABLE]
It is well-known (see for example, Section 13.2 of [6]) that
[TABLE]
Since is a reversible measure relative to the Markov transition matrix , the Dirichlet form is a positive semi-definite bilinear form on the vector space . Then the spectral gap of , relative to the reversible measure , is
[TABLE]
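which is well-defined as long as $\mu$ is non-degenerate ($\forall x\in\Omega$, we have $\mu(x)>0$) and $|\Omega|>1$. The spectral gap is strictly positive as long as $P$ is irreducible ($\forall x,y\in\Omega$, there is a $k$ and $x_0,\dots,x_k\in\Omega$ such that $P(x_{i-1},x_i)>0$ for all $i\in\{1,\dots,k\}$, where $x_0=x$ and $x_k=y$). See, for example, Levin and Peres for the notation (used in this article) related to finite state space Markov chains [6].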
Then, also, let us denote the - Lipschitz constant of with respect to 1 step of the Markov chain
[TABLE]
The Aida-Stroock theorem then gives exponential moment generating function bounds.
Theorem 1.1
Define a function as
[TABLE]
Then, for , satisfying , it is true that
[TABLE]
This, in turn, gives bounds on using Chebyshev’s inequality. Let us define
[TABLE]
Then we can deduce that the positive fluctuations actually obey an exponential decay bound almost at the rate of :
Corollary 1.2
For , the positive fluctuations obey the bound (which is written as a negative exponential value for the probability)
[TABLE]
By doing a small amount of calculus, this becomes clearer.
Corollary 1.3
Defining a constant
[TABLE]
we have
[TABLE]
And hence it follows (by using this bound and calculating the Legendre transform of the bounding function) that
[TABLE]
where is an asymptotically linear function
[TABLE]
The absolute constant satisfies
[TABLE]
An excellent reference for Theorem 1.1 and Corollary 1.2 is Ledoux’s monograph on the concentration of measure phenomenon [5]. As stated before, Aida and Stroock proved their result on the way to proving more sophisticated results. They did not include details of the proofs beyond the basic outline. But in Section 3.1 in Ledoux’s monograph he states the equivalent result as Theorem 3.3, and he gives complete details of the proof. In particular, when the details are laid out, it becomes apparent that one can obtain a slight improvement if one restricts attention to positive fluctuations. (So far, we have restricted attention to positive fluctuations.)
Theorem 1.4
In place of (7) consider the asymmetric Lipschitz constant
[TABLE]
Also, define the replacement of (10) as
[TABLE]
Then, for , satisfying , it is also true that
[TABLE]
And, hence, it is also true that for , the positive fluctuations obey the bound
[TABLE]
We do not bother to re-state the calculus facts, but suffice it to say that in Corollary 1.2 and Corollary 1.3 the constant may be replaced by .
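The difference between the symmetric and asymmetric constants can be seen on a toy chain. The sketch below is written under an assumption: it takes the squared symmetric constant to be $\sup_x\sum_y P(x,y)(f(x)-f(y))^2$, as in Ledoux [5], and the squared asymmetric constant to be the same expression with $f(x)-f(y)$ replaced by its positive part, so that only decreases of $f$ along a step are counted. The article's displayed definitions (7) and (17) are not reproduced here and may be normalized differently. Function names and the example are hypothetical.

```python
from fractions import Fraction as F

def lipschitz_sq(f, P):
    """Assumed symmetric constant: sup_x sum_y P(x,y) (f(x) - f(y))^2."""
    n = len(f)
    return max(sum(P[x][y] * (f[x] - f[y]) ** 2 for y in range(n))
               for x in range(n))

def lipschitz_sq_plus(f, P):
    """Assumed asymmetric constant: only decreases f(x) > f(y) contribute."""
    n = len(f)
    return max(sum(P[x][y] * max(f[x] - f[y], 0) ** 2 for y in range(n))
               for x in range(n))

# Lazy birth-death chain on three states (hypothetical example).
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
f = [F(0), F(3), F(1)]

sym = lipschitz_sq(f, P)
asym = lipschitz_sq_plus(f, P)
print(sym, asym)
```

In this example the symmetric value is attained at the state where $f$ is smallest, where every move increases $f$; the asymmetric constant ignores those moves and is therefore smaller.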
The slight generalization of Theorem 1.4 will be used in our first example. We will also include a brief discussion to note that the asymmetric focus on fluctuations is natural and also is already well-established in some famous examples. Let us mention that the asymmetric Lipschitz constant defined in (17) satisfies good properties.
Proposition 1.5
We have the following properties.
1. For all , we have .
2. For all and all we have .
3. Moreover, if we assume that $P$ is irreducible, then we have: implies that $f$ is constant.
The third condition above is related to being Lipschitz. Another condition justifying that name is the following. Suppose that are the states of a random realization of the Markov chain starting from , so that
[TABLE]
Then
[TABLE]
2 First example of the general method: longest increasing sub-sequence of a uniform random permutation
Given , let and let which is the set of all with . The Markov chain transition matrix is just replacement of one of the coordinates uniformly at random, where the replacement is by an element of chosen uniformly at random. Therefore, we have
[TABLE]
where $\delta(a,b)=1$ if $a=b$, and equals $0$ otherwise. Since the Kronecker delta function is symmetric, this transition matrix is reversible with respect to the invariant measure, which is the uniform measure
[TABLE]
Now suppose that we have a set $A\subseteq\{1,\dots,n\}$ and we choose for each $i\in A$ a function $g_i:\{1,\dots,m\}\to\mathbb{R}$ such that $\sum_{a=1}^{m}g_i(a)=0$, but $g_i$ is not identically zero. In other words, we just assume $g_i$ is orthogonal to the constant function. Then, letting $\varphi$ be the function
$$\varphi\big((x_1,\dots,x_n)\big)=\prod_{i\in A}g_i(x_i)$$
it is easy to see that $P\varphi=(1-|A|/n)\,\varphi$. (If one of the coordinate indices in $A$ is selected for replacement, then the function after replacement has average equal to zero since $g_i$ is orthogonal to the constant function. If any other index is chosen, then the function is unchanged.) This suffices to determine a spanning set of eigenvectors. So the set of eigenvalues is $\{1-k/n\,:\,k=0,1,\dots,n\}$. In particular, using (5) and (6), we have the following.
Lemma 2.1
For the replacement Markov chain we have been considering, the spectral gap is
[TABLE]
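The eigenvalue argument above can be checked by brute force for small parameters. The following Python sketch builds the replacement chain on sequences of length $n$ with entries in $\{0,\dots,m-1\}$ (a relabelling chosen here for convenience), forms product functions from a single mean-zero function `g`, and verifies that each is an eigenfunction with eigenvalue $1-|A|/n$. All names are hypothetical. The sketch tests only one choice of `g`, so it illustrates the eigenvalue list rather than proving the spanning claim; the resulting gap of $1/n$ is what that list suggests.

```python
from fractions import Fraction as F
from itertools import combinations, product

def replacement_chain(m, n):
    """States and kernel: pick a coordinate uniformly, resample it uniformly."""
    states = list(product(range(m), repeat=n))

    def P(x, y):
        total = F(0)
        for k in range(n):
            # y is reachable via coordinate k if it agrees with x elsewhere
            if all(x[j] == y[j] for j in range(n) if j != k):
                total += F(1, n * m)
        return total

    return states, P

m, n = 3, 3
states, P = replacement_chain(m, n)
g = [F(1), F(-1), F(0)]  # sums to zero, so orthogonal to constants

eigenvalues = set()
for size in range(n + 1):
    for A in combinations(range(n), size):
        phi = {}
        for x in states:
            value = F(1)
            for i in A:
                value *= g[x[i]]
            phi[x] = value
        lam = 1 - F(len(A), n)
        P_phi = {x: sum(P(x, y) * phi[y] for y in states) for x in states}
        assert all(P_phi[x] == lam * phi[x] for x in states)
        eigenvalues.add(lam)

gap = 1 - max(ev for ev in eigenvalues if ev != 1)
print(sorted(eigenvalues), gap)
```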
Now, for the observable, we take to be the length of the longest increasing subsequence. More precisely, let us define to be
[TABLE]
Then we take to just be the restriction .
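For readers who want to experiment, the observable is easy to compute. The sketch below uses the standard patience-sorting idea with binary search. It assumes that "increasing" means strictly increasing, which matters only when two coordinates are equal; the article's displayed definition is not reproduced here, so this convention is an assumption, and the function name is hypothetical.

```python
from bisect import bisect_left

def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    tails = []  # tails[i] = smallest possible last value of an increasing
                # subsequence of length i + 1 seen so far
    for value in seq:
        i = bisect_left(tails, value)
        if i == len(tails):
            tails.append(value)   # extends the longest subsequence found
        else:
            tails[i] = value      # improves an existing tail
    return len(tails)

print(lis_length([3, 1, 4, 1, 5, 9, 2, 6]))
```

Replacing `bisect_left` by `bisect_right` would give the weakly increasing version instead.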
Now let us try to calculate the Lipschitz constant. Suppose that for some $(x_1,\dots,x_n)$, the set $J\subseteq\{1,\dots,n\}$ is a set where $f_{m,n}\big((x_{1},\dots,x_{n})\big)=|J|$, and such that
[TABLE]
Then, in one step of the Markov chain, updated by $P$, the only way to decrease the value of $f_{m,n}$ is to choose one of the indices in $J$ to replace by a uniform random sample. That has probability equal to $|J|/n$. In that case, it is still possible that the length of the longest increasing subsequence does not decrease. But if it does decrease, it only decreases by 1. Therefore, we have, defining
[TABLE]
it is the case that
[TABLE]
Now for the Lipschitz constant (asymmetric, semi-norm) we have to maximize over all choices of $(x_1,\dots,x_n)$.
That would actually give us a much larger constant than if we restricted to the typical choice of $(x_1,\dots,x_n)$. That is because the typical value of $|J|$ is approximately $2\sqrt{n}$, as determined by Vershik and Kerov [11] and Logan and Shepp [7]. The correct order is from an even easier argument of Hammersley [4]. But if we had large deviation bounds, then we could use those to initialize a more refined bound. In view of all of this, we will just truncate by hand. Given any constant $K$, let us define
[TABLE]
Then we define . Then the above calculations show that if is such that then we have
[TABLE]
But if $|J|>K$ then $f_{n}\big((x_{1},\dots,x_{n})\big)=|J|>K$ and, since a single replacement lowers the length by at most 1, no matter what, we will also have $f_{n}^{(K)}\big((y_{1},\dots,y_{n})\big)=K=f_{n}^{(K)}\big((x_{1},\dots,x_{n})\big)$. Therefore, from (17) we have
[TABLE]
This is the bound which we wanted.
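The truncation argument can be checked exhaustively for small parameters. The Python sketch below makes two assumptions that are not taken verbatim from the article: the asymmetric constant is read as $\sup_x\sum_y P(x,y)\big[(f(x)-f(y))_+\big]^2$, and increasing subsequences are taken to be strictly increasing. Under these assumptions it computes the constant for the truncated observable $\min(f,K)$ under the replacement chain and confirms that it does not exceed $K/n$. All function names are hypothetical.

```python
from bisect import bisect_left
from fractions import Fraction as F
from itertools import product

def lis_length(seq):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for value in seq:
        i = bisect_left(tails, value)
        if i == len(tails):
            tails.append(value)
        else:
            tails[i] = value
    return len(tails)

def asym_lipschitz_sq(m, n, K):
    """sup over x of the expected squared decrease of min(LIS, K) in one step."""
    worst = F(0)
    for x in product(range(m), repeat=n):
        fx = min(lis_length(x), K)
        total = F(0)
        for k in range(n):            # coordinate chosen for replacement
            for v in range(m):        # new value, uniform on {0, ..., m-1}
                y = x[:k] + (v,) + x[k + 1:]
                drop = max(fx - min(lis_length(y), K), 0)
                total += F(drop * drop, n * m)
        worst = max(worst, total)
    return worst

m, n = 4, 5
constants = {K: asym_lipschitz_sq(m, n, K) for K in range(1, n + 1)}
for K, value in constants.items():
    assert value <= F(K, n)
    print(K, value, F(K, n))
```

The computed constants are typically well below $K/n$, since a replacement inside $J$ often does not shorten the longest increasing subsequence at all.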
We will take to be a number depending on , so that we really have a sequence . And we will choose the sequence such that
[TABLE]
As stated before, to obtain un-restricted bounds, we would need to combine this with large deviation bounds. Implicitly, we are assuming that in more general applications, it would be easier to get large deviation bounds than concentration-of-measure bounds. So, from (18) we have
[TABLE]
Then, using this with Corollary 1.3, using in place of as discussed at the end of the last section, we obtain the following.
Corollary 2.2
For the truncation of the length of the longest increasing subsequence, truncated at the level such that for some , we have the bound
[TABLE]
So in particular, for a fixed the right hand side converges to .
Note that in the above, the best case for the right hand side would be if one could take close to to get . But one cannot do better than that with these methods. (One can only do that well if one uses large deviation bounds that are sufficiently good even going down to the median.) These bounds essentially show that with this method one can determine that the fluctuations are no larger than order $n^{1/4}$. As shown by Baik, Deift and Johansson [2], the true fluctuations are of order $n^{1/6}$. But that is bounded by order $n^{1/4}$, so these bounds are not untrue. They just are not very sharp. But that is the situation also for Talagrand’s bounds from [10].
We note that the idea of developing asymmetric bounds for the positive and negative fluctuations is not an original idea. It is already advocated by Talagrand. The reason that he obtains bounds as good as he does for the length of the longest increasing subsequence is that the function $f_{n}\big((x_{1},\dots,x_{n})\big)$ only depends on $(x_1,\dots,x_n)$ through the points whose indices are in $J$. He called functions such as this “configuration functions.” Another good reference is Steele’s monograph [9].
We also note that bounds for the negative fluctuations are easily obtained from the bounds for the positive fluctuations, using Markov’s inequality. Of course, one will not obtain as sharp a result that way. Using
[TABLE]
we may determine
[TABLE]
If we have bounds showing that decays exponentially, because we chose to be approximately for some , then we can see that
[TABLE]
for a constant that depends on , or alternatively depends on . That way, we would see that the negative fluctuations are also decaying when the deviation is at the order of $n^{1/4}$. So, even though the positive and negative fluctuations have different types of bounds, the order of the size of the fluctuations that one obtains bounds for using this technique is the same for both positive and negative fluctuations. It is order $n^{1/4}$ in this case.
Remark 2.3
If we take $n$ fixed and let $m$ go to $\infty$, then we obtain the analogous bounds when the points are chosen uniformly on the continuous interval $[0,1]$, in an IID fashion. Since the function only depends on the permutation or relative order induced by the points, that is not a singular limit. Rather, for finite $m$, the probability that none of the components are equal is 1 minus a quantity which is on the order of $n^2/m$ by the Birthday problem. When none of the components are equal, conditioning on that event, we do have uniform random permutations, just as if $(x_1,\dots,x_n)$ were distributed uniformly on the continuous interval in an IID fashion.
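The Birthday-problem estimate in the remark is simple to quantify. The sketch below computes the exact probability that $n$ IID uniform draws from a set of size $m$ are all distinct, and compares the complementary probability with the union bound $n(n-1)/(2m)$. The function name is hypothetical, and the numerical values of $m$ and $n$ are chosen only for illustration.

```python
from fractions import Fraction as F

def prob_all_distinct(m, n):
    """Probability that n IID uniform samples from m values are all distinct."""
    p = F(1)
    for i in range(n):
        p *= F(m - i, m)
    return p

n = 10
for m in (100, 1_000, 10_000):
    collision = 1 - prob_all_distinct(m, n)
    union_bound = F(n * (n - 1), 2 * m)
    assert collision <= union_bound
    print(m, float(collision), float(union_bound))
```

For fixed $n$ the collision probability decays like $1/m$, which is why the limit of large $m$ recovers the uniform random permutation.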
3 Second example of the general method: Lipschitz functions of 1d Markov chains
Suppose that is a finite state space, and consider a larger state space for some . Suppose that is a Markov transition matrix which is irreducible and aperiodic, and suppose that there is a measure (satisfying for all and ). And, suppose that is reversible with respect to :
[TABLE]
By irreducibility and aperiodicity, we know that . The Markov chain we will consider is on instead of . But this fact is potentially useful for proving lower bounds on the spectral gap of the chain on .
Before stating the Markov transition matrix for the chain on , let us define the measure we wish to be the invariant measure for the Markov chain. Let be the measure defined as
[TABLE]
for each . Here is viewed as a sample of the 1d Markov chain, from times to , started at time $0$ in the distribution . By reversibility of with respect to , we also have
[TABLE]
and for
[TABLE]
These alternative formulations are potentially useful for proving reversibility for the Markov chain on , which is what we consider next.
We consider Glauber dynamics as the Markov chain on . In other words, we consider to be the Markov transition matrix where
[TABLE]
where for we have
[TABLE]
while
[TABLE]
Note that by reversibility of with respect to , these can be written in seemingly different but equivalent ways. For example,
[TABLE]
as well.
It is easy to see that each of the matrices, for is such that is reversible for . For example, if is in , then
[TABLE]
if for all (and it equals $0$ otherwise). Isolating and , and assuming for all in order not to get 0, this is
[TABLE]
for
[TABLE]
Clearly (48) is symmetric in interchange of the two coordinates of . Also, the conditions imposed by multiplying by are also symmetric in interchange of every for . So is a reversible measure for Glauber dynamics. We refer to Chapter 3 of Levin and Peres [6], for example, for more details on Glauber dynamics.
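The reversibility argument above can be confirmed numerically on a tiny example. The Python sketch below is a minimal illustration under stated assumptions: it uses a hypothetical two-state transition matrix `T` with reversible measure `pi`, takes the path measure to be `pi` at the first site times a product of transition probabilities, and implements Glauber dynamics in its generic heat-bath form (choose a site uniformly, then resample it from the conditional law given the other sites). The article's explicit formulas for the single-site kernels are not reproduced here.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical two-state chain; T and pi are illustrative names.
T = [[F(2, 3), F(1, 3)],
     [F(1, 2), F(1, 2)]]
pi = [F(3, 5), F(2, 5)]  # satisfies pi[a] * T[a][b] == pi[b] * T[b][a]

def path_measure(x):
    """Weight of a path: pi at the first site times the transition product."""
    w = pi[x[0]]
    for a, b in zip(x, x[1:]):
        w *= T[a][b]
    return w

def glauber(x, y):
    """Heat-bath kernel: uniform site choice, then conditional resampling."""
    n, q = len(x), len(pi)
    total = F(0)
    for k in range(n):
        if any(x[j] != y[j] for j in range(n) if j != k):
            continue  # y must agree with x away from site k
        norm = sum(path_measure(x[:k] + (a,) + x[k + 1:]) for a in range(q))
        total += F(1, n) * path_measure(y) / norm
    return total

n = 3
states = list(product(range(len(pi)), repeat=n))
rows_ok = all(sum(glauber(x, y) for y in states) == 1 for x in states)
balance_ok = all(
    path_measure(x) * glauber(x, y) == path_measure(y) * glauber(y, x)
    for x in states for y in states)

# At an interior site the conditional law depends only on the two neighbours.
left, right = 0, 1
two_step = sum(T[left][a] * T[a][right] for a in range(2))
norm = sum(path_measure((left, b, right)) for b in range(2))
interior_ok = all(
    path_measure((left, a, right)) / norm
    == T[left][a] * T[a][right] / two_step
    for a in range(2))
print(rows_ok, balance_ok, interior_ok)
```

The last check shows the locality that makes the 1d case tractable: resampling an interior site only requires the values at its two neighbours.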
Proposition 3.1
For the Glauber dynamics we have been considering, there is a constant satisfying
[TABLE]
such that
[TABLE]
We will not prove this here. But it is reportedly well known. A reference is Lu and Yau’s paper on Glauber dynamics and Kawasaki dynamics [8].
We have a specific application in mind, which we call target frequency analysis, which is also hypothesis testing for the power spectrum (Fourier transform amplitudes-squared) integrated over certain intervals. But before moving to that example, let us just quickly state the general result.
Corollary 3.2
Suppose that we have a function satisfying
[TABLE]
for some power . Then we have the bound
[TABLE]
using the notation from Corollary 1.3.
For us, the power we will obtain will be , so that the fluctuations will be shown to be bounded by in this way, for a nonnegative observable whose mean is order-1.
3.1 Example: Target frequency analysis for Markov chains
As another basic application, we consider a statistic for time series which was considered by the authors and Jung in [3]. This is called “target frequency analysis.” The application is important in biostatistics. But it also supplies a pedagogically valuable example for the technique.
Let be . For us, an important quantity is the radius of this chain . Note that , and the next step in the description of the function on only relies on that. Given a real sequence we define the Fourier transform defined as if were the components of a periodic signal
[TABLE]
where $i=\sqrt{-1}$, as usual (despite the fact that in earlier sections the symbol $i$ was used for an integer index). The choice of the prefactor is such that Parseval’s identity is satisfied
[TABLE]
Now, given any choice of satisfying , we consider the observable of interest to be
[TABLE]
In other words, using the language of signal processing, it is the power spectrum integrated from to . Now it is rescaled by because for a signal of length , we expect the total -norm (also called the total power) to be of order . So this rescales to give an order-1 quantity. Note that the Fourier transform is an isometry by (54), therefore may be viewed as a contraction mapping times a constant . For this reason, we obtain
[TABLE]
Since , this proves the following using Corollary 3.2.
Corollary 3.3
For the function written above, we have
[TABLE]
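To make the observable concrete, here is a minimal Python sketch of a band-limited power statistic. It rests on several assumptions that are not confirmed by the text above: the Fourier transform is taken as the discrete Fourier transform with prefactor $1/\sqrt{n}$ and the sign convention $e^{-2\pi i jk/n}$, and the frequency band is a range of integer frequency indices `k_lo` to `k_hi`. The function names are hypothetical. The sketch checks Parseval's identity and the contraction property used in the Lipschitz estimate: changing one entry of the signal changes the square root of $n$ times the band power by at most the size of that change.

```python
import cmath
import math
import random

def dft(x):
    """Unitary discrete Fourier transform with prefactor 1/sqrt(n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            / math.sqrt(n) for k in range(n)]

def band_power(x, k_lo, k_hi):
    """(1/n) times the sum of |x_hat(k)|^2 over k_lo <= k <= k_hi."""
    n = len(x)
    x_hat = dft(x)
    return sum(abs(x_hat[k]) ** 2 for k in range(k_lo, k_hi + 1)) / n

random.seed(0)
n = 32
x = [random.choice([-1.0, 0.0, 1.0]) for _ in range(n)]

# Parseval: the transform preserves the sum of squares.
parseval_gap = abs(sum(abs(c) ** 2 for c in dft(x)) - sum(v * v for v in x))

# Replace a single entry and compare the band amplitudes.
y = list(x)
y[5] = 1.0 if x[5] != 1.0 else -1.0
amp_x = math.sqrt(n * band_power(x, 2, 6))
amp_y = math.sqrt(n * band_power(y, 2, 6))
contraction_ok = abs(amp_x - amp_y) <= abs(x[5] - y[5]) + 1e-9
print(parseval_gap < 1e-9, contraction_ok)
```

The direct $O(n^2)$ transform is used only to keep the sketch free of external dependencies; a fast Fourier transform would be the natural choice for long signals.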
We call the function by the name “target frequency analysis.” It has a special property: if we replace the present set-up by the case where and allow to be IID standard, normal random variables (also called white noise, by some), then the Fourier transform has the property for frequencies satisfying the real and imaginary parts are all IID standard, normal random variables. (Here IID refers to the independence of the real and imaginary parts, as well as independence for different values of .) By the Parseval identity, isometry property, it is elementary that the Fourier transform of IID complex-valued signals with real and imaginary parts being IID standard, normal random variables would have the same property. But the property stated for the Fourier transform of a real signal is slightly less trivial, although it may be easily checked using covariance matrices. One may also think of this fact as arising from the slight extra information included in the dihedral symmetry over the usual cyclic symmetry, for the dihedral group being the semi-direct product of the cyclic group with the involution group . One can also see it by using properties of the complex conjugation, which amounts to the same. But it is probably the simplest example of a more general phenomenon where for special symmetrical models, random variables defined on a large space have unexpected projections into some components which also have simple, explicit distributions on smaller spaces.
The corollary above shows that more generally, for Markov chain models of a time series, the target frequency analysis will still be concentrating at least in the sense that the fluctuations are no larger than order .
References
- [1] Aida and Stroock. Moment estimates derived from Poincaré and logarithmic Sobolev inequalities. Math. Res. Lett. 1, 75–86 (1995).
- [2] Jinho Baik, Percy Deift and Kurt Johansson. On the distribution of the length of the longest increasing subsequence of random permutations. J. Amer. Math. Soc. 12, 1119–1178 (1999).
- [3] Michael Fröhlich, Paul Jung and Shannon Starr. A Target Frequency Analysis of functional MRI Data. Int. J. Clin. Biostat. Biom. 2015, no. 1:2 (6 pages).
- [4] John Michael Hammersley. A few seedlings of research. In Proc. Sixth Berkeley Symp. Math. Statist. Probab. v. 1, pp. 345–394. Univ. California Press, Berkeley, 1972.
- [5] Michel Ledoux. The Concentration of Measure Phenomenon. AMS, Providence, RI, 2001.
- [6] David A. Levin and Yuval Peres. Markov Chains and Mixing Times. Second edition. American Mathematical Society, Providence, RI, 2017.
- [7] Benjamin F. Logan and Lawrence A. Shepp. A variational problem for random Young tableaux. Adv. Math. 26, 206–222 (1977).
- [8] Sheng Lin Lu and Horng-Tzer Yau. Spectral gap and logarithmic Sobolev inequality for Kawasaki and Glauber dynamics. Comm. Math. Phys. 156, no. 2, 399–433 (1993).
