Two Examples of COM Bounds using Spectral Gaps: Length of the LIS in a Random Permutation and Lipschitz Functions of 1d Markov Chains

Michael Froehlich; Shannon Starr

arXiv:1901.08410·math.PR·January 25, 2019


TL;DR

This paper explores concentration of measure bounds using spectral gaps and Lipschitz constants for two examples: the length of the LIS in a random permutation and Lipschitz functions of 1d Markov chains, highlighting the method's versatility.

Contribution

It demonstrates how spectral gap and Lipschitz constant techniques can be applied to derive COM bounds for different probabilistic models, including permutations and Markov chains.

Findings

1. Derived COM bounds similar to Talagrand's for LIS length

2. Applied spectral gap methods to Lipschitz functions of 1d Markov chains

3. Showed effectiveness of auxiliary Markov chains in concentration bounds


Equations

$$\mathcal{F}^{+}_{\mu}(f;a)\,\stackrel{\mathrm{def}}{:=}\,\mu\big(\{x\in\mathcal{X}\,:\,f(x)-\mathbb{E}_{\mu}[f]\geq a\}\big)\quad\text{and}\quad\mathcal{F}^{-}_{\mu}(f;a)\,\stackrel{\mathrm{def}}{:=}\,\mathcal{F}^{+}_{\mu}(-f;a)\,,$$

$$\mu\cdot P=\mu\,,\ \text{ that is: }\ \forall x\in\mathcal{X}\,,\ \text{ we have }\ \mu(x)=\sum_{y\in\mathcal{X}}\mu(y)P(y,x)\,.$$

$$\forall x,y\in\mathcal{X}\,,\ \text{ we have }\ \mu(x)P(x,y)=\mu(y)P(y,x)\,.$$

$$\mathcal{E}_{P,\mu}(f,g)\,=\,\frac{1}{2}\,\sum_{x,y\in\mathcal{X}}\big(f(x)-f(y)\big)\big(g(x)-g(y)\big)\,\mu(x)P(x,y)\,.$$

$$\mathcal{E}_{P,\mu}(f,g)\,=\,\sum_{x\in\mathcal{X}}f(x)\cdot\big((I-P)g\big)(x)\mu(x)\,.$$

$$\Lambda^{(1)}_{\mu}(P)\,=\,\min\big(\{\mathcal{E}_{P,\mu}(f,f)\,:\,f\in\mathcal{C}(\mathcal{X})\,,\ \operatorname{Var}_{\mu}(f)\geq 1\}\big)\,=\,\frac{1}{\sup\big(\{\operatorname{Var}_{\mu}(f)\,:\,\mathcal{E}_{P,\mu}(f,f)\leq 1\}\big)}\,,$$

$$\Phi^{\mathrm{Lip}}_{P}(f)\,\stackrel{\mathrm{def}}{:=}\,\max_{x\in\mathcal{X}}\left(\sum_{y\in\mathcal{X}}P(x,y)|f(x)-f(y)|^{2}\right)^{1/2}\,.$$

$$\Theta(t)\,=\,\sum_{n=0}^{\infty}2^{n}\ln\left(\frac{1}{1-t\cdot 2^{-2n}}\right)\,.$$

$$\ln\left(\mathbb{E}_{\mu}\left[e^{\lambda\cdot f}\right]\right)\,\leq\,\lambda\mathbb{E}_{\mu}[f]+\Theta\left(\frac{\lambda\cdot\Phi^{\mathrm{Lip}}_{P}(f)}{2\left(\Lambda^{(1)}_{P}(\mu)\right)^{1/2}}\right)\,.$$

$$\Xi_{P}(\mu,f)\,\stackrel{\mathrm{def}}{:=}\,\frac{2\left(\Lambda^{(1)}_{P}(\mu)\right)^{1/2}}{\Phi^{\mathrm{Lip}}_{P}(f)}\,.$$

$$-\ln\left(\mathcal{F}^{+}_{\mu}(f;a)\right)\,\geq\,\max_{0\leq\lambda<\Xi_{P}(\mu,f)}\left(\lambda a-\Theta\left(\frac{\lambda}{\Xi_{P}(\mu,f)}\right)\right)\,.$$

$$\varkappa\,=\,\sum_{n=1}^{\infty}2^{n}\ln\left(\frac{1}{1-4^{-n}}\right)\,=\,\sum_{n=1}^{\infty}2^{n}\sum_{m=1}^{\infty}\frac{4^{-mn}}{m}\,=\,\sum_{m=1}^{\infty}\frac{2}{m(4^{m}-2)}\,,$$

$$\Theta(t)\,\leq\,\ln\left(\frac{1}{1-t}\right)+\varkappa\,.$$

$$\ln\left(\mathcal{F}^{+}_{\mu}(f;a)\right)\,\leq\,\varkappa-\vartheta\big(a\cdot\Xi_{P}(\mu,f)\big)\,,$$

$$\vartheta(x)\,=\,x\cdot\varphi(x)+\ln\left(\frac{2}{x}\cdot\varphi(x)\right)\,,\ \text{ for }\ \varphi(x)\,=\,1+\frac{1}{x^{2}}-\frac{1}{x}\,.$$

$$1.084640\,\leq\,\varkappa\,\leq\,1.084645\,.$$

$$\widetilde{\Phi}^{\mathrm{Lip}}_{\mathrm{asym}}(P;f)\,\stackrel{\mathrm{def}}{:=}\,\max_{x\in\mathcal{X}}\left(\sum_{y\in\mathcal{X}}P(x,y)|f(x)-f(y)|^{2}\cdot\mathbf{1}_{(0,\infty)}\big(f(x)-f(y)\big)\right)^{1/2}\,.$$

$$\widetilde{\Xi}_{P}^{+}(\mu,f)\,\stackrel{\mathrm{def}}{:=}\,\frac{2\left(\Lambda^{(1)}_{P}(\mu)\right)^{1/2}}{\widetilde{\Phi}^{\mathrm{Lip}}_{\mathrm{asym}}(P;f)}\,.$$

$$\ln\left(\mathbb{E}_{\mu}\left[e^{\lambda\cdot f}\right]\right)\,\leq\,\lambda\mathbb{E}_{\mu}[f]+\Theta\left(\frac{\lambda}{\widetilde{\Xi}_{P}^{+}(\mu,f)}\right)\,.$$

$$-\ln\left(\mathcal{F}^{+}_{\mu}(f;a)\right)\,\geq\,\max_{0\leq\lambda<\widetilde{\Xi}_{P}^{+}(\mu,f)}\left(\lambda a-\Theta\left(\frac{\lambda}{\widetilde{\Xi}_{P}^{+}(\mu,f)}\right)\right)\,.$$

$$\forall t\in\mathbb{N}\,,\ \forall x_{0},x_{1},\dots,x_{t}\in\mathcal{X}\,,\ \text{ we have }\ \mathbb{P}\big(\{X_{0}=x_{0},X_{1}=x_{1},\dots,X_{t}=x_{t}\}\big)\,=\,\mu(x_{0})P(x_{0},x_{1})\cdots P(x_{t-1},x_{t})\,.$$

$$\left(\mathbb{E}\left[\big(f(X_{0})-f(X_{t})\big)^{2}\cdot\mathbf{1}_{(0,\infty)}\big(f(X_{0})-f(X_{t})\big)\right]\right)^{1/2}\,\leq\,t\cdot\widetilde{\Phi}^{\mathrm{Lip}}_{\mathrm{asym}}(P;f)\,.$$

$$P_{m,n}\big((x_{1},\dots,x_{n}),(y_{1},\dots,y_{n})\big)\,=\,\frac{1}{mn}\,\sum_{i=1}^{n}\left(\prod_{j\in\{1,\dots,n\}\setminus\{i\}}\delta(x_{j},y_{j})\right)\,,$$

$$\mu_{m,n}\big((x_{1},\dots,x_{n})\big)\,=\,\frac{1}{m^{n}}\,,\ \text{ for every }\,(x_{1},\dots,x_{n})\in A_{m}^{n}\,.$$

$$F(x_{1},\dots,x_{n})\,=\,\prod_{i\in J}f_{i}(x_{i})\,,$$

$$\Lambda^{(1)}_{\mu_{m,n}}(P_{m,n})\,=\,\frac{1}{n}\,.$$

$$f_{n}\big((x_{1},\dots,x_{n})\big)\,=\,\max\left(\left\{|J|\,:\,J\subset\{1,\dots,n\}\,,\ \text{ and }\,\forall i,j\in J\,,\ \text{ we have }\,(i<j)\Rightarrow(x_{i}<x_{j})\right\}\right)\,.$$

$$\forall i,j\in J\,,\ \text{ we have }\ (i<j)\Rightarrow(x_{i}<x_{j})\,.$$

$$\begin{aligned}\Delta_{m,n}\big((x_{1},\dots,x_{n})\big)\,\stackrel{\mathrm{def}}{:=}\,&\sum_{(y_{1},\dots,y_{n})\in\mathcal{X}_{m,n}}P_{m,n}\big((x_{1},\dots,x_{n}),(y_{1},\dots,y_{n})\big)\cdot\\ &\left(f_{m,n}\big((x_{1},\dots,x_{n})\big)-f_{m,n}\big((y_{1},\dots,y_{n})\big)\right)^{2}\cdot\\ &\mathbf{1}_{(0,\infty)}\left(f_{m,n}\big((x_{1},\dots,x_{n})\big)-f_{m,n}\big((y_{1},\dots,y_{n})\big)\right)\,,\end{aligned}$$

$$\Delta_{m,n}\big((x_{1},\dots,x_{n})\big)\,\leq\,\frac{|J|}{n}\,.$$


Taxonomy

Topics: Bayesian Methods and Mixture Models · Markov Chains and Monte Carlo Methods · Random Matrices and Applications

Full text

Two Examples of COM Bounds using Spectral Gaps: Length of the LIS in a Random Permutation and Lipschitz Functions of 1d Markov Chains

Michael A. Fröhlich$^{1}$ and Shannon Starr$^{2}$

$^{1}$ Department of Anesthesiology and Perioperative Medicine, School of Medicine, University of Alabama at Birmingham (UAB)

$^{2}$ Department of Applied Mathematics, UAB, Birmingham, AL 35294–1170

(January 4, 2018)

Abstract

We consider two examples for a well-known method for obtaining concentration of measure (COM) bounds for a given observable in a given measure. The method is to consider an auxiliary Markov chain for which the invariant distribution is the measure of interest. Then one obtains COM bounds involving two quantities. The first is the spectral gap of the Markov transition matrix. The second is an appropriate Lipschitz constant for the observable of interest with respect to 1 step of the Markov chain.

We consider two examples of the basic method. The first is to obtain rough COM bounds for the length of the longest increasing subsequence (LIS) in a uniform random permutation. The bounds are similar to well-known bounds of Talagrand using his isoperimetric inequality.

The second example is to consider a 1d Markov chain: $X_{0},X_{1},\dots,X_{n}$ . We assume the invariant measure for the chain $\mu$ is reversible, and let the initial distribution of $X_{0}$ be $\mu$ . Then the observable of interest is any function $f(X_{0},X_{1},\dots,X_{n})$ , which is Lipschitz with respect to replacement of single variables. One case of this is “target frequency analysis,” which is of interest in biostatistics. The auxiliary Markov chain is Glauber dynamics which is gapped in 1d.

1 Statement of the General Method

This article is about obtaining concentration of measure (COM) bounds for certain observables in given measures.

For the present article, suppose that $\mathcal{X}$ is a fixed, finite set. Suppose we are interested in COM bounds for an observable, by which we mean a real-valued function $f$ on $\mathcal{X}$ . And we are interested in COM bounds for the observable $f$ relative to a non-degenerate measure $\mu$ on $\mathcal{X}$ . Let $\mathcal{M}_{+,1}(\mathcal{X})$ denote the set of all probability measures $\mu$ on $\mathcal{X}$ . (So we have $\forall x\in\mathcal{X}$ that $\mu(x)\geq 0$ and we also have $\sum_{x\in\mathcal{X}}\mu(x)=1$ .) Let $\mathcal{C}(\mathcal{X})$ denote the set of all real-valued functions $f:\mathcal{X}\to\mathbb{R}$ .

Given a pair $(\mu,f)$ from $\mathcal{M}_{+,1}(\mathcal{X})\times\mathcal{C}(\mathcal{X})$ , let us define the positive and negative fluctuations as

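$$\mathcal{F}^{+}_{\mu}(f;a)\,\stackrel{\mathrm{def}}{:=}\,\mu\big(\{x\in\mathcal{X}\,:\,f(x)-\mathbb{E}_{\mu}[f]\geq a\}\big)\quad\text{and}\quad\mathcal{F}^{-}_{\mu}(f;a)\,\stackrel{\mathrm{def}}{:=}\,\mathcal{F}^{+}_{\mu}(-f;a)\,,\tag{1}$$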

for $a\in[0,\infty)$ . We are interested in concentration of measure bounds which are bounds on $\mathcal{F}^{\pm}_{\mu}(f;\cdot)$ for a particular pair $(\mu,f)$ . In this article, we will sometimes derive different types of bounds for $\mathcal{F}^{+}_{\mu}(f;\cdot)$ than for $\mathcal{F}^{-}_{\mu}(f;\cdot)$ . But if we have sufficiently good bounds for one, then that leads to (possibly weaker) bounds for the other one by Markov’s inequality, using the fact that $\int_{0}^{\infty}\mathcal{F}^{+}_{\mu}(f;t)\,dt=\int_{0}^{\infty}\mathcal{F}^{-}_{\mu}(f;t)\,dt$ .

In this setting, Aida and Stroock described a useful elementary method for obtaining such bounds, using Chebyshev’s inequality [1]. They did this along the way to proving even more sophisticated bounds, but for the present article we focus on their first, elementary method.

Suppose one has a Markov chain given by transition matrix $P:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ , having the property that $\mu$ is stationary for $P$ . So

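$$\mu\cdot P=\mu\,,\ \text{ that is: }\ \forall x\in\mathcal{X}\,,\ \text{ we have }\ \mu(x)=\sum_{y\in\mathcal{X}}\mu(y)P(y,x)\,.\tag{2}$$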

Suppose, moreover, that $\mu$ is reversible for $P$ , meaning

$$\forall x,y\in\mathcal{X}\,,\ \text{ we have }\ \mu(x)P(x,y)=\mu(y)P(y,x)\,.\tag{3}$$

(The condition (3) implies (2), of course, since $P$ satisfies $P(x,y)\geq 0$ for all $x,y\in\mathcal{X}$ and $\sum_{y\in\mathcal{X}}P(x,y)=1$ for all $x\in\mathcal{X}$ .) Then Aida and Stroock used the Markov chain to derive COM bounds in the measure $\mu$ .

Denote the variance in the measure $\mu$ as $\operatorname{Var}_{\mu}(f)=\sum_{x\in\mathcal{X}}\big{(}f(x)-\mathbb{E}_{\mu}[f]\big{)}^{2}\mu(x)$ , as usual. The Dirichlet form for $P$ , relative to $\mu$ , is denoted by $\mathcal{E}_{P,\mu}:\mathcal{C}(\mathcal{X})\times\mathcal{C}(\mathcal{X})\to\mathbb{R}$ , defined as

$$\mathcal{E}_{P,\mu}(f,g)\,=\,\frac{1}{2}\,\sum_{x,y\in\mathcal{X}}\big(f(x)-f(y)\big)\big(g(x)-g(y)\big)\,\mu(x)P(x,y)\,.\tag{4}$$

It is well-known (see for example, Section 13.2 of [6]) that

$$\mathcal{E}_{P,\mu}(f,g)\,=\,\sum_{x\in\mathcal{X}}f(x)\cdot\big((I-P)g\big)(x)\mu(x)\,.\tag{5}$$
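As a sanity check, the two expressions for the Dirichlet form (the double sum over pairs and the form involving $I-P$) can be compared on a tiny example. The sketch below is only an illustration: the 3-state birth-death chain, its measure, and the test functions `f` and `g` are hypothetical choices, not taken from the article.

```python
from fractions import Fraction as Fr

# Hypothetical 3-state birth-death chain; mu is reversible for P (assumption).
P = [[Fr(1, 2), Fr(1, 2), Fr(0)],
     [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
     [Fr(0), Fr(1, 2), Fr(1, 2)]]
mu = [Fr(1, 4), Fr(1, 2), Fr(1, 4)]
states = range(3)

def dirichlet_sum(f, g):
    """Dirichlet form as half the double sum over pairs (x, y)."""
    return Fr(1, 2) * sum((f[x] - f[y]) * (g[x] - g[y]) * mu[x] * P[x][y]
                          for x in states for y in states)

def dirichlet_generator(f, g):
    """Dirichlet form as sum_x f(x) ((I - P) g)(x) mu(x)."""
    Pg = [sum(P[x][y] * g[y] for y in states) for x in states]
    return sum(f[x] * (g[x] - Pg[x]) * mu[x] for x in states)

# Arbitrary test functions (hypothetical values).
f = [Fr(1), Fr(-2), Fr(5)]
g = [Fr(0), Fr(3), Fr(-1)]

# Detailed balance: mu(x) P(x, y) = mu(y) P(y, x) for all x, y.
reversible = all(mu[x] * P[x][y] == mu[y] * P[y][x]
                 for x in states for y in states)
```

Exact rational arithmetic is used so that the two forms can be compared with `==` rather than with a floating-point tolerance.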

Since $\mu$ is a reversible measure relative to the Markov transition matrix $P$ , the Dirichlet form $\mathcal{E}_{P,\mu}$ is a positive semi-definite bilinear form on the vector space $\mathcal{C}(\mathcal{X})$ . Then the spectral gap of $P$ , relative to the reversible measure $\mu$ , is

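$$\Lambda^{(1)}_{\mu}(P)\,=\,\min\big(\{\mathcal{E}_{P,\mu}(f,f)\,:\,f\in\mathcal{C}(\mathcal{X})\,,\ \operatorname{Var}_{\mu}(f)\geq 1\}\big)\,=\,\frac{1}{\sup\big(\{\operatorname{Var}_{\mu}(f)\,:\,\mathcal{E}_{P,\mu}(f,f)\leq 1\}\big)}\,,\tag{6}$$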

which is well-defined as long as $\mu$ is non-degenerate ( $\forall x\in\mathcal{X}$ , we have $\mu(x)>0$ ) and $|\mathcal{X}|\neq 1$ . The spectral gap $\Lambda^{(1)}_{\mu}(P)$ is strictly positive as long as $P$ is irreducible ( $\forall x,y\in\mathcal{X}$ , there is a $T\in\{1,2,\dots\}$ and $x_{1},\dots,x_{T-1}\in\mathcal{X}$ such that $P(x_{t},x_{t+1})>0$ for all $t\in\{0,\dots,T-1\}$ where $x_{0}=x$ and $x_{T}=y$ ). See, for example, Levin and Peres for the notation (used in this article) related to finite state space Markov chains [6].

Then, also, let us denote the $L^{\infty}$ - $L^{2}$ Lipschitz constant of $f\in\mathcal{C}(\mathcal{X})$ with respect to 1 step of the Markov chain

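$$\Phi^{\mathrm{Lip}}_{P}(f)\,\stackrel{\mathrm{def}}{:=}\,\max_{x\in\mathcal{X}}\left(\sum_{y\in\mathcal{X}}P(x,y)|f(x)-f(y)|^{2}\right)^{1/2}\,.\tag{7}$$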

The Aida-Stroock theorem then gives exponential moment generating function bounds.

Theorem 1.1

Define a function $\Theta:[0,1)\to[0,\infty)$ as

$$\Theta(t)\,=\,\sum_{n=0}^{\infty}2^{n}\ln\left(\frac{1}{1-t\cdot 2^{-2n}}\right)\,.\tag{8}$$

Then, for $\lambda\geq 0$ , satisfying $\lambda<2\left(\Lambda^{(1)}_{P}(\mu)\right)^{1/2}/\Phi^{\mathrm{Lip}}_{P}(f)$ , it is true that

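$$\ln\left(\mathbb{E}_{\mu}\left[e^{\lambda\cdot f}\right]\right)\,\leq\,\lambda\mathbb{E}_{\mu}[f]+\Theta\left(\frac{\lambda\cdot\Phi^{\mathrm{Lip}}_{P}(f)}{2\left(\Lambda^{(1)}_{P}(\mu)\right)^{1/2}}\right)\,.\tag{9}$$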

This, in turn, gives bounds on $\mathcal{F}^{+}_{\mu}(f;\cdot)$ using Chebyshev’s inequality. Let us define

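$$\Xi_{P}(\mu,f)\,\stackrel{\mathrm{def}}{:=}\,\frac{2\left(\Lambda^{(1)}_{P}(\mu)\right)^{1/2}}{\Phi^{\mathrm{Lip}}_{P}(f)}\,.\tag{10}$$

Then we can deduce that the positive fluctuations actually satisfy an exponential decay bound, almost at the rate of $\Xi_{P}(\mu,f)$ :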

Corollary 1.2

For $a\geq 0$ , the positive fluctuations obey the bound (which is written as a negative exponential value for the probability)

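$$-\ln\left(\mathcal{F}^{+}_{\mu}(f;a)\right)\,\geq\,\max_{0\leq\lambda<\Xi_{P}(\mu,f)}\left(\lambda a-\Theta\left(\frac{\lambda}{\Xi_{P}(\mu,f)}\right)\right)\,.\tag{11}$$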

By doing a small amount of calculus, this becomes clearer.

Corollary 1.3

Defining a constant

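$$\varkappa\,=\,\sum_{n=1}^{\infty}2^{n}\ln\left(\frac{1}{1-4^{-n}}\right)\,=\,\sum_{n=1}^{\infty}2^{n}\sum_{m=1}^{\infty}\frac{4^{-mn}}{m}\,=\,\sum_{m=1}^{\infty}\frac{2}{m(4^{m}-2)}\,,\tag{12}$$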

we have

$$\Theta(t)\,\leq\,\ln\left(\frac{1}{1-t}\right)+\varkappa\,.\tag{13}$$

And hence it follows (by using this bound and calculating the Legendre transform of the bounding function) that

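$$\ln\left(\mathcal{F}^{+}_{\mu}(f;a)\right)\,\leq\,\varkappa-\vartheta\big(a\cdot\Xi_{P}(\mu,f)\big)\,,\tag{14}$$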

where $\vartheta:[0,\infty)\to[0,\infty)$ is an asymptotically linear function

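$$\vartheta(x)\,=\,x\cdot\varphi(x)+\ln\left(\frac{2}{x}\cdot\varphi(x)\right)\,,\ \text{ for }\ \varphi(x)\,=\,1+\frac{1}{x^{2}}-\frac{1}{x}\,.\tag{15}$$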

The absolute constant $\varkappa$ satisfies

$$1.084640\,\leq\,\varkappa\,\leq\,1.084645\,.\tag{16}$$
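The numerical bracket for $\varkappa$ can be reproduced by summing the series directly. The following is a minimal sketch, assuming the series forms $\varkappa=\sum_{n\geq 1}2^{n}\ln\big(1/(1-4^{-n})\big)=\sum_{m\geq 1}2/\big(m(4^{m}-2)\big)$ and $\Theta(t)=\sum_{n\geq 0}2^{n}\ln\big(1/(1-t\cdot 4^{-n})\big)$; the function names and the truncation at 60 terms are illustrative choices.

```python
import math

def kappa_series(terms=60):
    """Sum of 2 / (m (4^m - 2)) over m = 1..terms (single-sum form of kappa)."""
    return sum(2.0 / (m * (4.0 ** m - 2.0)) for m in range(1, terms + 1))

def kappa_log_series(terms=60):
    """Sum of 2^n ln(1 / (1 - 4^{-n})) over n = 1..terms (logarithmic form)."""
    return sum(2.0 ** n * -math.log1p(-(4.0 ** -n)) for n in range(1, terms + 1))

def theta(t, terms=60):
    """Truncation of Theta(t) = sum_{n>=0} 2^n ln(1 / (1 - t 4^{-n})), for 0 <= t < 1."""
    return sum(2.0 ** n * -math.log1p(-t * 4.0 ** -n) for n in range(terms))

kappa = kappa_series()
```

Both series converge geometrically, so the truncation error at 60 terms is far below double precision. The same helpers can be used to spot-check the comparison $\Theta(t)\leq\ln\big(1/(1-t)\big)+\varkappa$ at a few values of $t$.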

An excellent reference for Theorem 1.1 and Corollary 1.2 is Ledoux’s monograph on the concentration of measure phenomenon [5]. As stated before, Aida and Stroock proved their result on the way to proving more sophisticated results. They did not include details of the proofs beyond the basic outline. But in Section 3.1 in Ledoux’s monograph he states the equivalent result as Theorem 3.3, and he gives complete details of the proof. In particular, when the details are laid out, it becomes apparent that one can obtain a slight improvement if one restricts attention to positive fluctuations. (So far, we have restricted attention to positive fluctuations.)

Theorem 1.4

In place of (7) consider the asymmetric Lipschitz constant

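$$\widetilde{\Phi}^{\mathrm{Lip}}_{\mathrm{asym}}(P;f)\,\stackrel{\mathrm{def}}{:=}\,\max_{x\in\mathcal{X}}\left(\sum_{y\in\mathcal{X}}P(x,y)|f(x)-f(y)|^{2}\cdot\mathbf{1}_{(0,\infty)}\big(f(x)-f(y)\big)\right)^{1/2}\,.\tag{17}$$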

Also, define the replacement of (10) as

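$$\widetilde{\Xi}_{P}^{+}(\mu,f)\,\stackrel{\mathrm{def}}{:=}\,\frac{2\left(\Lambda^{(1)}_{P}(\mu)\right)^{1/2}}{\widetilde{\Phi}^{\mathrm{Lip}}_{\mathrm{asym}}(P;f)}\,.\tag{18}$$

Then, for $\lambda\geq 0$ satisfying $\lambda<\widetilde{\Xi}_{P}^{+}(\mu,f)$ , it is true that

$$\ln\left(\mathbb{E}_{\mu}\left[e^{\lambda\cdot f}\right]\right)\,\leq\,\lambda\mathbb{E}_{\mu}[f]+\Theta\left(\frac{\lambda}{\widetilde{\Xi}_{P}^{+}(\mu,f)}\right)\,.\tag{19}$$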

And, hence, it is also true that for $a\geq 0$ , the positive fluctuations obey the bound

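$$-\ln\left(\mathcal{F}^{+}_{\mu}(f;a)\right)\,\geq\,\max_{0\leq\lambda<\widetilde{\Xi}_{P}^{+}(\mu,f)}\left(\lambda a-\Theta\left(\frac{\lambda}{\widetilde{\Xi}_{P}^{+}(\mu,f)}\right)\right)\,.\tag{20}$$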

We do not bother to re-state the calculus facts, but suffice it to say that in Corollary 1.2 and Corollary 1.3 the constant $\Xi_{P}(\mu,f)$ may be replaced by $\widetilde{\Xi}_{P}^{+}(\mu,f)$ .

The slight generalization of Theorem 1.4 will be used in our first example. We will also include a brief discussion to note that the asymmetric focus on fluctuations is natural and also is already well-established in some famous examples. Let us mention that the asymmetric Lipschitz constant defined in (17) satisfies good properties.

Proposition 1.5

We have the following properties.

1. For all $f,g\in\mathcal{C}(\mathcal{X})$ , we have $\widetilde{\Phi}^{\mathrm{Lip}}_{\mathrm{asym}}(P;f+g)\leq\widetilde{\Phi}^{\mathrm{Lip}}_{\mathrm{asym}}(P;f)+\widetilde{\Phi}^{\mathrm{Lip}}_{\mathrm{asym}}(P;g)$ .

2. For all $f\in\mathcal{C}(\mathcal{X})$ and all $c\in[0,\infty)$ we have $\widetilde{\Phi}^{\mathrm{Lip}}_{\mathrm{asym}}(P;c\cdot f)=c\cdot\widetilde{\Phi}^{\mathrm{Lip}}_{\mathrm{asym}}(P;f)$ .

3. Moreover, if we assume that $P$ is irreducible, then we have: $\widetilde{\Phi}^{\mathrm{Lip}}_{\mathrm{asym}}(P;f)=0$ implies that $f$ is constant.
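The first two properties of the proposition are easy to spot-check numerically. The sketch below computes the asymmetric Lipschitz constant for a transition matrix given as a list of rows; the random stochastic matrix, the seed, and the functions `f`, `g` are hypothetical test data, and a passing check on one example is of course not a proof.

```python
import math
import random

def asym_lip(P, f):
    """max over x of sqrt(sum_y P(x, y) (f(x) - f(y))^2 1[f(x) > f(y)])."""
    n = len(f)
    return max(math.sqrt(sum(P[x][y] * (f[x] - f[y]) ** 2
                             for y in range(n) if f[x] > f[y]))
               for x in range(n))

def random_stochastic_matrix(n, rng):
    """Rows of positive weights, each normalized to sum to 1 (test data only)."""
    rows = []
    for _ in range(n):
        w = [rng.random() for _ in range(n)]
        s = sum(w)
        rows.append([v / s for v in w])
    return rows

rng = random.Random(0)
P = random_stochastic_matrix(5, rng)
f = [rng.uniform(-1, 1) for _ in range(5)]
g = [rng.uniform(-1, 1) for _ in range(5)]
f_plus_g = [a + b for a, b in zip(f, g)]
c = 2.5
```

Only decreases of $f$ along a step contribute, which is why the constant is a semi-norm-like quantity for nonnegative scalings but is not symmetric under $f\mapsto -f$.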

The third condition above is related to being Lipschitz. Another condition justifying that name is the following. Suppose that $X_{0},X_{1},X_{2},\dots$ are the states of a random realization of the Markov chain starting from $\mu$ , so that

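$$\forall t\in\mathbb{N}\,,\ \forall x_{0},x_{1},\dots,x_{t}\in\mathcal{X}\,,\ \text{ we have }\ \mathbb{P}\big(\{X_{0}=x_{0},X_{1}=x_{1},\dots,X_{t}=x_{t}\}\big)\,=\,\mu(x_{0})P(x_{0},x_{1})\cdots P(x_{t-1},x_{t})\,.\tag{21}$$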

Then

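$$\left(\mathbb{E}\left[\big(f(X_{0})-f(X_{t})\big)^{2}\cdot\mathbf{1}_{(0,\infty)}\big(f(X_{0})-f(X_{t})\big)\right]\right)^{1/2}\,\leq\,t\cdot\widetilde{\Phi}^{\mathrm{Lip}}_{\mathrm{asym}}(P;f)\,.\tag{22}$$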

2 First example of the general method: longest increasing sub-sequence of a uniform random permutation

Given $m,n\in\mathbb{N}=\{1,2,\dots\}$ , let $A_{m}=\left\{0,\frac{1}{m},\frac{2}{m},\dots,1\right\}$ and let $\mathcal{X}_{m,n}=A_{m}^{n}$ which is the set of all $(x_{1},\dots,x_{n})$ with $x_{1},\dots,x_{n}\in A_{m}$ . The Markov chain transition matrix $P_{m,n}:\mathcal{X}_{m,n}\times\mathcal{X}_{m,n}\to\mathbb{R}$ is just replacement of one of the coordinates uniformly at random, where the replacement is by an element of $A_{m}$ chosen uniformly at random. Therefore, we have

$$P_{m,n}\big((x_{1},\dots,x_{n}),(y_{1},\dots,y_{n})\big)\,=\,\frac{1}{mn}\,\sum_{i=1}^{n}\left(\prod_{j\in\{1,\dots,n\}\setminus\{i\}}\delta(x_{j},y_{j})\right)\,,\tag{23}$$

where $\delta(x,y)=1$ if $x=y$ , and equals $0$ otherwise. Since the Kronecker $\delta$ function is symmetric, $P_{m,n}$ is reversible with respect to the invariant measure $\mu_{m,n}$ , which is the uniform measure

$$\mu_{m,n}\big((x_{1},\dots,x_{n})\big)\,=\,\frac{1}{m^{n}}\,,\ \text{ for every }\,(x_{1},\dots,x_{n})\in A_{m}^{n}\,.\tag{24}$$

Now suppose that we have a set $J\subset\{1,\dots,n\}$ and we choose for each $i\in J$ a function $f_{i}:A_{m}\to\mathbb{R}$ such that $\sum_{x\in A_{m}}f_{i}(x)=0$ , but $f_{i}$ is not identically zero. In other words, we just assume $f_{i}$ is orthogonal to the constant function. Then, letting $F:\mathcal{X}_{m,n}\to\mathbb{R}$ be the function

$$F(x_{1},\dots,x_{n})\,=\,\prod_{i\in J}f_{i}(x_{i})\,,\tag{25}$$

it is easy to see that $P_{m,n}F=\frac{n-|J|}{n}F$ . (If one of the coordinate indices in $J$ is selected for replacement, then the function $f_{i}(y_{i})$ after replacement has average equal to zero since $f_{i}$ is orthogonal to the constant function. If any other index is chosen, then the function $F$ is unchanged.) This suffices to determine a spanning set of eigenvectors. So the set of eigenvalues is $\{1,1-\frac{1}{n},1-\frac{2}{n},\dots,0\}$ . In particular, using (5) and (6), we have the following.
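The eigenvalue relation $P_{m,n}F=\frac{n-|J|}{n}F$ can be verified exactly on a small state space. In the sketch below the alphabet size `q`, the length `n`, the index set `J`, and the single-site functions in `f_site` are all hypothetical choices; the alphabet is represented by `range(q)` since only its cardinality matters here.

```python
from fractions import Fraction as Fr
from itertools import product

q, n = 3, 3                      # hypothetical alphabet size and number of coordinates
alphabet = range(q)
states = list(product(alphabet, repeat=n))

def apply_P(F):
    """(P F)(x): pick a coordinate uniformly, replace it by a uniform letter."""
    out = {}
    for x in states:
        total = Fr(0)
        for i in range(n):
            for v in alphabet:
                y = x[:i] + (v,) + x[i + 1:]
                total += F[y]
        out[x] = total / (n * q)
    return out

# Single-site functions that sum to zero over the alphabet and are not identically zero.
f_site = {0: [Fr(1), Fr(-1), Fr(0)], 2: [Fr(2), Fr(-1), Fr(-1)]}
J = sorted(f_site)

# F(x) = product over i in J of f_i(x_i).
F = {}
for x in states:
    val = Fr(1)
    for i in J:
        val *= f_site[i][x[i]]
    F[x] = val

PF = apply_P(F)
eigenvalue = Fr(n - len(J), n)
```

With $|J|=2$ and $n=3$ the eigenvalue is $1/3$; taking $|J|=1$ gives the second-largest eigenvalue $1-\frac{1}{n}$, which is where the spectral gap $\frac{1}{n}$ comes from.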

Lemma 2.1

For the replacement Markov chain we have been considering, the spectral gap is

$$\Lambda^{(1)}_{\mu_{m,n}}(P_{m,n})\,=\,\frac{1}{n}\,.\tag{26}$$

Now, for the observable, we take $f_{m,n}:\mathcal{X}_{m,n}\to\mathbb{R}$ to be the length of the longest increasing subsequence. More precisely, let us define $f_{n}:[0,1]^{n}\to\mathbb{R}$ to be

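$$f_{n}\big((x_{1},\dots,x_{n})\big)\,=\,\max\left(\left\{|J|\,:\,J\subset\{1,\dots,n\}\,,\ \text{ and }\,\forall i,j\in J\,,\ \text{ we have }\,(i<j)\Rightarrow(x_{i}<x_{j})\right\}\right)\,.\tag{27}$$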

Then we take $f_{m,n}$ to just be the restriction $f_{n}\restriction A_{m}^{n}$ .

Now let us try to calculate the Lipschitz constant. Suppose that for some $(x_{1},\dots,x_{n})\in\mathbb{R}^{n}$ , the set $J\subset\{1,\dots,n\}$ is a set where $f_{m,n}\big{(}(x_{1},\dots,x_{n})\big{)}=|J|$ , and such that

$$\forall i,j\in J\,,\ \text{ we have }\ (i<j)\Rightarrow(x_{i}<x_{j})\,.\tag{28}$$

Then, in one step of the Markov chain, updated by $P_{m,n}$ , the only way to decrease the value of $f_{m,n}$ is to choose one of the indices in $J$ to replace by a uniform random sample. That has probability equal to $|J|/n$ . In that case, it is still possible that the length of the longest increasing subsequence does not decrease. But if it does decrease, it only decreases by 1. Therefore, we have, defining

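$$\begin{aligned}\Delta_{m,n}\big((x_{1},\dots,x_{n})\big)\,\stackrel{\mathrm{def}}{:=}\,&\sum_{(y_{1},\dots,y_{n})\in\mathcal{X}_{m,n}}P_{m,n}\big((x_{1},\dots,x_{n}),(y_{1},\dots,y_{n})\big)\cdot\\ &\left(f_{m,n}\big((x_{1},\dots,x_{n})\big)-f_{m,n}\big((y_{1},\dots,y_{n})\big)\right)^{2}\cdot\\ &\mathbf{1}_{(0,\infty)}\left(f_{m,n}\big((x_{1},\dots,x_{n})\big)-f_{m,n}\big((y_{1},\dots,y_{n})\big)\right)\,,\end{aligned}\tag{29}$$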

it is the case that

$$\Delta_{m,n}\big((x_{1},\dots,x_{n})\big)\,\leq\,\frac{|J|}{n}\,.\tag{30}$$
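The one-step bound can be checked by brute force for small parameters. The sketch below computes the LIS length by patience sorting and evaluates the mean squared one-step decrease at every point of $A_{m}^{n}$; the sizes `m = 3`, `n = 4` are hypothetical, and the replacement letter is taken uniform over all $m+1$ elements of $A_{m}$, which is an assumption about the normalization.

```python
from bisect import bisect_left
from fractions import Fraction as Fr
from itertools import product

def lis_length(xs):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []   # tails[k] = smallest last element of an increasing subsequence of length k + 1
    for x in xs:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

def delta(x, letters):
    """Mean squared one-step decrease of the LIS length under coordinate replacement."""
    n, fx = len(x), lis_length(x)
    total = Fr(0)
    for i in range(n):
        for v in letters:
            fy = lis_length(x[:i] + (v,) + x[i + 1:])
            if fx > fy:
                total += Fr((fx - fy) ** 2, n * len(letters))
    return total

m, n = 3, 4                                   # small hypothetical sizes
letters = [Fr(k, m) for k in range(m + 1)]    # A_m = {0, 1/m, ..., 1}

# Largest value of Delta(x) - |J| / n over the whole state space; the bound says it is <= 0.
worst = max(delta(x, letters) - Fr(lis_length(x), n)
            for x in product(letters, repeat=n))
```

Using `bisect_left` enforces strict increase, matching the condition $x_{i}<x_{j}$ in the definition of $f_{n}$.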

Now for the Lipschitz constant (asymmetric, semi-norm) we have to maximize over all choices of $(x_{1},\dots,x_{n})$ .

That would actually give us a much larger constant than if we restricted to the typical choice of $(x_{1},\dots,x_{n})$ . That is because the typical value of $|J|$ is approximately $2\sqrt{n}$ , as determined by Vershik and Kerov [11] and Logan and Shepp [7]. The correct order is $\sqrt{n}$ from an even easier argument of Hammersley [4]. But if we had large deviation bounds, then we could use those to initialize a more refined bound. In view of all of this, we will just truncate, by-hand. Given any constant $K\in\mathbb{N}$ , let us define

[TABLE]

Then we define $f_{m,n}^{(K)}=f_{n}^{(K)}\restriction A_{m}^{n}$ . Then the above calculations show that if $(x_{1},\dots,x_{n})$ is such that $|J|\leq K$ then we have

[TABLE]

But if $|J|\geq K+1$ then $f_{n}\big{(}(x_{1},\dots,x_{n})\big{)}=|J|>K$ and, since one step of the chain decreases $f_{n}$ by at most 1, we will also have $f_{n}^{(K)}\big{(}(x_{1},\dots,x_{n})\big{)}=K=f_{n}^{(K)}\big{(}(y_{1},\dots,y_{n})\big{)}$ no matter what. Therefore, from (17) we have

[TABLE]

This is the bound which we wanted.

We will take $K$ to be a number depending on $n$ , so that we really have a sequence $K_{n}$ . And we will choose the sequence such that

[TABLE]

As stated before, to obtain un-restricted bounds, we would need to combine this with large deviation bounds. Implicitly, we are assuming that in more general applications, it would be easier to get large deviation bounds than concentration-of-measure bounds. So, from (18) we have

[TABLE]

Then, using this with Corollary 1.3, using $\widetilde{\Xi}_{P}^{+}(\mu,f)$ in place of $\Xi_{P}(\mu,f)$ as discussed at the end of the last section, we obtain the following.

Corollary 2.2

For the truncation of the length of the longest increasing subsequence, truncated at the level $K=K_{n}$ such that $\lim_{n\to\infty}K_{n}/(un^{1/2})=1$ for some $u\in(2,\infty)$ , we have the bound

[TABLE]

So in particular, for a fixed $t\in(0,\infty)$ the right hand side converges to $\varkappa-\vartheta(2t/u^{1/2})$ .

Note that in the above, the best case for the right hand side would be if one could take $u$ close to $2$ to get $\varkappa-\vartheta(\sqrt{2}\,t)$ . But one cannot do better than that with these methods. (One can only do that well if one uses large deviation bounds that are sufficiently good even going down to the median.) These bounds essentially show that with this method one can determine that the fluctuations are no larger than order $n^{1/4}$ . As shown by Baik, Deift and Johansson [2], the true fluctuations are of order $n^{1/6}$ . That is bounded by order $n^{1/4}$ , so these bounds are correct; they just are not very sharp. But that is the situation also for Talagrand’s bounds from [10].

We note that the idea of developing asymmetric bounds for the positive and negative fluctuations is not an original idea. It is already advocated by Talagrand. The reason that he obtains bounds as good as he does for the length of the longest increasing subsequence is that the function $f_{n}\big{(}(x_{1},\dots,x_{n})\big{)}$ only depends on $(x_{1},\dots,x_{n})$ through the points whose indices are in $J\subset\{1,\dots,n\}$ . He called functions such as this “configuration functions.” Another good reference is Steele’s monograph [9].

We also note that bounds for the negative fluctuations are easily obtained from the bounds for the positive fluctuations, using Markov’s inequality. Of course, one will not obtain as sharp a result that way. Using

[TABLE]

we may determine

[TABLE]

If we have bounds showing that $\mathcal{F}^{+}_{\mu_{m,n}}\left(f_{m,n}^{(K_{n})};tn^{1/4}\right)$ decays exponentially, because we chose $K_{n}$ to be approximately $un^{1/2}$ for some $u\in(2,\infty)$ , then we can see that

[TABLE]

for a constant $C$ that depends on $K_{n}$ , or alternatively depends on $u$ . That way, we would see that the negative fluctuations $\mathcal{F}_{\mu_{m,n}}^{-}(f^{(K_{n})}_{m,n};t)$ are also decaying when $t$ is at the order of $n^{1/4}$ . So, even though the positive and negative fluctuations have different types of bounds, the order of the size of the fluctuations that one obtains bounds for using this technique is the same for both positive and negative fluctuations. It is order $n^{1/4}$ in this case.

Remark 2.3

If we take $n$ fixed and let $m$ go to $\infty$ , then we obtain the analogous bounds when the points $x_{1},\dots,x_{n}$ are chosen uniformly on the continuous interval $[0,1]$ , in an IID fashion. Since the function only depends on the permutation or relative order induced by the points, that is not a singular limit. Rather, for finite $m$ , the probability that none of the components are equal is 1 minus a quantity which is on the order of $n^{2}/m$ by the Birthday problem. When none of the components are equal, conditioning on that event, we do have uniform random permutations, just as if $x_{1},\dots,x_{n}$ were distributed uniformly on the continuous interval $[0,1]$ in an IID fashion.

3 Second example of the general method: Lipschitz functions of 1d Markov chains

Suppose that $\mathcal{Y}$ is a finite state space, and consider a larger state space $\mathcal{X}_{n}=\mathcal{Y}^{n+1}$ for some $n\in\mathbb{N}=\{1,2,\dots\}$ . Suppose that $Q:\mathcal{Y}\times\mathcal{Y}\to\mathbb{R}$ is a Markov transition matrix which is irreducible and aperiodic, and suppose that there is a measure $\nu:\mathcal{Y}\to\mathbb{R}$ (satisfying $\nu(x)\geq 0$ for all $x\in\mathcal{Y}$ and $\sum_{x\in\mathcal{Y}}\nu(x)=1$ ). And, suppose that $\nu$ is reversible with respect to $Q$ :

[TABLE]

By irreducibility and aperiodicity, we know that $\Lambda^{(1)}_{\nu}(Q)>0$ . The Markov chain we will consider is on $\mathcal{X}_{n}$ instead of $\mathcal{Y}$ . But this fact is potentially useful for proving lower bounds on the spectral gap of the chain on $\mathcal{X}_{n}$ .

Before stating the Markov transition matrix for the chain on $\mathcal{X}_{n}$ , let us define the measure we wish to be the invariant measure for the Markov chain. Let $\mu_{n}:\mathcal{X}_{n}\to\mathbb{R}$ be the measure defined as

[TABLE]

for each $(x_{0},x_{1},\dots,x_{n})\in\mathcal{X}_{n}=\mathcal{Y}^{n+1}$ . Here $X_{0},X_{1},\dots,X_{n}$ is viewed as a sample of the 1d Markov chain, from times $t=0$ to $t=n$ , started at time $0$ in the distribution $\nu$ . By reversibility of $\nu$ with respect to $Q$ , we also have

[TABLE]

and for $t\in\{1,\dots,n-1\}$

[TABLE]

These alternative formulations are potentially useful for proving reversibility for the Markov chain on $\mathcal{X}_{n}$ , which is what we consider next.

We consider Glauber dynamics as the Markov chain on $\mathcal{X}_{n}=\mathcal{Y}^{n+1}$ . In other words, we consider $P_{n}:\mathcal{X}_{n}\times\mathcal{X}_{n}\to\mathbb{R}$ to be the Markov transition matrix where

[TABLE]

where for $t\in\{1,\dots,n-1\}$ we have

[TABLE]

while

[TABLE]

Note that by reversibility of $\nu$ with respect to $Q$ , these can be written in seemingly different but equivalent ways. For example,

[TABLE]

as well.

It is easy to see that each of the $P_{n,t}$ matrices, for $t\in\{0,1,\dots,n-1\}$ is such that $\mu_{n}$ is reversible for $P_{n,t}$ . For example, if $t$ is in $\{1,\dots,n-1\}$ , then

[TABLE]

if $y_{s}=x_{s}$ for all $s\in\{0,\dots,n\}\setminus\{t\}$ (and it equals $0$ otherwise). Isolating $x_{t}$ and $y_{t}$ , and assuming $y_{s}=x_{s}$ for all $s\in\{0,\dots,n\}\setminus\{t\}$ in order not to get 0, this is

[TABLE]

for

[TABLE]

Clearly (48) is symmetric under interchange of the two coordinates of $(x_{t},y_{t})$ . Also, the condition imposed by multiplying by $\prod_{s\in\{0,\dots,n\}\setminus\{t\}}\delta(x_{s},y_{s})$ is symmetric under interchange of every $(x_{s},y_{s})$ for $s\in\{0,\dots,n\}\setminus\{t\}$ . So $\mu_{n}$ is a reversible measure for Glauber dynamics. We refer to Chapter 3 of Levin and Peres [6], for example, for more details on Glauber dynamics.
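The reversibility argument can be checked exactly on a toy instance. The sketch below assumes the usual heat-bath form of Glauber dynamics (choose a site $t\in\{0,\dots,n\}$ uniformly and resample $x_{t}$ from the conditional distribution of the path measure given the other coordinates); the displayed formulas in the text may be written differently but should describe the same kernel. The two-state matrix `Q`, the measure `nu`, and `n = 2` are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical two-state chain with all entries positive; nu is reversible for Q.
Q = [[Fr(1, 2), Fr(1, 2)], [Fr(1, 4), Fr(3, 4)]]
nu = [Fr(1, 3), Fr(2, 3)]
Y, n = range(2), 2
states = list(product(Y, repeat=n + 1))

def mu(x):
    """Path measure nu(x_0) Q(x_0, x_1) ... Q(x_{n-1}, x_n)."""
    w = nu[x[0]]
    for t in range(n):
        w *= Q[x[t]][x[t + 1]]
    return w

def glauber(x, y):
    """Heat-bath kernel: pick a site uniformly, resample it from mu given the rest."""
    prob = Fr(0)
    for t in range(n + 1):
        # y must agree with x at every site other than t.
        if any(x[s] != y[s] for s in range(n + 1) if s != t):
            continue
        denom = sum(mu(x[:t] + (v,) + x[t + 1:]) for v in Y)
        prob += mu(y) / denom / (n + 1)
    return prob
```

Resampling from the conditional distribution of `mu` makes detailed balance immediate, since the normalizing sum `denom` depends only on the coordinates that `x` and `y` share.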

Proposition 3.1

For the Glauber dynamics we have been considering, there is a constant $\lambda_{1}$ satisfying

[TABLE]

such that

[TABLE]

We will not prove this here, but it is reportedly well-known. A reference is Lu and Yau’s paper on Glauber dynamics and Kawasaki dynamics [8].

We have a specific application in mind, which we call target frequency analysis, which is also hypothesis testing for the power spectrum (Fourier transform amplitudes-squared) integrated over certain intervals. But before moving to that example, let us just quickly state the general result.

Corollary 3.2

Suppose that we have a function $f_{n}:\mathcal{X}_{n}\to\mathbb{R}$ satisfying

[TABLE]

for some power $p$ . Then we have the bound

[TABLE]

using the notation from Corollary 1.3.

For us, the power we will obtain will be $p=1$ , so that the fluctuations will be shown to be bounded by $O(n^{-1/2})$ in this way, for a nonnegative observable whose mean is order-1.

3.1 Example: Target frequency analysis for Markov chains

As another basic application, we consider a statistic for time series which was considered by the authors and Jung in [3]. This is called “target frequency analysis.” The application is important in biostatistics. But it also supplies a pedagogically valuable example for the technique.

Let $\mathcal{Y}$ be $\mathcal{Y}_{m,\epsilon}=\{-m\epsilon,(-m+1)\epsilon,\dots,m\epsilon\}$ . For us, an important quantity is the radius of this chain $R=m\epsilon$ . Note that $\mathcal{Y}_{m,\epsilon}\subset\mathbb{R}$ , and the next step in the description of the function $f_{n}$ on $\mathcal{X}_{n}=(\mathcal{Y}_{m,\epsilon})^{n+1}$ only relies on that. Given a real sequence $(x_{0},\dots,x_{n})$ we define the Fourier transform $\phi_{(x_{0},\dots,x_{n})}:\mathbb{Z}\to\mathbb{C}$ defined as if $(x_{0},\dots,x_{n})$ were the $n+1$ components of a periodic signal

[TABLE]

where $i=\sqrt{-1}$ , as usual (despite the fact that in earlier sections the symbol $i$ was used for an integer index). The choice of the prefactor $1/\sqrt{n+1}$ is such that Parseval’s identity is satisfied

[TABLE]

Now, given any choice of $a,b\in\mathbb{Z}$ satisfying $0<a<b<(n+1)/2$ , we consider the observable of interest to be

[TABLE]

In other words, using the language of signal processing, it is the power spectrum integrated from $a$ to $b$ . Now it is rescaled by $1/(n+1)$ because for a signal of length $n+1$ , we expect the total $\ell^{2}$ -norm (also called the total power) to be of order $(n+1)$ . So this rescales to give an order-1 quantity. Note that the Fourier transform is an isometry by (54), therefore $f_{n}$ may be viewed as a contraction mapping times a constant $1/(n+1)$ . For this reason, we obtain

[TABLE]

Since $\widetilde{\Phi}^{\mathrm{Lip}}_{\mathrm{asym}}(P_{n};f_{n})\leq\Phi^{\mathrm{Lip}}_{P_{n}}(f_{n})$ , this proves the following using Corollary 3.2.

Corollary 3.3

For the function $f_{n}$ written above, we have

[TABLE]

We call the function $f_{n}$ by the name “target frequency analysis.” It has a special property: if we replace the present set-up by the case where $\mathcal{Y}=\mathbb{R}$ and allow $X_{0},\dots,X_{n}$ to be IID standard normal random variables (also called white noise, by some), then the Fourier transform has the property that, for frequencies $k$ satisfying $0<k<(n+1)/2$ , the real and imaginary parts are all IID standard normal random variables. (Here IID refers to the independence of the real and imaginary parts, as well as independence for different values of $k$ .) By the Parseval identity (the isometry property), it is elementary that the Fourier transform of IID complex-valued signals with real and imaginary parts being IID standard normal random variables would have the same property. But the property stated for the Fourier transform of a real signal is slightly less trivial, although it may be easily checked using covariance matrices. One may also think of this fact as arising from the slight extra information included in the dihedral symmetry over the usual cyclic symmetry, for the dihedral group $D_{n}$ being the semi-direct product of the cyclic group $(\mathbb{Z}/n\mathbb{Z})$ with the involution group $(\mathbb{Z}/2\mathbb{Z})$ . One can also see it by using properties of the complex conjugation, which amounts to the same. But it is probably the simplest example of a more general phenomenon where, for special symmetrical models, random variables defined on a large space have unexpected projections into some components which also have simple, explicit distributions on smaller spaces.

The corollary above shows that more generally, for Markov chain models of a time series, the target frequency analysis will still be concentrating, at least in the sense that the fluctuations are no larger than order $n^{-1/2}$ .
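For concreteness, the transform and the band-power observable of this section can be sketched in a few lines. This is only an illustration under stated assumptions: the sign in the exponent of the transform and the inclusive summation range $a\leq k\leq b$ are guesses at conventions, and the signal `xs` is hypothetical. Parseval's identity holds for either sign convention, which is what the check below relies on.

```python
import cmath
import math

def fourier(xs):
    """phi(k) = (n+1)^{-1/2} sum_t x_t exp(-2 pi i k t / (n+1)); sign convention assumed."""
    N = len(xs)   # N = n + 1 samples
    return [sum(x * cmath.exp(-2j * math.pi * k * t / N) for t, x in enumerate(xs))
            / math.sqrt(N)
            for k in range(N)]

def band_power(xs, a, b):
    """Power spectrum summed over a <= k <= b (inclusive range assumed), divided by n + 1."""
    phi = fourier(xs)
    return sum(abs(phi[k]) ** 2 for k in range(a, b + 1)) / len(xs)

xs = [0.5, -1.0, 0.25, 0.0, 1.0, -0.5, 0.75, -0.25]   # hypothetical signal, n + 1 = 8
phi = fourier(xs)
total_power = sum(abs(z) ** 2 for z in phi)   # sum of |phi(k)|^2 over all k
energy = sum(x * x for x in xs)               # sum of x_t^2
```

With the $1/\sqrt{n+1}$ prefactor the total power equals the energy of the signal, so the band power divided by $n+1$ is at most the mean square of the signal, an order-1 quantity for bounded state spaces.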

Bibliography

[1] Aida and Stroock. Moment estimates derived from Poincaré and logarithmic Sobolev inequalities. Math. Res. Lett. 1, 75–86 (1995).
[2] Jinho Baik, Percy Deift and Kurt Johansson. On the distribution of the length of the longest increasing subsequence of random permutations. J. Amer. Math. Soc. 12, 1119–1178 (1999).
[3] Michael Fröhlich, Paul Jung and Shannon Starr. A Target Frequency Analysis of functional MRI Data. Int. J. Clin. Biostat-Biom (2015) 2015, no 1:2 (6 pages).
[4] John Michael Hammersley. A few seedlings of research. In Proc. Sixth Berkeley Symp. Math. Statist. Probab. v. 1, pp. 345–394. Univ. California Press, Berkeley, 1972.
[5] Michel Ledoux. The Concentration of Measure Phenomenon. AMS, Providence, RI, 2001.
[6] David A. Levin and Yuval Peres. Markov Chains and Mixing Times. Second edition. American Mathematical Society, Providence, RI, 2017.
[7] Benjamin F. Logan and Lawrence A. Shepp. A variational problem for random Young tableaux. Adv. Math. 26, 206–222 (1977).
[8] Sheng Lin Lu and Horng-Tzer Yau. Spectral gap and logarithmic Sobolev inequality for Kawasaki and Glauber dynamics. Comm. Math. Phys. 156, no. 2, 399–433 (1993).