Concentration of Markov chains with bounded moments

Assaf Naor; Shravas Rao; Oded Regev

arXiv:1906.07260·math.PR·June 19, 2019

TL;DR

This paper extends concentration inequalities for finite state Markov chains to cases where the function has bounded moments rather than being bounded, providing dimension-independent bounds and answering a question by Kargin.

Contribution

It introduces new concentration inequalities assuming only bounded moments of the function, generalizing Gillman's bounds and addressing an open question by Kargin.

Findings

1. Derived moment-based concentration inequalities for Markov chains
2. Generalized bounds to $L_p$-valued functions, including Hilbert spaces
3. Provided dimension-independent concentration bounds

Abstract

Let $\{W_{t}\}_{t=1}^{\infty}$ be a finite state stationary Markov chain, and suppose that $f$ is a real-valued function on the state space. If $f$ is bounded, then Gillman's expander Chernoff bound (1993) provides concentration estimates for the random variable $f(W_{1})+\dots+f(W_{n})$ that depend on the spectral gap of the Markov chain and the assumed bound on $f$. Here we obtain analogous inequalities assuming only that the $q$'th moment of $f$ is bounded for some $q\geq 2$. Our proof relies on reasoning that differs substantially from the proofs of Gillman's theorem that are available in the literature, and it generalizes to yield dimension-independent bounds for mappings $f$ that take values in an $L_{p}(\mu)$ for some $p\geq 2$, thus answering (even in the Hilbertian special case $p=2$) a question of Kargin (2007).

Equations

\lambda_{\pi}(A)\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\|A-E_{\pi}\|_{L_{2}(\pi)\to L_{2}(\pi)}=\sup\Bigg{\{}\bigg{(}\sum_{i=1}^{N}\pi_{i}\Big{(}\sum_{j=1}^{N}a_{ij}u_{j}-\sum_{k=1}^{N}\pi_{k}u_{k}\Big{)}^{2}\bigg{)}^{\frac{1}{2}}:\ u\in\mathbb{R}^{N}\ \mathrm{and}\ \sum_{k=1}^{N}\pi_{k}u_{k}^{2}=1\Bigg{\}}.

\Bigg{(}\mathbb{E}\bigg{[}\Big{|}\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big{|}^{q}\bigg{]}\Bigg{)}^{\frac{1}{q}}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}\cdot\Big{(}\mathbb{E}\big{[}|f(W_{1})|^{q}\big{]}\Big{)}^{\frac{1}{q}}.

\Bigg{(}\mathbb{E}\bigg{[}\Big{|}\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big{|}^{q}\bigg{]}\Bigg{)}^{\frac{1}{q}}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}\cdot\max\left\{|f(1)|,\ldots,|f(N)|\right\}.

\forall\,a>0,\qquad\Pr\bigg{[}\Big{|}\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big{|}\geqslant a\max_{j\in[N]}|f(j)|\bigg{]}\lesssim e^{-c(1-\lambda_{\mathbf{W}})na^{2}},

\Pr\bigg{[}\Big{|}\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big{|}\geqslant a\Big{(}\mathbb{E}\big{[}|f(W_{1})|^{q}\big{]}\Big{)}^{\frac{1}{q}}\bigg{]}\lesssim e^{-c(1-\lambda_{\mathbf{W}})na^{2}}.

\Bigg{(}\mathbb{E}\bigg{[}\Big{\|}\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big{\|}_{L_{q}(\mu)}^{q}\bigg{]}\Bigg{)}^{\frac{1}{q}}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}\cdot\Big{(}\mathbb{E}\big{[}\|f(W_{1})\|_{L_{q}(\mu)}^{q}\big{]}\Big{)}^{\frac{1}{q}}.

\Bigg{(}\mathbb{E}\bigg{[}\Big{\|}\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big{\|}_{H}^{q}\bigg{]}\Bigg{)}^{\frac{1}{q}}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}\cdot\Big{(}\mathbb{E}\big{[}\|f(W_{1})\|_{H}^{q}\big{]}\Big{)}^{\frac{1}{q}}.

\Pr\bigg{[}\Big{\|}\frac{1}{n}\sum_{i=1}^{n}f(W_{i})-\mathbb{E}[f(W_{1})]\Big{\|}_{H}\geqslant a\max_{j\in[N]}\|f(j)\|_{H}\bigg{]}\lesssim e^{-c(1-\lambda_{\mathbf{W}})na^{2}}.

\forall(i_{1},\ldots,i_{n})\in[N]^{n},\qquad\tau^{n}_{\mathbf{W}}(i_{1},\ldots,i_{n})\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\Pr\big{[}(W_{1},\ldots,W_{n})=(i_{1},\ldots,i_{n})\big{]}=\pi_{\mathbf{W}}(i_{1})a_{i_{1}i_{2}}a_{i_{2}i_{3}}\cdots a_{i_{n-1}i_{n}}.

\forall(i_{1},\ldots,i_{n})\in[N]^{n},\qquad T_{X}f(i_{1},\ldots,i_{n})\stackrel{\mathrm{def}}{=}\frac{1}{n}\sum_{k=1}^{n}f(i_{k})-\sum_{j=1}^{N}\pi_{\mathbf{W}}(j)f(j)\in X.

\|\psi\|_{L_{q}(\sigma;X)}=\bigg{(}\sum_{s\in S}\sigma(s)\|\psi(s)\|_{X}^{q}\bigg{)}^{\frac{1}{q}}.

\|T_{L_{q}(\mu)}\|_{L_{q}(\pi_{\mathbf{W}};L_{q}(\mu))\to L_{q}(\tau_{\mathbf{W}}^{n};L_{q}(\mu))}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}.

\|T_{L_{2}(\mu)}\|_{L_{q}(\pi_{\mathbf{W}};L_{2}(\mu))\to L_{q}(\tau_{\mathbf{W}}^{n};L_{2}(\mu))}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}.

\|T_{L_{p}(\mu)}\|_{L_{q}(\pi_{\mathbf{W}};L_{p}(\mu))\to L_{q}(\tau_{\mathbf{W}}^{n};L_{p}(\mu))}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}.

\Bigg{(}\mathbb{E}\bigg{[}\Big{\|}\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big{\|}_{L_{p}(\mu)}^{q}\bigg{]}\Bigg{)}^{\frac{1}{q}}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}\cdot\Big{(}\mathbb{E}\big{[}\|f(W_{1})\|_{L_{p}(\mu)}^{q}\big{]}\Big{)}^{\frac{1}{q}}.

\forall\,a>0,\qquad\Pr\bigg{[}\Big{\|}\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big{\|}_{L_{p}(\mu)}\geqslant a\max_{j\in[N]}\|f(j)\|_{L_{p}(\mu)}\bigg{]}\lesssim e^{p-c(1-\lambda_{\mathbf{W}})na^{2}}.

\Bigg{(}\mathbb{E}\bigg{[}\Big{|}\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big{|}^{q}\bigg{]}\Bigg{)}^{\frac{1}{q}}\lesssim\bigg{(}\frac{1}{(1-\lambda_{\mathbf{W}})n}\bigg{)}^{1-\frac{1}{q}}\cdot\Big{(}\mathbb{E}\big{[}|f(W_{1})|^{q}\big{]}\Big{)}^{\frac{1}{q}}.

\begin{pmatrix}1-(1-\lambda)(1-\varepsilon)&(1-\lambda)(1-\varepsilon)\\(1-\lambda)\varepsilon&1-(1-\lambda)\varepsilon\end{pmatrix}=\lambda I_{2}+(1-\lambda)E_{\pi(\varepsilon)}\in\mathsf{M}_{2}(\mathbb{R}),

\displaystyle\Bigg{(}\mathbb{E}\bigg{[}\Big{\|}\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\Big{[}F(W_{1},\ldots,W_{n})\Big{|}W_{i}\Big{]}

\displaystyle\lesssim\frac{1}{\sqrt{(q-1)(1-\lambda_{\mathbf{W}})n}}\cdot\Big{(}\mathbb{E}\big{[}\|F(W_{1},\ldots,W_{n})\|_{L_{p}(\mu)}^{q}\big{]}\Big{)}^{\frac{1}{q}}.

\displaystyle\begin{split}\Bigg{(}\mathbb{E}\bigg{[}\Big{|}\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big{|}^{q}\bigg{]}\Bigg{)}^{\frac{1}{q}}&\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}\cdot\Big{(}\mathbb{E}\big{[}|f(W_{1})-\mathbb{E}[f(W_{1})]|^{q}\big{]}\Big{)}^{\frac{1}{q}}\\ &\leqslant 2\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}\cdot\Big{(}\mathbb{E}\big{[}|f(W_{1})|^{q}\big{]}\Big{)}^{\frac{1}{q}},\end{split}

U\stackrel{\mathrm{def}}{=}\begin{pmatrix}u_{1}&0&\dots&0\\0&u_{2}&\ddots&\vdots\\\vdots&\ddots&\ddots&0\\0&\dots&0&u_{N}\end{pmatrix}=\begin{pmatrix}f(1)&0&\dots&0\\0&f(2)&\ddots&\vdots\\\vdots&\ddots&\ddots&0\\0&\dots&0&f(N)\end{pmatrix}.

\mathbb{E}\Big{[}\big{(}f(W_{1})+\cdots+f(W_{n})\big{)}^{2m}\Big{]}\leqslant(2m)!\sum_{\begin{subarray}{c}v_{0},\ldots,v_{2m-1}\in\mathbb{N}\cup\{0\}\\ v_{0}+\cdots+v_{2m-1}\leqslant n-1\end{subarray}}\big{\|}UA^{v_{1}}UA^{v_{2}}\cdots UA^{v_{2m-1}}u\big{\|}_{L_{1}(\pi)}.

\mathbb{E}\Big[\prod_{i=1}^{2m}f(W_{w_{i}})\Big]=\sum_{j\in[N]^{2m}}\pi_{j_{1}}\big(A^{w_{2}-w_{1}}\big)_{j_{1}j_{2}}\big(A^{w_{3}-w_{2}}\big)_{j_{2}j_{3}}\cdots\big(A^{w_{2m}-w_{2m-1}}\big)_{j_{2m-1}j_{2m}}\prod_{k=1}^{2m}u_{j_{k}}=\sum_{j\in[N]^{2m}}\pi_{j_{1}}\big(UA^{w_{2}-w_{1}}\big)_{j_{1}j_{2}}\big(UA^{w_{3}-w_{2}}\big)_{j_{2}j_{3}}\cdots\big(UA^{w_{2m}-w_{2m-1}}\big)_{j_{2m-1}j_{2m}}u_{j_{2m}}=\sum_{i\in[N]}\pi_{i}\big(UA^{w_{2}-w_{1}}UA^{w_{3}-w_{2}}\cdots UA^{w_{2m}-w_{2m-1}}u\big)_{i}.

\mathbb{E}\Big[\big(f(W_{1})+\cdots+f(W_{n})\big)^{2m}\Big]\leqslant(2m)!\sum_{w\in V_{2m}}\big\|UA^{w_{2}-w_{1}}UA^{w_{3}-w_{2}}\cdots UA^{w_{2m}-w_{2m-1}}u\big\|_{L_{1}(\pi)}.\qquad\square

\|UT_{1}UT_{2}\cdots UT_{k}u\|_{L_{1}(\pi)}\leqslant\|u\|_{L_{q}(\pi)}^{k+1}\prod_{j=1}^{k}\|T_{j}\|_{L_{\frac{2q}{q+k+1-2j}}(\pi)\to L_{\frac{2q}{q+k+1-2j}}(\pi)}.

\left\|UT_{1}UT_{2}\cdots UT_{k}u\right\|_{L_{\beta(0)}(\pi)}\leqslant\bigg{(}\prod_{i=1}^{k+1}\|u\|_{L_{\alpha(i)}(\pi)}\bigg{)}\prod_{j=1}^{k}\|T_{j}\|_{L_{\beta(j)}(\pi)\to L_{\beta(j)}(\pi)},

\|UT_{1}UT_{2}\cdots UT_{k}u\|_{L_{\beta(0)}(\pi)}\leqslant\|u\|_{L_{\alpha(1)}(\pi)}\|T_{1}UT_{2}\cdots UT_{k}u\|_{L_{\beta(1)}(\pi)}.

\|T_{1}UT_{2}\cdots UT_{k}u\|_{L_{\beta(1)}(\pi)}\leqslant\|T_{1}\|_{L_{\beta(1)}(\pi)\to L_{\beta(1)}(\pi)}\|UT_{2}\cdots UT_{k}u\|_{L_{\beta(1)}(\pi)}.

\forall j\in[k],\qquad\beta(j)=\frac{1}{\frac{1}{\alpha(j+1)}+\cdots+\frac{1}{\alpha(k+1)}}=\frac{1}{\frac{k-j}{q}+\frac{q-k+1}{2q}}=\frac{2q}{q+k+1-2j},

Full text

Concentration of Markov chains with bounded moments

Assaf Naor Mathematics Department, Princeton University. Supported by the Packard Foundation and the Simons Foundation. The research that is presented here was conducted under the auspices of the Simons Algorithms and Geometry (A&G) Think Tank.

Shravas Rao Courant Institute of Mathematical Sciences, New York University. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1342536.

Oded Regev Courant Institute of Mathematical Sciences, New York University. Supported by the Simons Collaboration on Algorithms and Geometry and by the National Science Foundation under Grant No. CCF-1814524. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

Abstract

Let $\{W_{t}\}_{t=1}^{\infty}$ be a finite state stationary Markov chain, and suppose that $f$ is a real-valued function on the state space. If $f$ is bounded, then Gillman’s expander Chernoff bound (1993) provides concentration estimates for the random variable $f(W_{1})+\cdots+f(W_{n})$ that depend on the spectral gap of the Markov chain and the assumed bound on $f$ . Here we obtain analogous inequalities assuming only that the $q$ ’th moment of $f$ is bounded for some $q\geqslant 2$ . Our proof relies on reasoning that differs substantially from the proofs of Gillman’s theorem that are available in the literature, and it generalizes to yield dimension-independent bounds for mappings $f$ that take values in an $L_{p}(\mu)$ for some $p\geqslant 2$ , thus answering (even in the Hilbertian special case $p=2$ ) a question of Kargin (2007).

1 Introduction

For $N\in\mathbb{N}$ , write $[N]\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\{1,\ldots,N\}$ and let $\triangle^{\!N-1}\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\big{\{}\pi=(\pi_{1},\ldots,\pi_{N})\in[0,1]^{N}:\ \sum_{i=1}^{N}\pi_{i}=1\big{\}}$ be the simplex of probability measures on $[N]$ . Given $\pi\in\triangle^{\!N-1}$ , denote by $E_{\pi}\in\mathsf{M}_{N}(\mathbb{R})$ the $N$ -by- $N$ matrix all of whose rows equal $\pi$ , i.e., $E_{\pi}u=(\sum_{j=1}^{N}\pi_{j}u_{j},\ldots,\sum_{j=1}^{N}\pi_{j}u_{j})\in\mathbb{R}^{N}$ for every $u=(u_{1},\ldots,u_{N})\in\mathbb{R}^{N}$ .

Given $\pi\in\triangle^{\!N-1}$ , a stochastic matrix $A=(a_{ij})\in\mathsf{M}_{N}(\mathbb{R})$ is $\pi$ -stationary if $\pi A=\pi$ , i.e., $\pi_{i}=\sum_{j=1}^{N}\pi_{j}a_{ji}$ for all $i\in[N]$ . We then define $\lambda_{\pi}(A)$ to be the norm of $A-E_{\pi}$ as an operator from $L_{2}(\pi)$ to $L_{2}(\pi)$ , i.e.,

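$$\lambda_{\pi}(A)\stackrel{\mathrm{def}}{=}\|A-E_{\pi}\|_{L_{2}(\pi)\to L_{2}(\pi)}=\sup\Bigg\{\bigg(\sum_{i=1}^{N}\pi_{i}\Big(\sum_{j=1}^{N}a_{ij}u_{j}-\sum_{k=1}^{N}\pi_{k}u_{k}\Big)^{2}\bigg)^{\frac{1}{2}}:\ u\in\mathbb{R}^{N}\ \mathrm{and}\ \sum_{k=1}^{N}\pi_{k}u_{k}^{2}=1\Bigg\}.$$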

Note that if $A$ is diagonalizable over the Hilbert space $L_{2}(\pi)$ , then we have $\lambda_{\pi}(A)=\max\{\lambda_{2}(A),|\lambda_{N}(A)|\}$ , where $1=\lambda_{1}(A)\geqslant\cdots\geqslant\lambda_{N}(A)\geqslant-1$ are the eigenvalues of $A$ . This would occur if $A$ were $\pi$ -reversible, i.e., $\pi_{i}a_{ij}=\pi_{j}a_{ji}$ for all $i,j\in[N]$ , in which case $A$ would be a self-adjoint operator on $L_{2}(\pi)$ ; the reversible setting is the main case of interest in the ensuing discussion, but reversibility is not needed for our proofs.

Let $\mathbf{W}=\{W_{t}\}_{t=1}^{\infty}$ be a Markov chain with state space $[N]$ and transition matrix $A\in\mathsf{M}_{N}(\mathbb{R})$ . One says that $\mathbf{W}$ is stationary if $A$ is $\pi_{\mathbf{W}}$ -stationary for $\pi_{\mathbf{W}}=(\Pr[W_{1}=1],\ldots,\Pr[W_{1}=N])\in\triangle^{\!N-1}$ . Write $\lambda_{\mathbf{W}}=\lambda_{\pi_{\mathbf{W}}}(A)$ .
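The quantity $\lambda_{\pi}(A)$ can be evaluated numerically for a small chain. The following is a minimal sketch, assuming NumPy and assuming that every $\pi_{i}$ is strictly positive; the helper names `stationary_distribution` and `lambda_pi` and the example chain (a lazy random walk on a 4-cycle) are illustrative choices that do not come from the paper.

```python
import numpy as np

def stationary_distribution(A):
    """Return a probability vector pi with pi @ A = pi (assumed unique)."""
    eigvals, eigvecs = np.linalg.eig(A.T)
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()

def lambda_pi(A, pi):
    """Norm of A - E_pi as an operator on L_2(pi); assumes every pi_i > 0."""
    N = len(pi)
    E = np.tile(pi, (N, 1))                  # every row of E_pi equals pi
    d = np.sqrt(pi)
    # u -> D^{1/2} u maps L_2(pi) isometrically onto Euclidean R^N, so the
    # L_2(pi) operator norm is the top singular value of D^{1/2}(A - E)D^{-1/2}.
    M = (d[:, None] * (A - E)) / d[None, :]
    return np.linalg.norm(M, 2)

# Hypothetical example: lazy random walk on a 4-cycle (reversible, uniform pi).
A = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
pi = stationary_distribution(A)
lam = lambda_pi(A, pi)
eig = np.sort(np.linalg.eigvalsh(A))         # A is symmetric in this example
print(pi, lam, max(eig[-2], abs(eig[0])))
```

In this reversible example the operator norm agrees with $\max\{\lambda_{2}(A),|\lambda_{N}(A)|\}$, as the preceding discussion predicts.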

Theorem 1.1.

Suppose that $\mathbf{W}=\{W_{t}\}_{t=1}^{\infty}$ is a stationary Markov chain whose state space is $[N]$ and with $\lambda_{\mathbf{W}}<1$ . Then, every $f:[N]\to\mathbb{R}$ satisfies the following inequality for every $n\in\mathbb{N}$ and every $q\geqslant 2$ .

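$$\Bigg(\mathbb{E}\bigg[\Big|\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big|^{q}\bigg]\Bigg)^{\frac{1}{q}}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}\cdot\Big(\mathbb{E}\big[|f(W_{1})|^{q}\big]\Big)^{\frac{1}{q}}.\tag{1}$$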

The (standard) asymptotic notation $\lesssim$ that appears in (1) (as well as throughout the ensuing discussion) means the following. Given two quantities $\alpha,\beta\in[0,\infty)$ , the notation $\alpha\lesssim\beta$ stands for the assertion that there exists a universal constant $C\in(0,\infty)$ for which $\alpha\leqslant C\beta$ ; this is also denoted by $\beta\gtrsim\alpha$ .

The conclusion (1) of Theorem 1.1 with the random variables $f(W_{1}),\ldots,f(W_{n})$ replaced by i.i.d. random variables coincides with the classical Marcinkiewicz–Zygmund inequality [MZ37]. Our contribution here is therefore to generalize this statement to random variables that are (images of) stationary Markov chains with a spectral gap; the i.i.d. setting is the special case $A=E_{\pi}$ of Theorem 1.1. The bound (1) is optimal; see Remark 4 below. A variant of Theorem 1.1 when $1\leqslant q\leqslant 2$ appears in Remark 3 below.
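For a very small chain, both sides of (1) can be computed exactly by enumerating all trajectories. The sketch below assumes NumPy; the chain `A`, the function `f`, the horizon `n` and the helper names are hypothetical choices made only for illustration. Since the theorem does not specify the implicit constant, the code simply reports the ratio of the left-hand side to the right-hand side with the constant set to 1.

```python
import itertools
import numpy as np

def lambda_pi(A, pi):
    """Norm of A - E_pi on L_2(pi); assumes every pi_i > 0."""
    E = np.tile(pi, (len(pi), 1))
    d = np.sqrt(pi)
    return np.linalg.norm((d[:, None] * (A - E)) / d[None, :], 2)

def deviation_norm(A, pi, f, n, q):
    """Exact L_q norm of (f(W_1)+...+f(W_n))/n - E[f(W_1)], by enumeration."""
    mean = float(pi @ f)
    total = 0.0
    for path in itertools.product(range(len(pi)), repeat=n):
        prob = pi[path[0]]
        for s, t in zip(path, path[1:]):
            prob *= A[s, t]
        avg = sum(f[i] for i in path) / n
        total += prob * abs(avg - mean) ** q
    return total ** (1.0 / q)

# Hypothetical 3-state chain: symmetric, so the uniform distribution is stationary.
A = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])
pi = np.full(3, 1.0 / 3.0)
f = np.array([2.0, -1.0, 0.5])
n = 8
lam = lambda_pi(A, pi)

ratios = {}
for q in (2, 4):
    lhs = deviation_norm(A, pi, f, n, q)
    rhs = np.sqrt(q / ((1 - lam) * n)) * (pi @ np.abs(f) ** q) ** (1.0 / q)
    ratios[q] = lhs / rhs
print(lam, ratios)
```

For $q=2$ the ratio is at most 1 by a direct variance computation; for larger $q$ the printed ratio only indicates how large the implicit constant needs to be in this one example.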

The precursor (and inspiration) of Theorem 1.1 is the following theorem of Gillman [Gil93, Gil98].

Theorem 1.2.

Suppose that $\mathbf{W}=\{W_{t}\}_{t=1}^{\infty}$ is a stationary Markov chain whose state space is $[N]$ and with $\lambda_{\mathbf{W}}<1$ . Then, every $f:[N]\to\mathbb{R}$ satisfies the following inequality for every $n\in\mathbb{N}$ and every $q\geqslant 2$ .

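$$\Bigg(\mathbb{E}\bigg[\Big|\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big|^{q}\bigg]\Bigg)^{\frac{1}{q}}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}\cdot\max\left\{|f(1)|,\ldots,|f(N)|\right\}.\tag{2}$$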

Note that Theorem 1.2 is typically stated in the literature as the following concentration inequality, which is commonly called the expander Chernoff bound.

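$$\forall\,a>0,\qquad\Pr\bigg[\Big|\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big|\geqslant a\max_{j\in[N]}|f(j)|\bigg]\lesssim e^{-c(1-\lambda_{\mathbf{W}})na^{2}},\tag{3}$$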

where $c>0$ is a universal constant. The equivalence of (2) and (3) is standard; (2) $\implies$ (3) is checked by applying Markov’s inequality and optimizing over $q$, and (3) $\implies$ (2) follows by straightforward integration (both implications appear in Proposition 2.5.2 of the textbook [Ver18]). The same use of Markov’s inequality shows mutatis mutandis that Theorem 1.1 implies the following concentration phenomenon.

Corollary 1.3.

There is a universal constant $c>0$ with the following property. Suppose that $\mathbf{W}=\{W_{t}\}_{t=1}^{\infty}$ is a stationary Markov chain whose state space is $[N]$ and with $\lambda_{\mathbf{W}}<1$ . Then, every $f:[N]\to\mathbb{R}$ satisfies the following inequality for every $n\in\mathbb{N}$ , every $q\geqslant 2$ and every $0<a\leqslant\sqrt{q/((1-\lambda_{\mathbf{W}})n)}$ .

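$$\Pr\bigg[\Big|\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big|\geqslant a\Big(\mathbb{E}\big[|f(W_{1})|^{q}\big]\Big)^{\frac{1}{q}}\bigg]\lesssim e^{-c(1-\lambda_{\mathbf{W}})na^{2}}.$$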
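The Markov-inequality step behind the implication (2) $\implies$ (3) can be spelled out as follows. This is a sketch under stated notation: $C$ denotes the unspecified implicit constant in (2), $X$ abbreviates the centered average and $M=\max_{j\in[N]}|f(j)|$. These symbols, and the resulting value of $c$, are introduced here for illustration and are not taken from the paper.

```latex
% X = (f(W_1)+...+f(W_n))/n - E[f(W_1)],  M = max_j |f(j)|,  C = implicit constant in (2).
% Markov's inequality applied to |X|^q, followed by (2), for any q >= 2:
\Pr\big[|X|\geqslant aM\big]
  \leqslant\frac{\mathbb{E}\big[|X|^{q}\big]}{(aM)^{q}}
  \leqslant\bigg(\frac{C^{2}q}{(1-\lambda_{\mathbf{W}})na^{2}}\bigg)^{\frac{q}{2}}.
% Choose q = (1-\lambda_W) n a^2 / (e C^2). If this q is at least 2, the bound becomes
\bigg(\frac{1}{e}\bigg)^{\frac{q}{2}}
  =\exp\bigg(-\frac{(1-\lambda_{\mathbf{W}})na^{2}}{2eC^{2}}\bigg).
% Otherwise (1-\lambda_W) n a^2 < 2 e C^2, and the trivial bound Pr[...] <= 1 gives
\Pr\big[|X|\geqslant aM\big]\leqslant 1
  \leqslant e\cdot\exp\bigg(-\frac{(1-\lambda_{\mathbf{W}})na^{2}}{2eC^{2}}\bigg).
% In both cases (3) holds with c = 1/(2 e C^2) and implicit constant e.
```

For Corollary 1.3 the same computation would be run with an exponent $q'\leqslant q$ together with the monotonicity $\|f\|_{L_{q'}(\pi_{\mathbf{W}})}\leqslant\|f\|_{L_{q}(\pi_{\mathbf{W}})}$; taking $C\geqslant 1$ without loss of generality, the stated range of $a$ keeps such an exponent available.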

Remark 1.

Kloeckner investigated in [Klo19] the question of obtaining concentration bounds such as (3) with the $L_{\infty}$ norm $\max_{j\in[N]}|f(j)|$ replaced by other norms of $f$ . As discussed in [Klo19, Remark 2.2], the results of [Klo19] hold in a setting that imposes structural hypotheses on the aforementioned norm of the “observable” $f$ which notably excludes its $L_{q}(\pi_{\mathbf{W}})$ norm (which appears in the right-hand side of the bound (1) that we prove here), but it is noted in [Klo19, Remark 2.2] that “classically one only makes moment assumptions on the observable.” Corollary 1.3 addresses this question, though note that [Klo19] also covers settings that are not treated here.

The new bound (1) that we obtain differs from Gillman’s estimate (2) only in the replacement of the worst-case bound on $f$ in the right-hand side of (2) by an average-case bound. Rather than being merely a quantitative enhancement, this improvement has conceptual significance which we achieve through a reasoning that differs substantially from the proof of (3) in [Gil93, Gil98], as well as the several other proofs of (3) and its variants that appeared in the literature [Din95, Kah97, Lez98, LP04, Kar07, Wag08, CLLM12, Pau15, GLSS18, FJS18, Klo19] (our approach was recently used in [RR17, Rao19]).

Assuming a bound on the $q$’th moment of $f$ is the appropriate setting for bounding the $q$’th moment of $f(W_{1})+\cdots+f(W_{n})$. This compatibility of the left-hand side of (1) and the right-hand side of (1) allows the resulting inequality to tensorize so as to yield dimension-independent vector-valued statements. Specifically, for any measure space $(\Omega,\mu)$, if $f:[N]\to L_{q}(\mu)$, then by applying (1) to the real-valued mapping $(i\in[N])\mapsto f(i)(\omega)$ for each $\omega\in\Omega$, and then integrating (the $q$’th power of) the resulting point-wise inequality, we see that (under the assumptions of Theorem 1.1),

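$$\Bigg(\mathbb{E}\bigg[\Big\|\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big\|_{L_{q}(\mu)}^{q}\bigg]\Bigg)^{\frac{1}{q}}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}\cdot\Big(\mathbb{E}\big[\|f(W_{1})\|_{L_{q}(\mu)}^{q}\big]\Big)^{\frac{1}{q}}.\tag{4}$$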
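The tensorization argument just described can be written out in two lines. In this sketch $C$ stands for the unspecified implicit constant in (1); that symbol is introduced here only for illustration.

```latex
% C = implicit constant in (1); f(j)(\omega) is the value of f(j) \in L_q(\mu) at \omega.
% Step 1: apply (1) to i -> f(i)(\omega) for a fixed \omega and raise to the power q:
\mathbb{E}\bigg[\Big|\frac{1}{n}\sum_{k=1}^{n}f(W_{k})(\omega)-\mathbb{E}[f(W_{1})](\omega)\Big|^{q}\bigg]
  \leqslant C^{q}\bigg(\frac{q}{(1-\lambda_{\mathbf{W}})n}\bigg)^{\frac{q}{2}}
  \mathbb{E}\big[|f(W_{1})(\omega)|^{q}\big].
% Step 2: integrate d\mu(\omega); the expectations are finite sums, so they
% commute with the integral, and \int |g(\omega)|^q d\mu(\omega) = \|g\|_{L_q(\mu)}^q:
\mathbb{E}\bigg[\Big\|\frac{1}{n}\sum_{k=1}^{n}f(W_{k})-\mathbb{E}[f(W_{1})]\Big\|_{L_{q}(\mu)}^{q}\bigg]
  \leqslant C^{q}\bigg(\frac{q}{(1-\lambda_{\mathbf{W}})n}\bigg)^{\frac{q}{2}}
  \mathbb{E}\big[\|f(W_{1})\|_{L_{q}(\mu)}^{q}\big].
% Step 3: take q-th roots. The constant C is unchanged, so it does not depend on (\Omega,\mu).
```

The argument works because the same exponent $q$ appears in the moment and in the target space, which is why the range $L_{p}(\mu)$ with $p\neq q$ requires the interpolation step discussed later.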

The following Hilbertian statement is a consequence of (4) that deserves to be stated separately.

Corollary 1.4.

Suppose that $\mathbf{W}=\{W_{t}\}_{t=1}^{\infty}$ is a stationary Markov chain whose state space is $[N]$ and with $\lambda_{\mathbf{W}}<1$ . Let $(H,\|\cdot\|_{H})$ be a Hilbert space. The following bound holds for all $n\in\mathbb{N}$ , $q\geqslant 2$ and $f:[N]\to H$ .

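$$\Bigg(\mathbb{E}\bigg[\Big\|\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big\|_{H}^{q}\bigg]\Bigg)^{\frac{1}{q}}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}\cdot\Big(\mathbb{E}\big[\|f(W_{1})\|_{H}^{q}\big]\Big)^{\frac{1}{q}}.\tag{5}$$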

Corollary 1.4 is nothing more than (4) applied to an isometric copy of $H$ in $L_{q}(\mu)$ , which is known to exist by [Ban32, Chapter 12] (see also the exposition in, e.g., the textbook [AK16, Proposition 6.4.12]).

Since $\mathbb{E}[\|f(W_{1})\|_{H}^{q}]\leqslant\max_{j\in[N]}\|f(j)\|_{H}^{q}$, the following corollary is a consequence of Corollary 1.4 through the usual application of Markov’s inequality and then an optimization over $q$.

Corollary 1.5 (Hilbert space-valued expander Chernoff bound).

There is a universal constant $c>0$ with the following property. Suppose that $\mathbf{W}=\{W_{t}\}_{t=1}^{\infty}$ is a stationary Markov chain whose state space is $[N]$ and with $\lambda_{\mathbf{W}}<1$ . Let $(H,\|\cdot\|_{H})$ be a Hilbert space. If $f:[N]\to H$ , then for all $n\in\mathbb{N}$ and $a>0$ we have

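$$\Pr\bigg[\Big\|\frac{1}{n}\sum_{i=1}^{n}f(W_{i})-\mathbb{E}[f(W_{1})]\Big\|_{H}\geqslant a\max_{j\in[N]}\|f(j)\|_{H}\bigg]\lesssim e^{-c(1-\lambda_{\mathbf{W}})na^{2}}.\tag{6}$$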

Remark 2.

Kargin studied [Kar07] the vector-valued setting of Gillman’s theorem for functions that take values in the $m$ -dimensional Euclidean space $\ell_{2}^{m}$ . The statement that is obtained in [Kar07] is the same as that of Corollary 1.5, except that it is dimension-dependent; specifically, with the implicit constant in (6) growing to $\infty$ exponentially with $m$ . Thus, the main new feature of Corollary 1.5 is that it is dimension-independent. Obtaining such a bound was a main question that [Kar07] left open; see [Kar07, Section 4].

Observe that estimates such as (4) can be interpreted as bounds on the operator norm of a certain linear operator between vector-valued $L_{q}$ -spaces. Specifically, suppose that $(X,\|\cdot\|_{X})$ is a Banach space. Let $\mathbf{W}=\{W_{t}\}_{t=1}^{\infty}$ be a stationary Markov chain whose state space is $[N]$ and with $\lambda_{\mathbf{W}}<1$ . Denote (as before) the stationary measure of $\mathbf{W}$ by $\pi_{\mathbf{W}}$ and let the transition matrix of $\mathbf{W}$ be $A=(a_{ij})\in\mathsf{M}_{N}(\mathbb{R})$ . For each $n\in\mathbb{N}$ denote the associated probability measure on the trajectories of length $n$ by $\tau^{n}_{\mathbf{W}}:[N]^{n}\to[0,1]$ . Thus, $\tau^{n}_{\mathbf{W}}$ is the probability measure on $[N]^{n}$ that is given by $\tau^{1}_{\mathbf{W}}=\pi_{\mathbf{W}}$ if $n=1$ , and for $n\geqslant 2$ ,

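$$\forall(i_{1},\ldots,i_{n})\in[N]^{n},\qquad\tau^{n}_{\mathbf{W}}(i_{1},\ldots,i_{n})\stackrel{\mathrm{def}}{=}\Pr\big[(W_{1},\ldots,W_{n})=(i_{1},\ldots,i_{n})\big]=\pi_{\mathbf{W}}(i_{1})a_{i_{1}i_{2}}a_{i_{2}i_{3}}\cdots a_{i_{n-1}i_{n}}.$$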
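The trajectory measure $\tau^{n}_{\mathbf{W}}$ is straightforward to compute for a small chain. The following sketch assumes NumPy and uses 0-indexed states; the function name `tau` and the two-state example are hypothetical and serve only to illustrate the definition.

```python
import itertools
import numpy as np

def tau(pi, A, path):
    """Probability that (W_1, ..., W_n) equals `path` (0-indexed states)."""
    prob = pi[path[0]]                 # W_1 is distributed according to pi
    for s, t in zip(path, path[1:]):   # multiply the transition probabilities
        prob *= A[s, t]
    return prob

# Hypothetical two-state chain with stationary distribution (1/3, 2/3).
A = np.array([[0.50, 0.50],
              [0.25, 0.75]])
pi = np.array([1.0 / 3.0, 2.0 / 3.0])
assert np.allclose(pi @ A, pi)         # A is pi-stationary

n = 5
paths = list(itertools.product(range(2), repeat=n))
total = sum(tau(pi, A, p) for p in paths)

# By stationarity, the marginal law of the last coordinate is again pi.
last = np.zeros(2)
for p in paths:
    last[p[-1]] += tau(pi, A, p)
print(total, last)
```

For a path of length 1 the loop is empty and the function returns $\pi_{\mathbf{W}}(i_{1})$, matching the convention $\tau^{1}_{\mathbf{W}}=\pi_{\mathbf{W}}$.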

Define a linear operator $T_{X}:L_{q}(\pi_{\mathbf{W}};X)\to L_{q}(\tau^{n}_{\mathbf{W}};X)$ by setting for $f:[N]\to X$ ,

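$$\forall(i_{1},\ldots,i_{n})\in[N]^{n},\qquad T_{X}f(i_{1},\ldots,i_{n})\stackrel{\mathrm{def}}{=}\frac{1}{n}\sum_{k=1}^{n}f(i_{k})-\sum_{j=1}^{N}\pi_{\mathbf{W}}(j)f(j)\in X.\tag{7}$$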

Here, and in what follows, we are using standard notation for vector-valued Lebesgue–Bochner spaces, though throughout we will need to consider only finitely supported measures, in which case measurability issues do not need to be discussed. So, if $(S,\sigma)$ is a probability space with $|S|<\infty$, then the Banach space $L_{q}(\sigma;X)$ is the vector space of all mappings $\psi:S\to X$, equipped with the norm

$$\|\psi\|_{L_{q}(\sigma;X)}=\bigg(\sum_{s\in S}\sigma(s)\|\psi(s)\|_{X}^{q}\bigg)^{\frac{1}{q}}.$$

The validity of (4) under the assumptions of Theorem 1.1 is the same as the operator norm bound

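$$\|T_{L_{q}(\mu)}\|_{L_{q}(\pi_{\mathbf{W}};L_{q}(\mu))\to L_{q}(\tau_{\mathbf{W}}^{n};L_{q}(\mu))}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}.\tag{8}$$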

In the same vein, Corollary 1.4 is (under the same assumptions) the same as

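$$\|T_{L_{2}(\mu)}\|_{L_{q}(\pi_{\mathbf{W}};L_{2}(\mu))\to L_{q}(\tau_{\mathbf{W}}^{n};L_{2}(\mu))}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}.\tag{9}$$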

By Calderón’s vector-valued extension [Cal64] of the Riesz–Thorin [Rie27, Tho48] interpolation theorem (see the monograph [BL76] for background on complex interpolation; the specific statement that we are using here is a combination of Theorem 4.1.2 and Theorem 5.1.2 in [BL76]), it follows from (8) and (9) that for every $p\in[2,q]$ we have

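$$\|T_{L_{p}(\mu)}\|_{L_{q}(\pi_{\mathbf{W}};L_{p}(\mu))\to L_{q}(\tau_{\mathbf{W}}^{n};L_{p}(\mu))}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}.$$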

We record this conclusion as the following generalization of Corollary 1.4 and Corollary 1.5.

Corollary 1.6.

Suppose that $p\geqslant 2$ and that $(\Omega,\mu)$ is a measure space. Let $\mathbf{W}=\{W_{t}\}_{t=1}^{\infty}$ be a stationary Markov chain whose state space is $[N]$ and with $\lambda_{\mathbf{W}}<1$ . If $f:[N]\to L_{p}(\mu)$ , then for all $n\in\mathbb{N}$ and $q\geqslant p$ ,

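$$\Bigg(\mathbb{E}\bigg[\Big\|\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big\|_{L_{p}(\mu)}^{q}\bigg]\Bigg)^{\frac{1}{q}}\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}\cdot\Big(\mathbb{E}\big[\|f(W_{1})\|_{L_{p}(\mu)}^{q}\big]\Big)^{\frac{1}{q}}.\tag{10}$$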

Consequently, by the usual combination of (10) with Markov’s inequality, followed by optimization over $q\geqslant p$ , there exists a universal constant $c\in(0,\infty)$ such that

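$$\forall\,a>0,\qquad\Pr\bigg[\Big\|\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big\|_{L_{p}(\mu)}\geqslant a\max_{j\in[N]}\|f(j)\|_{L_{p}(\mu)}\bigg]\lesssim e^{p-c(1-\lambda_{\mathbf{W}})na^{2}}.\tag{11}$$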

Remark 3.

By convexity we have $\|T_{\mathbb{R}}\|_{L_{1}(\pi_{\mathbf{W}})\to L_{1}(\tau_{\mathbf{W}}^{n})}\leqslant 2$ , since it is evident from (7) that the operator in question is the difference of two averaging operators. By interpolating this (trivial) estimate with the case $q=2$ of Theorem 1.1 using the (scalar-valued) Riesz–Thorin interpolation theorem as above, we arrive at the following variant of Theorem 1.1 in the range $1\leqslant q\leqslant 2$ , which holds under the same assumptions.

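$$\Bigg(\mathbb{E}\bigg[\Big|\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big|^{q}\bigg]\Bigg)^{\frac{1}{q}}\lesssim\bigg(\frac{1}{(1-\lambda_{\mathbf{W}})n}\bigg)^{1-\frac{1}{q}}\cdot\Big(\mathbb{E}\big[|f(W_{1})|^{q}\big]\Big)^{\frac{1}{q}}.\tag{12}$$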
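The trivial $L_{1}$ estimate used at the start of this remark can be checked by brute force on a small chain. The sketch below assumes NumPy; the two-state chain, the random test functions and the helper name `l1_norm_ratio` are hypothetical choices for illustration only.

```python
import itertools
import numpy as np

def l1_norm_ratio(pi, A, f, n):
    """||T f||_{L_1(tau^n)} / ||f||_{L_1(pi)} for scalar f, by enumeration."""
    mean = float(pi @ f)
    numerator = 0.0
    for path in itertools.product(range(len(pi)), repeat=n):
        prob = pi[path[0]]
        for s, t in zip(path, path[1:]):
            prob *= A[s, t]
        numerator += prob * abs(sum(f[i] for i in path) / n - mean)
    return numerator / float(pi @ np.abs(f))

# Hypothetical two-state chain with stationary distribution (1/3, 2/3).
A = np.array([[0.50, 0.50],
              [0.25, 0.75]])
pi = np.array([1.0 / 3.0, 2.0 / 3.0])

rng = np.random.default_rng(0)
worst = max(l1_norm_ratio(pi, A, rng.standard_normal(2), 6) for _ in range(200))
print(worst)   # stays at most 2, as the triangle inequality guarantees
```

The bound 2 follows because each of the two averaging operators (the trajectory average and the stationary mean) has norm at most 1 on $L_{1}$.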

Observe that when the Markov chain $\mathbf{W}$ is reversible, the case $q=2$ of (1) is a quadratic inequality that could be directly verified in a straightforward manner by expanding both sides in an orthonormal eigenbasis of the transition matrix of $\mathbf{W}$. The more substantial content of Theorem 1.1 is therefore the case $q>2$, which does not lend itself to such linear-algebraic reasoning.

Remark 4.

Both (1) and (12) are sharp (up to the implicit universal constant factors) for large enough $n\in\mathbb{N}$ . This is seen by examining the following family of Markov chains. For every $\varepsilon,\lambda\in(0,1)$ consider the two-state Markov chain $\mathbf{W}(\lambda,\varepsilon)$ whose transition matrix equals

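$$\begin{pmatrix}1-(1-\lambda)(1-\varepsilon)&(1-\lambda)(1-\varepsilon)\\(1-\lambda)\varepsilon&1-(1-\lambda)\varepsilon\end{pmatrix}=\lambda I_{2}+(1-\lambda)E_{\pi(\varepsilon)}\in\mathsf{M}_{2}(\mathbb{R}),$$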

where $I_{2}$ is the $2$ -by- $2$ identity matrix and $\pi(\varepsilon)=(\varepsilon,1-\varepsilon)\in\triangle^{\!1}$ . Then $\pi_{\mathbf{W}(\lambda,\varepsilon)}=\pi(\varepsilon)$ and $\lambda_{\mathbf{W}(\lambda,\varepsilon)}=\lambda$ .
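The stated properties of the two-state chain $\mathbf{W}(\lambda,\varepsilon)$ can be confirmed numerically. This is a minimal sketch assuming NumPy; the helper names `two_state_chain` and `lambda_pi` and the sample values of $\lambda$ and $\varepsilon$ are illustrative assumptions.

```python
import numpy as np

def two_state_chain(lam, eps):
    """Transition matrix lam*I_2 + (1-lam)*E_pi for pi = (eps, 1-eps)."""
    pi = np.array([eps, 1.0 - eps])
    E = np.tile(pi, (2, 1))            # both rows of E_pi equal pi
    return lam * np.eye(2) + (1.0 - lam) * E, pi

def lambda_pi(A, pi):
    """Norm of A - E_pi on L_2(pi); assumes every pi_i > 0."""
    E = np.tile(pi, (len(pi), 1))
    d = np.sqrt(pi)
    return np.linalg.norm((d[:, None] * (A - E)) / d[None, :], 2)

lam, eps = 0.7, 0.2                    # hypothetical sample parameters
A, pi = two_state_chain(lam, eps)

# Entry-by-entry form of the same matrix.
explicit = np.array([[1 - (1 - lam) * (1 - eps), (1 - lam) * (1 - eps)],
                     [(1 - lam) * eps,           1 - (1 - lam) * eps]])

print(np.allclose(A, explicit))        # the two descriptions agree
print(np.allclose(A.sum(axis=1), 1))   # A is stochastic
print(np.allclose(pi @ A, pi))         # pi(eps) is stationary
print(lambda_pi(A, pi))                # equals lam up to rounding
```

The last value equals $\lambda$ because $A-E_{\pi(\varepsilon)}=\lambda(I_{2}-E_{\pi(\varepsilon)})$ and $I_{2}-E_{\pi(\varepsilon)}$ is the orthogonal projection in $L_{2}(\pi(\varepsilon))$ onto mean-zero vectors.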

The optimality of (1) is exhibited by taking $\varepsilon=\frac{1}{2}$ and $f:\{1,2\}\to\mathbb{R}$ that is given by $f(1)=1=-f(2)$. In this case, it is elementary to check that if $n\geqslant q/(1-\lambda)$, then both sides of (1) are within universal constant multiples of each other. Next, the optimality of (12) is exhibited by considering $f:\{1,2\}\to\mathbb{R}$ that is given by $f(1)=1$ and $f(2)=0$. In this case, it is elementary to check that if $n\geqslant 1/(1-\lambda)$, then for small enough $\varepsilon>0$ both sides of (12) are within universal constant multiples of each other. The routine computations that verify these assertions are omitted.

Remark 5.

The above discussion raises the question of understanding what is required from a Banach space $(X,\|\cdot\|_{X})$ so that the “Gillman phenomenon” for stationary Markov chains (or variants thereof) would hold for $X$-valued mappings. The present work obtains the first examples (notably, Hilbert space) of such theorems in infinite dimensions (equivalently, dimension-independent bounds). However, much more remains to be understood here. This matter is pursued in the forthcoming work [Nao19], where it is explained how it relates to central themes in Banach space theory. Further infinite dimensional statements are derived in [Nao19], including a treatment of (10) in the range $2\leqslant q<p$ which is not covered in Corollary 1.6, through an approach that is entirely different from our reasoning here.

We end the Introduction by noting that the above results have an equivalent dual formulation that is worthwhile to work out explicitly. Given a Banach space $(X,\|\cdot\|_{X})$ , the operator $T_{X}$ that is given in (7) has norm $K>0$ from $L_{q}(\pi_{\mathbf{W}};X)$ to $L_{q}(\tau^{n}_{\mathbf{W}};X)$ if and only if its adjoint $T^{*}_{X}$ has norm $K$ from $L_{q^{*}}(\tau^{n}_{\mathbf{W}};X^{*})$ to $L_{q^{*}}(\pi_{\mathbf{W}};X^{*})$ , where $q^{*}=q/(q-1)$ . This leads to the following dual formulation of Corollary 1.6, whose derivation is a mechanical unravelling of the definitions (the straightforward details are omitted).

Corollary 1.7 (adjoint of (10)).

Let $\mathbf{W}=\{W_{t}\}_{t=1}^{\infty}$ be a stationary Markov chain whose state space is $[N]$ and with $\lambda_{\mathbf{W}}<1$ . Fix $n\in\mathbb{N}$ and $p,q\in(1,2]$ with $q\leqslant p$ . For every measure space $(\Omega,\mu)$ and $F:[N]^{n}\to L_{p}(\mu)$ ,

[TABLE]

2 Proof of Theorem 1.1

Suppose from now on that we are in the setting of Theorem 1.1. We will write for simplicity $\lambda=\lambda_{\mathbf{W}}<1$ and $\pi=\pi_{\mathbf{W}}\in\triangle^{\!N-1}$ . We will also let $A=(a_{ij})\in\mathsf{M}_{N}(\mathbb{R})$ be the transition matrix of $\mathbf{W}$ .

It suffices to prove (1) when $f:[N]\to\mathbb{R}$ satisfies $\mathbb{E}[f(W_{1})]=0$ . Indeed, this could be then applied to the centered function $f-\mathbb{E}[f(W_{1})]$ to yield the estimate

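$$\begin{split}\Bigg(\mathbb{E}\bigg[\Big|\frac{f(W_{1})+\cdots+f(W_{n})}{n}-\mathbb{E}[f(W_{1})]\Big|^{q}\bigg]\Bigg)^{\frac{1}{q}}&\lesssim\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}\cdot\Big(\mathbb{E}\big[|f(W_{1})-\mathbb{E}[f(W_{1})]|^{q}\big]\Big)^{\frac{1}{q}}\\ &\leqslant 2\sqrt{\frac{q}{(1-\lambda_{\mathbf{W}})n}}\cdot\Big(\mathbb{E}\big[|f(W_{1})|^{q}\big]\Big)^{\frac{1}{q}},\end{split}$$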

where the last step is the triangle inequality in $L_{q}(\pi)$ . So, assume from now on that $\mathbb{E}[f(W_{1})]=0$ . It will be convenient to define $u\in\mathbb{R}^{N}$ by setting $u_{i}=f(i)$ for all $i\in[N]$ . The assumption on $f$ becomes $\sum_{i=1}^{N}\pi_{i}u_{i}=0$ . Below, we will denote the diagonal matrix whose diagonal is $u$ by $U\in\mathsf{M}_{N}(\mathbb{R})$ , i.e.,

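$$U\stackrel{\mathrm{def}}{=}\begin{pmatrix}u_{1}&0&\dots&0\\0&u_{2}&\ddots&\vdots\\\vdots&\ddots&\ddots&0\\0&\dots&0&u_{N}\end{pmatrix}=\begin{pmatrix}f(1)&0&\dots&0\\0&f(2)&\ddots&\vdots\\\vdots&\ddots&\ddots&0\\0&\dots&0&f(N)\end{pmatrix}.$$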

Lemma 2.1.

For every $m\in\mathbb{N}$ we have

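$$\mathbb{E}\Big[\big(f(W_{1})+\cdots+f(W_{n})\big)^{2m}\Big]\leqslant(2m)!\sum_{\begin{subarray}{c}v_{0},\ldots,v_{2m-1}\in\mathbb{N}\cup\{0\}\\ v_{0}+\cdots+v_{2m-1}\leqslant n-1\end{subarray}}\big\|UA^{v_{1}}UA^{v_{2}}\cdots UA^{v_{2m-1}}u\big\|_{L_{1}(\pi)}.$$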

Proof.

Let $V_{2m}$ be the set of all vectors $w\in[n]^{2m}$ that satisfy $1\leqslant w_{1}\leqslant w_{2}\leqslant\cdots\leqslant w_{2m}\leqslant n$. Observe that by the Markov property and stationarity, for every $w\in V_{2m}$ we have the following identity.

[TABLE]

So, by expanding the $(2m)$ ’th power of $f(W_{1})+\cdots+f(W_{n})$ and arranging the indices in increasing order,

[TABLE]
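The displayed identities of this proof are not reproduced in this copy. As a hedged illustration of the mechanism only, the sketch below checks the two-point case $\mathbb{E}[f(W_{s})f(W_{t})]=\sum_{i}\pi_{i}u_{i}(A^{t-s}u)_{i}$ for $s\leqslant t$, which is a standard consequence of the Markov property and stationarity. The three-state chain and the vector `u` are hypothetical, and the helper names are not from the paper.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical doubly stochastic 3-state chain, so the uniform measure is stationary.
A = [[Fr(1, 2), Fr(1, 2), Fr(0)],
     [Fr(0), Fr(1, 2), Fr(1, 2)],
     [Fr(1, 2), Fr(0), Fr(1, 2)]]
pi = [Fr(1, 3)] * 3
u = [Fr(1), Fr(0), Fr(-1)]      # values of f, with sum_i pi_i u_i = 0
N = 3

def path_prob(path):
    # Probability that (W_1, ..., W_t) equals `path` for the stationary chain.
    p = pi[path[0]]
    for i, j in zip(path, path[1:]):
        p *= A[i][j]
    return p

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(N)) for i in range(N)]

def two_point_bruteforce(s, t):
    # E[f(W_s) f(W_t)] by summing over all paths of length t (times are 1-indexed).
    return sum(path_prob(w) * u[w[s - 1]] * u[w[t - 1]]
               for w in product(range(N), repeat=t))

def two_point_matrix(s, t):
    # sum_i pi_i u_i (A^{t-s} u)_i; by stationarity only t - s matters.
    v = u
    for _ in range(t - s):
        v = matvec(A, v)
    return sum(pi[i] * u[i] * v[i] for i in range(N))

for s, t in [(1, 1), (1, 3), (2, 4), (3, 5)]:
    print(s, t, two_point_bruteforce(s, t), two_point_matrix(s, t))
```

Exact rational arithmetic is used so that the two computations can be compared with equality rather than up to rounding.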

Remark 6.

It is worthwhile to note in passing that while the proof of Lemma 2.1 relies on what may seem to be innocuous identities, the crucial step that rearranged the factors so that their indices are increasing is inherently commutative. This is what obstructs the direct use of the ensuing proof for matrix-valued functions, namely the setting of [WX08, GLSS18]. Alternative routes are taken in [GLSS18, Nao19], but it would be interesting to investigate whether a more careful reasoning along the lines of the present work could be used to treat the setting of functions that take values in Schatten–von Neumann trace classes.

Towards bounding from above each of the terms $\|UA^{v_{1}}UA^{v_{2}}\cdots UA^{v_{2m-1}}u\|_{L_{1}(\pi)}$ from Lemma 2.1, we record the following iterative application of Hölder’s inequality and the definition of operator norms.

Lemma 2.2.

Fix $k\in\mathbb{N}$ and $q\geqslant k+1$ . Then, for every $T_{1},\ldots,T_{k}\in\mathsf{M}_{N}(\mathbb{R})$ we have

[TABLE]

Proof.

Suppose that $\alpha(1),\ldots,\alpha(k+1)\geqslant 1$ satisfy $\frac{1}{\alpha(1)}+\cdots+\frac{1}{\alpha(k+1)}\leqslant 1$ . We claim that

[TABLE]

where $\beta(0),\ldots,\beta(k)\geqslant 1$ are defined by $\frac{1}{\beta(j)}=\frac{1}{\alpha(j+1)}+\cdots+\frac{1}{\alpha(k+1)}$ . The proof of (15) is by induction on $k$ .

The case $k=0$ is tautological. For the induction step, since $\frac{1}{\beta(0)}=\frac{1}{\alpha(1)}+\frac{1}{\beta(1)}$ , by Hölder’s inequality,

[TABLE]

By the definition of the operator norm $\|T_{1}\|_{L_{\beta(1)}(\pi)\to L_{\beta(1)}(\pi)}$ we have,

[TABLE]

Now (15) follows by combining (16) and (17) with the inductive hypothesis.

Choose $\alpha(1)=\alpha(k+1)=\frac{2q}{q-k+1}$ and $\alpha(2)=\cdots=\alpha(k)=q$ . So,

[TABLE]

and $\beta(0)=1$ . Hence, with this specific setting of the parameters the bound (15) becomes

[TABLE]

It remains to note that since $q\geqslant k+1$ we have $\frac{2q}{q-k+1}\leqslant q$ , and therefore $\|u\|_{L_{\frac{2q}{q-k+1}}(\pi)}\leqslant\|u\|_{L_{q}(\pi)}$ . ∎
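The arithmetic behind the choice of exponents can be checked mechanically. The sketch below (the function names are illustrative) verifies with exact fractions that the reciprocals of $\alpha(1),\ldots,\alpha(k+1)$ sum to $1$, so that $\beta(0)=1$, and that $\frac{2q}{q-k+1}\leqslant q$ whenever $q\geqslant k+1$.

```python
from fractions import Fraction as Fr

def holder_exponents(q, k):
    # Reciprocals 1/alpha(1), ..., 1/alpha(k+1) for the choice made in the proof:
    # alpha(1) = alpha(k+1) = 2q/(q-k+1) and alpha(2) = ... = alpha(k) = q.
    end = Fr(q - k + 1, 2 * q)
    return [end] + [Fr(1, q)] * (k - 1) + [end]

def inv_beta(q, k, j):
    # 1/beta(j) = 1/alpha(j+1) + ... + 1/alpha(k+1), for j = 0, ..., k.
    return sum(holder_exponents(q, k)[j:], Fr(0))

for k in range(1, 8):
    for q in range(k + 1, 20):
        inv = holder_exponents(q, k)
        assert sum(inv) == 1           # hence beta(0) = 1
        assert inv[0] >= Fr(1, q)      # i.e. 2q/(q-k+1) <= q, using q >= k+1

print(holder_exponents(6, 3), inv_beta(6, 3, 0))
```

Indeed, $2\cdot\frac{q-k+1}{2q}+\frac{k-1}{q}=1$, and $\frac{q-k+1}{2q}\geqslant\frac{1}{q}$ is equivalent to $q\geqslant k+1$.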

Fix $m\in\mathbb{N}$ . Throughout what follows, it will be notationally convenient to consider each Boolean vector $s\in\{0,1\}^{2m-1}$ as an infinite vector in $\{0,1\}^{\mathbb{Z}}$ whose entries vanish on $\mathbb{Z}\smallsetminus[2m-1]$ , namely we use the convention $s_{i}=s_{j}=0$ for $i\leqslant 0$ and $j\geqslant 2m$ . Let $S_{2m-1}\subseteq\{0,1\}^{2m-1}$ be the set of all those Boolean vectors of length $2m-1$ with no two consecutive $0$s, and with $s_{2m-1}=1$ , i.e.,

[TABLE]
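For small $m$ the set $S_{2m-1}$ can be listed by brute force. The sketch below assumes the reading that $S_{2m-1}$ consists of the vectors with last entry $1$ and no two consecutive zeros, which is the reading used in the proof of Lemma 2.3; the function name `S` is illustrative. It reports the size of the set and the minimal number of ones, two quantities that enter the proof of Lemma 2.4.

```python
from itertools import product

def S(m):
    # Boolean vectors of length 2m-1 with last entry 1 and no two consecutive 0s.
    # (Assumed reading of the definition, taken from the proof of Lemma 2.3.)
    n = 2 * m - 1
    return [s for s in product((0, 1), repeat=n)
            if s[-1] == 1 and all(s[j] + s[j + 1] >= 1 for j in range(n - 1))]

for m in range(1, 6):
    vecs = S(m)
    # Columns: m, |S_{2m-1}|, minimal number of ones in a vector of S_{2m-1}.
    print(m, len(vecs), min(sum(s) for s in vecs))
```

Under this reading every $s\in S_{2m-1}$ has at least $m$ ones, and $|S_{2m-1}|\leqslant 2^{2m-1}$, consistent with the bound $|S_{2m-1}|\leqslant e^{O(m)}$.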

For each $j\in[2m-1]$ and $s\in S_{2m-1}$ that satisfy $s_{j}=1$ , we define a quantity $p(s,j)\geqslant 1$ in the following way. Consider the consecutive run of $1$ s in $s$ to which $j$ belongs, and let $i_{1}(s,j)$ and $i_{2}(s,j)$ be the first and last indices of this run, respectively. Formally,

[TABLE]

With this notation, write

[TABLE]

Lemma 2.3.

For every $T_{1},\ldots,T_{2m-1}\in\mathsf{M}_{N}(\mathbb{R})$ ,

[TABLE]

Proof.

For each $j\in[2m-1]$ , write $T_{j,0}=E_{\pi}$ and $T_{j,1}=T_{j}$ . Observe that

[TABLE]

Indeed, if $s\in\{0,1\}^{2m-1}\smallsetminus S_{2m-1}$ , then either $s_{2m-1}=0$ , in which case $T_{2m-1,s_{2m-1}}u=E_{\pi}u=\mathbf{0}\in\mathbb{R}^{N}$ , or $s_{j}=s_{j+1}=0$ for some $j\in[2m-2]$ , in which case $T_{j,s_{j}}UT_{j+1,s_{j+1}}=E_{\pi}UE_{\pi}=\mathbf{0}\in\mathsf{M}_{N}(\mathbb{R})$ , where both identities are equivalent to the assumption $\sum_{i=1}^{N}\pi_{i}u_{i}=0$ . Now,

[TABLE]

Fix $s\in S_{2m-1}$ and let $1\leqslant r_{1}<r_{2}<\cdots<r_{\ell}<2m-1$ be all of the indices at which $s$ vanishes. Define $R_{1},\ldots,R_{\ell+1}\in\mathsf{M}_{N}(\mathbb{R})$ by setting

[TABLE]

and

[TABLE]

for $\kappa\in\{2,\ldots,\ell\}$ . Using the fact that $UE_{\pi}v=\left(\sum_{i=1}^{N}\pi_{i}v_{i}\right)u$ for every $v\in\mathbb{R}^{N}$ , we have the following identity.

[TABLE]

Consequently,

[TABLE]

Next, by Lemma 2.2 with $q=2m$ and $k=r_{1}-1$ we have

[TABLE]

In the same vein, for every $k\in\{2,\ldots,\ell\}$ ,

[TABLE]

and also

[TABLE]

We therefore have

[TABLE]

By substituting (24) into (23) and then substituting the resulting estimate into (22), we arrive at (20). ∎
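The vanishing of the terms indexed by $s\notin S_{2m-1}$, which the proof above relies on, can be observed directly in the case $m=2$. In the sketch below the measure `pi`, the mean-zero vector `u` and the matrices `T` are hypothetical, and $S_{3}$ is read as the vectors with last entry $1$ and no two consecutive zeros.

```python
from fractions import Fraction as Fr
from itertools import product

pi = [Fr(1, 2), Fr(1, 3), Fr(1, 6)]     # hypothetical probability vector
u = [Fr(1), Fr(-1), Fr(-1)]             # sum_i pi_i u_i = 1/2 - 1/3 - 1/6 = 0
N = 3
E = [pi[:] for _ in range(N)]           # E_pi: every row equals pi
U = [[u[i] if i == j else Fr(0) for j in range(N)] for i in range(N)]
# Arbitrary illustrative matrices T_1, T_2, T_3.
T = [[[Fr((i + 2 * j + k) % 5 - 2) for j in range(N)] for i in range(N)]
     for k in range(3)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def matvec(X, v):
    return [sum(X[i][j] * v[j] for j in range(N)) for i in range(N)]

def term(s):
    # U T_{1,s_1} U T_{2,s_2} U T_{3,s_3} u, with T_{j,0} = E_pi and T_{j,1} = T_j.
    v = u
    for j in reversed(range(3)):
        v = matvec(U, matvec(T[j] if s[j] else E, v))
    return v

def in_S(s):
    # Last entry 1 and no two consecutive zeros (assumed reading of S_3).
    return s[-1] == 1 and all(s[j] + s[j + 1] >= 1 for j in range(len(s) - 1))

zero = [Fr(0)] * N
for s in product((0, 1), repeat=3):
    print(s, in_S(s), term(s) == zero)
```

Both $E_{\pi}u=\mathbf{0}$ and $E_{\pi}UE_{\pi}=\mathbf{0}$ reduce to $\sum_{i}\pi_{i}u_{i}=0$, so every term with $s\notin S_{3}$ is the zero vector whatever the matrices $T_{j}$ are.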

In light of Lemma 2.1, the following lemma is highly relevant to our goal of proving Theorem 1.1.

Lemma 2.4.

Suppose that $m\in\mathbb{N}$ satisfies $em\leqslant n(1-\lambda)$ . Then,

[TABLE]

Proof.

Fix $v_{0},\ldots,v_{2m-1}\in\mathbb{N}\cup\{0\}$ and denote $T_{j}=A^{v_{j}}-E_{\pi}$ for every $j\in\{0,\ldots,2m-1\}$ . Then,

[TABLE]

where the last step of (26) is an application of Lemma 2.3.

Fixing $j\in\{0,\ldots,2m-1\}$ , note that $AE_{\pi}=E_{\pi}$ since $A$ is stochastic and the columns of $E_{\pi}$ are constant, and also $E_{\pi}A=E_{\pi}$ since $A$ is $\pi$ -stationary. Consequently $T_{j}=A^{v_{j}}-E_{\pi}=(A-E_{\pi})^{v_{j}}$ . So, for every $p\geqslant 1$ ,

[TABLE]
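The algebraic identities $AE_{\pi}=E_{\pi}A=E_{\pi}$ and $A^{v}-E_{\pi}=(A-E_{\pi})^{v}$ can be confirmed on a small example. The birth-death chain below is a hypothetical choice, and the check is run for $v\geqslant 1$ only.

```python
from fractions import Fraction as Fr

# Hypothetical 3-state birth-death chain with stationary distribution (1/4, 1/2, 1/4).
A = [[Fr(1, 2), Fr(1, 2), Fr(0)],
     [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
     [Fr(0), Fr(1, 2), Fr(1, 2)]]
pi = [Fr(1, 4), Fr(1, 2), Fr(1, 4)]
N = 3
E = [pi[:] for _ in range(N)]           # E_pi: every row equals pi
I = [[Fr(int(i == j)) for j in range(N)] for i in range(N)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(N)] for i in range(N)]

def matpow(X, v):
    # X^v by repeated multiplication, with X^0 = I.
    R = I
    for _ in range(v):
        R = matmul(R, X)
    return R

print(matmul(A, E) == E, matmul(E, A) == E)
# v >= 1 here; at v = 0 the left-hand side is I - E_pi, while (A - E_pi)^0 = I.
for v in range(1, 7):
    print(v, sub(matpow(A, v), E) == matpow(sub(A, E), v))
```

The induction step is $(A^{v}-E_{\pi})(A-E_{\pi})=A^{v+1}-A^{v}E_{\pi}-E_{\pi}A+E_{\pi}^{2}=A^{v+1}-E_{\pi}$, using $A^{v}E_{\pi}=E_{\pi}$ and $E_{\pi}^{2}=E_{\pi}$.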

By definition, $\|A-E_{\pi}\|_{L_{2}(\pi)\to L_{2}(\pi)}=\lambda$ . As $A$ and $E_{\pi}$ are averaging operators, by convexity and the triangle inequality $\|A-E_{\pi}\|_{L_{r}(\pi)\to L_{r}(\pi)}\leqslant\|A\|_{L_{r}(\pi)\to L_{r}(\pi)}+\|E_{\pi}\|_{L_{r}(\pi)\to L_{r}(\pi)}=2$ for all $r\geqslant 1$ . By the Riesz–Thorin interpolation theorem [Rie27, Tho48] (see e.g. Chapter IV in the textbook [Kat04]), this implies that

[TABLE]

A substitution of (28) into (27), followed by a substitution of the resulting bound into (26) shows that in order to prove the desired inequality (25) it suffices to establish the following estimate.

[TABLE]

where for every $s\in S_{2m-1}$ and $j\in[2m-1]$ such that $s_{j}=1$ , we denote

[TABLE]

Fix some $s\in S_{2m-1}$ . Denote $Q_{0}=\{j\in[2m-1]:s_{j}=0\}$ and $Q_{1}=[2m-1]\smallsetminus Q_{0}$ . Thus $|Q_{0}|+|Q_{1}|=2m-1$ and by the definition of $S_{2m-1}$ we have $|Q_{1}|\geqslant m$ . With this notation, we have the following bound.

[TABLE]

By the elementary inequality $1-\lambda^{\beta}\geqslant\beta(1-\lambda)$ , which holds for every $\lambda,\beta\in[0,1]$ , it follows from this that

[TABLE]

where the last step follows from a straightforward application of Stirling’s formula. Consider the function $\psi:[0,\infty)\to[0,\infty)$ that is given by $\psi(z)=((1-\lambda)n/z)^{z}$ . Then $(\log\psi(z))^{\prime}=\log((1-\lambda)n/(ez))$ . Hence, $\psi$ is increasing on the interval $[0,(1-\lambda)n/e]$ . But $|Q_{0}|+1=2m-|Q_{1}|\leqslant m\leqslant(1-\lambda)n/e$ , by the assumption on $m$ in the statement of Lemma 2.4. Hence $\psi(|Q_{0}|+1)\leqslant\psi(m)$ , and therefore

[TABLE]
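The two elementary facts used in the last two steps can be sanity-checked numerically. The sketch below uses illustrative values of $\lambda$ and $n$, which are assumptions rather than values from the paper, and tests the inequality $1-\lambda^{\beta}\geqslant\beta(1-\lambda)$ on a grid together with the monotonicity of $\psi$ on $(0,(1-\lambda)n/e]$.

```python
import math

def psi(z, lam, n):
    # psi(z) = ((1 - lam) * n / z) ** z, evaluated for z > 0.
    return ((1 - lam) * n / z) ** z

lam, n = 0.5, 200                       # illustrative values
z_max = (1 - lam) * n / math.e          # right endpoint of the monotonicity interval

# psi is nondecreasing along a grid in (0, z_max].
grid = [z_max * t / 1000 for t in range(1, 1001)]
increasing = all(psi(x, lam, n) <= psi(y, lam, n) for x, y in zip(grid, grid[1:]))

# For the largest integer m with e*m <= n*(1 - lam), psi(j) <= psi(m) for all j <= m.
m = math.floor(z_max)
dominated = all(psi(j, lam, n) <= psi(m, lam, n) for j in range(1, m + 1))

# 1 - lam**beta >= beta * (1 - lam) on a grid in [0, 1]^2 (small slack for rounding).
pts = [i / 50 for i in range(51)]
elementary = all(1 - l ** b >= b * (1 - l) - 1e-12 for l in pts for b in pts)

print(increasing, dominated, elementary)
```

The inequality is the statement that the convex function $\beta\mapsto\lambda^{\beta}$ lies below its chord on $[0,1]$, namely $\lambda^{\beta}\leqslant(1-\beta)+\beta\lambda$.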

We will show next that

[TABLE]

In combination with (31) and (32), this would imply the desired inequality (29) because $|S_{2m-1}|\leqslant e^{O(m)}$ .

For each $j\in Q_{1}$ with $i_{2}(s,j)-i_{1}(s,j)\leqslant\frac{3m}{2}$ (i.e., the consecutive run of $1$ s in $s$ to which $j$ belongs is of length at most $1+\frac{3m}{2}$ ), we have $|i_{1}(s,j)+i_{2}(s,j)-2j|\leqslant\frac{3m}{2}$ and therefore its contribution to the product in (33) is at most $4$ . So, (33) holds if there are no runs of $1$ s in $s$ of length greater than $\frac{3m}{2}$ . Otherwise, there is exactly one run of $1$ s in $s$ of length $d>\frac{3m}{2}$ , and its contribution to the product in (33) equals

[TABLE]

where the last step follows from Stirling’s formula. This proves our goal (33). ∎

Completion of the proof of Theorem 1.1.

By the triangle inequality in $L_{q}$ (and stationarity) we have

[TABLE]

This bound implies the desired estimate (1) when $q\gtrsim(1-\lambda)n$ , so we may assume from now on that $q\leqslant(1-\lambda)n/e$ . Let $m\in\mathbb{N}$ be the largest integer such that $2m\leqslant q$ . Then, $m,m+1\leqslant q\leqslant(1-\lambda)n/e$ , so the conclusion of Lemma 2.4 holds for both $m$ and $m+1$ . By Lemma 2.1 (and Stirling’s formula), this gives

[TABLE]

and similarly

[TABLE]

As in (14), it follows from these bounds (which we derived under the assumption $\mathbb{E}[f(W_{1})]=0$ ) that the norm of the operator $T_{\mathbb{R}}$ that is given in (7) is bounded by a universal constant multiple of $\sqrt{q/((1-\lambda)n)}$ both from $L_{2m}(\pi)$ to $L_{2m}(\pi)$ and from $L_{2(m+1)}(\pi)$ to $L_{2(m+1)}(\pi)$ . Since $2m\leqslant q\leqslant 2(m+1)$ , another application of the Riesz–Thorin theorem gives that the norm of $T_{\mathbb{R}}$ from $L_{q}(\pi)$ to $L_{q}(\pi)$ is also bounded by a universal constant multiple of $\sqrt{q/((1-\lambda)n)}$ . This is precisely the desired bound (1). ∎
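The interpolation parameter in the final application of the Riesz–Thorin theorem is determined by $\frac{1}{q}=\frac{1-\theta}{2m}+\frac{\theta}{2(m+1)}$. The following sketch (the function name is illustrative) computes $m$ and $\theta$ exactly for a given $q\geqslant 2$.

```python
from fractions import Fraction as Fr
import math

def interpolation_parameter(q):
    # m is the largest integer with 2m <= q; theta in [0, 1] solves
    # 1/q = (1 - theta)/(2m) + theta/(2(m+1)).
    q = Fr(q)
    m = math.floor(q / 2)
    p0, p1 = Fr(2 * m), Fr(2 * m + 2)
    theta = (1 / p0 - 1 / q) / (1 / p0 - 1 / p1)
    return m, theta

for q in (2, 4, 5, 7, Fr(11, 2)):
    m, theta = interpolation_parameter(q)
    # Confirm that theta really interpolates between the endpoints 2m and 2(m+1).
    assert (1 - theta) / (2 * m) + theta / (2 * m + 2) == 1 / Fr(q)
    assert 0 <= theta <= 1
    print(q, m, theta)
```

For example, $q=5$ gives $m=2$ and $\theta=\frac{3}{5}$, since $\frac{2/5}{4}+\frac{3/5}{6}=\frac{1}{5}$. Because $2m\leqslant q$ and $2(m+1)\leqslant 2q$ for $q\geqslant 2$, both endpoint bounds are at most a constant multiple of $\sqrt{q/((1-\lambda)n)}$, and the interpolated bound inherits this.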

Bibliography

[AK16] F. Albiac and N. J. Kalton. Topics in Banach space theory, volume 233 of Graduate Texts in Mathematics. Springer, [Cham], second edition, 2016. With a foreword by Gilles Godefroy.
[Ban32] S. Banach. Théorie des opérations linéaires, volume 1. PWN - Panstwowe Wydawnictwo Naukowe, Warszawa, 1932.
[BL76] J. Bergh and J. Löfström. Interpolation spaces. An introduction. Springer-Verlag, Berlin-New York, 1976. Grundlehren der Mathematischen Wissenschaften, No. 223.
[Cal64] A.-P. Calderón. Intermediate spaces and interpolation, the complex method. Studia Math., 24:113–190, 1964.
[CLLM12] K. Chung, H. Lam, Z. Liu, and M. Mitzenmacher. Chernoff-Hoeffding bounds for Markov chains: Generalized and simplified. In STACS, pages 124–135. 2012. arXiv:1201.0559.
[Din95] I. H. Dinwoodie. A probability inequality for the occupation measure of a reversible Markov chain. Ann. Appl. Probab., 5(1):37–43, 1995.
[FJS18] J. Fan, B. Jiang, and Q. Sun. Hoeffding’s lemma for Markov chains and its applications to statistical learning, 2018.
[Gil93] D. Gillman. A Chernoff bound for random walks on expander graphs. In 34th Annual Symposium on Foundations of Computer Science (Palo Alto, CA, 1993), pages 680–691. IEEE Comput. Soc. Press, Los Alamitos, CA, 1993. doi: 10.1109/SFCS.1993.366819.
