A quantitative McDiarmid's inequality for geometrically ergodic Markov chains

Antoine Havet; Matthieu Lerasle; Eric Moulines; Elodie Vernet

arXiv:1907.02809·math.ST·July 8, 2019

TL;DR

This paper states and proves a quantitative version of McDiarmid's bounded difference inequality for geometrically ergodic Markov chains, with a constant that depends explicitly on the quantities characterizing the mixing of the chain.

Contribution

It modifies the exact coupling argument of an earlier proof by Dedecker and Gouëzel, so that the bounded difference inequality covers the general aperiodic case and not only the strongly aperiodic case.

Findings

01 Provides a new quantitative bound for Markov chains

02 Extends McDiarmid's inequality to a broader class of chains

03 Improves the theoretical tools for analyzing dependent data

Abstract

We state and prove a quantitative version of the bounded difference inequality for geometrically ergodic Markov chains. Our proof uses the same martingale decomposition as [2] but, compared to this paper, the exact coupling argument is modified to fill a gap between the strongly aperiodic case and the general aperiodic case.

Equations

$$|f(x)-f(y)|\leqslant\sum_{i=0}^{n-1}c_{i}\mathbbm{1}_{\{x_{i}\neq y_{i}\}}\;.$$

$$\mathbb{P}\big(f(X_{0},\ldots,X_{n-1})-\mathbb{E}[f(X_{0},\ldots,X_{n-1})]>t\big)\leqslant\mathrm{e}^{-2t^{2}/\|c\|^{2}}\;,$$

$$\mathbb{P}_{x}\big(f(X_{0},\ldots,X_{n-1})-\mathbb{E}_{x}[f(X_{0},\ldots,X_{n-1})]>t\big)\leqslant\mathrm{e}^{-\beta t^{2}/\|c\|^{2}}\;,$$

$$\tau_{\mathsf{B}}^{i}=\inf\{n\geqslant i:X_{n}\in\mathsf{B}\}=i+\tau_{\mathsf{B}}^{0}\circ\theta^{i}\quad\text{and}\quad\sigma_{\mathsf{B}}=\tau_{\mathsf{B}}^{1}=1+\tau_{\mathsf{B}}^{0}\circ\theta\;.$$

$$\sup_{x\in\mathsf{C}}\mathbb{E}_{x}[u^{\sigma_{\mathsf{C}}}]\leqslant M\;.$$

$$\mathrm{d}_{\mathrm{TV}}(\delta_{x}P^{n},\pi)\leqslant Lr^{n}\;,$$

$$|\mathbb{E}_{\xi}[h({X}_{0}^{n-1})]-\mathbb{E}_{\xi^{\prime}}[h({X}_{0}^{n-1})]|\leqslant 2\sum_{i=0}^{n-1}c_{i}\,\mathrm{d}_{\mathrm{TV}}(\xi P^{i},\xi^{\prime}P^{i})\;.$$

$$h({x}_{0}^{n-1})=\sum_{i=0}^{n-1}\{\bar{h}_{i}({x}_{i}^{n-1})-\bar{h}_{i+1}({x}_{i+1}^{n-1})\}+\bar{h}_{n}\;.$$

$$\bar{w}_{i}(x_{i})=\int\{h(x^{*},\dots,x^{*},{x}_{i}^{n-1})-h(x^{*},\dots,x^{*},{x}_{i+1}^{n-1})\}\prod_{\ell=i+1}^{n-1}P(x_{\ell-1},\mathrm{d}x_{\ell})\;.$$

$$\mathbb{E}_{\xi}[h({X}_{0}^{n-1})]=\sum_{i=0}^{n-1}\xi P^{i}\bar{w}_{i}+\bar{h}_{n}\;.$$

$$|\mathbb{E}_{\xi}[h({X}_{0}^{n-1})]-\mathbb{E}_{\xi^{\prime}}[h({X}_{0}^{n-1})]|\leqslant\sum_{i=0}^{n-1}|\xi P^{i}\bar{w}_{i}-\xi^{\prime}P^{i}\bar{w}_{i}|\leqslant 2\sum_{i=0}^{n-1}c_{i}\,\mathrm{d}_{\mathrm{TV}}(\xi P^{i},\xi^{\prime}P^{i})\;.$$

$$\mathbb{P}_{x}\big(f({X}_{0}^{n-1})-\mathbb{E}_{x}[f({X}_{0}^{n-1})]>t\big)\leqslant\exp\bigg(-\frac{\beta t^{2}}{\|c\|^{2}}\bigg)\;,$$

$$\beta=\frac{(1-r\vee u^{-1/4})^{2}}{16L}\bigg(\frac{5}{\log u}+4ML\bigg)^{-1}\;.$$

$$G_{i}=\mathbb{E}_{x}\big[f({X}_{0}^{n-1})\,\big|\,\mathscr{F}_{\tau_{\mathsf{C}}^{i}}\big]\;.$$

$$f({X}_{0}^{n-1})-\mathbb{E}_{x}[f({X}_{0}^{n-1})]=G_{n-1}-G_{0}=\sum_{i=0}^{n-2}(G_{i+1}-G_{i})\;.$$

$$G_{i}-G_{i-1}=(G_{i}-G_{i-1})\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1\}}\;.$$

$$G_{i}-G_{i-1}=(G_{i}-G_{i-1})\big(\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1\}}+\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=\tau_{\mathsf{C}}^{i}\}}\big)\;.$$

$$(G_{i}-G_{i-1})\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=\tau_{\mathsf{C}}^{i}\}}=\sum_{j\geqslant i}(G_{i}-G_{i-1})\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=\tau_{\mathsf{C}}^{i}=j\}}\;.$$

$$G_{i}\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i}=j\}}=\begin{cases}\mathbb{E}_{x}\big[f({X}_{0}^{n-1})\,\big|\,\mathscr{F}_{j}\big]&\text{ if }j\leqslant n-2\;,\\ f({X}_{0}^{n-1})&\text{ if }j\geqslant n-1\;.\end{cases}$$

$$G_{i}\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i}=j\}}\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=\tau_{\mathsf{C}}^{i}\}}=G_{i-1}\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=j\}}\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=\tau_{\mathsf{C}}^{i}\}}=G_{i-1}\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i}=j\}}\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=\tau_{\mathsf{C}}^{i}\}}\;.$$

$$g_{i}({x}_{0}^{i})=\mathbb{E}_{x_{i}}[f({x}_{0}^{i},{X}_{1}^{n-1-i})]\;,\qquad g_{i,\pi}({x}_{0}^{i})=\mathbb{E}_{\pi}[f({x}_{0}^{i},{X}_{1}^{n-1-i})]\;.$$

$$|g_{i}({x}_{0}^{i})-g_{i,\pi}({x}_{0}^{i})|\leqslant 2L\sum_{j=i+1}^{n-1}c_{j}r^{j-i}\;.$$

$$|\widetilde{f}_{i}({y}_{1}^{n-1-i})-\widetilde{f}_{i}({z}_{1}^{n-1-i})|\leqslant\sum_{k=1}^{n-1-i}c_{i+k}\mathbbm{1}_{\{y_{k}\neq z_{k}\}}\;.$$

$$|g_{i}({x}_{0}^{i})-g_{i,\pi}({x}_{0}^{i})|=|\mathbb{E}_{x_{i}}[\widetilde{f}_{i}({X}_{1}^{n-1-i})]-\mathbb{E}_{\pi}[\widetilde{f}_{i}({X}_{1}^{n-1-i})]|\leqslant 2\sum_{j=i+1}^{n-1}c_{j}\,\mathrm{d}_{\mathrm{TV}}(\delta_{x_{i}}P^{j-i},\pi)\;.$$

$$|G_{i}-G_{i-1}|$$

$$|G_{i}-G_{i-1}|^{2}$$

$$G_{i,1}=\mathbb{E}_{x}\big[f({X}_{0}^{n-1})\,\big|\,\mathscr{F}_{\tau_{\mathsf{C}}^{i-1}}\big]\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1\}}\;,\qquad G_{i,2}=\mathbb{E}_{x}\big[f({X}_{0}^{n-1})\,\big|\,\mathscr{F}_{\tau_{\mathsf{C}}^{i}}\big]\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1\}}\;.$$

$$\mathbb{E}_{x}\big[f({X}_{0}^{n-1})\,\big|\,\mathscr{F}_{i}\big]=g_{i}({X}_{0}^{i})\;,\quad\mathbb{P}_{x}\text{-a.s.}$$

$$G_{i,1}=g_{i-1}({X}_{0}^{i-1})\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1\}}=g_{i-1,\pi}({X}_{0}^{i-1})\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1\}}+R_{i,1}\;.$$

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Stochastic processes and statistical mechanics · Graph theory and applications

Full text

A quantitative McDiarmid’s inequality for geometrically ergodic Markov chains

A. Havet, M. Lerasle, E. Moulines and E. Vernet

Abstract

We state and prove a quantitative version of the bounded difference inequality for geometrically ergodic Markov chains. Our proof uses the same martingale decomposition as [2] but, compared to this paper, the exact coupling argument is modified to fill a gap between the strongly aperiodic case and the general aperiodic case.

Keywords: Concentration inequalities; Markov chains; Geometric ergodicity; Coupling.

AMS MSC 2010: 60J05; 60E15.

1 Introduction

The purpose of this note is to establish a quantitative version of McDiarmid’s inequality for geometrically ergodic Markov chains. Let $X_{0},\ldots,X_{n-1}$ denote independent random variables taking values in a measurable space $(\mathsf{X},\mathscr{X})$ and let $c=(c_{0},\ldots,c_{n-1})$ denote a vector of non-negative real numbers. A function $f:\mathsf{X}^{n}\to\mathbb{R}$ satisfies the bounded difference condition if, for all $x=(x_{0},\ldots,x_{n-1})$ and $y=(y_{0},\ldots,y_{n-1})\in\mathsf{X}^{n}$, we have

$$|f(x)-f(y)|\leqslant\sum_{i=0}^{n-1}c_{i}\mathbbm{1}_{\{x_{i}\neq y_{i}\}}\;.\tag{1}$$

The bounded difference inequality, first established in [6], shows that for all $t>0$ ,

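$$\mathbb{P}\big(f(X_{0},\ldots,X_{n-1})-\mathbb{E}[f(X_{0},\ldots,X_{n-1})]>t\big)\leqslant\mathrm{e}^{-2t^{2}/\|c\|^{2}}\;,$$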

where $\|c\|^{2}=\sum_{i=0}^{n-1}c_{i}^{2}$. Several attempts have been made to extend this result to Markov chains. In [1], the concentration of particular functionals of the form $f(x_{0},\ldots,x_{n-1})=\sup_{g\in\mathscr{F}}\sum_{i=0}^{n-1}g(x_{i})$, for centered functions $g$ in a class $\mathscr{F}$, is established. The concentration of general functionals (satisfying (1)) of geometrically ergodic Markov chains was established in [2], where it is also proved that geometric ergodicity is a necessary assumption. However, the result in [2] is not quantitative. It states that for every geometrically recurrent set $C$, there exists a constant $\beta$, depending on $C$, such that for all $x\in C$ and $t>0$,

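$$\mathbb{P}_{x}\big(f(X_{0},\ldots,X_{n-1})-\mathbb{E}_{x}[f(X_{0},\ldots,X_{n-1})]>t\big)\leqslant\mathrm{e}^{-\beta t^{2}/\|c\|^{2}}\;,$$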

where for any $x\in\mathsf{X}$, $\mathbb{P}_{x}$ is the distribution of the Markov chain $\{X_{k}\}_{k=0}^{\infty}$ starting from $x$ (see the precise definition below). In many applications, it is necessary to know how the constant $\beta$ depends on the set $C$. In particular, this problem arises when establishing posterior concentration rates of Bayesian non-parametric estimators; see for example [9, 4] for recent accounts of this theory. To extend these results to Markovian settings, the result of [2] cannot be applied directly and a quantitative version of (2) is required, in which the dependence of $\beta$ on the constants characterizing the mixing of the Markov chain is explicit; see for example [10, 5].
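To make condition (1) and the classical bound for independent variables concrete, the following minimal sketch checks the bounded difference condition by brute force on a small finite alphabet and evaluates $\mathrm{e}^{-2t^{2}/\|c\|^{2}}$. The function names, the binary alphabet and the choice of the empirical mean are illustrative assumptions and do not come from the paper.

```python
import itertools
import math


def satisfies_bounded_differences(f, alphabet, c):
    """Brute-force check of |f(x) - f(y)| <= sum_i c_i 1{x_i != y_i}
    over all pairs of points in alphabet^n (only feasible for tiny n)."""
    n = len(c)
    points = list(itertools.product(alphabet, repeat=n))
    for x in points:
        for y in points:
            budget = sum(ci for ci, xi, yi in zip(c, x, y) if xi != yi)
            if abs(f(x) - f(y)) > budget + 1e-12:
                return False
    return True


def mcdiarmid_bound(c, t):
    """Right-hand side exp(-2 t^2 / ||c||^2) of the independent-case bound."""
    return math.exp(-2.0 * t * t / sum(ci * ci for ci in c))


def mean(x):
    return sum(x) / len(x)


n = 4
c = [1.0 / n] * n                      # changing one coordinate moves the mean by at most 1/n
ok = satisfies_bounded_differences(mean, (0, 1), c)
too_small = satisfies_bounded_differences(mean, (0, 1), [0.5 / n] * n)
bound = mcdiarmid_bound(c, 0.5)        # ||c||^2 = 1/4, so this is exp(-2)
```

For the empirical mean of $n$ binary variables the vector $c_{i}=1/n$ is admissible, while halving every $c_{i}$ violates (1).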

A quantitative version of McDiarmid’s inequality for Markov chains was established in [7], where the constant $\beta$ depends explicitly on the mixing time of the chain. The existence of finite mixing times requires uniform ergodicity of the chain, see for example [8, Section 3.3], an assumption that typically fails when the chain takes values in a general state space. In this note, we prove an extension of McDiarmid’s inequality to geometrically ergodic Markov chains. Our proof is based on [2], but avoids the use of [2, Lemma 6], which requires the construction of an exact coupling. An exact coupling can be built in the strongly aperiodic case, but there is a gap in the general aperiodic case.

The remainder of the paper is organized as follows. Section 2 formally introduces the notation and the assumptions of the main result, which is stated and proved in Section 3.

2 Notations and assumptions

Let $(\mathsf{X},\mathscr{X})$ be a measurable space. We denote by $\mathrm{d}_{\mathrm{TV}}$ the total variation distance between probability measures. For any sequence $x=\{x_{n},\;n\in\mathbb{N}\}$ and any non-negative integers $a$ and $b$ , with $a\leqslant b$ , let ${x}_{a}^{b}=(x_{a},x_{a+1},\ldots,x_{b})$ . For any $n\geqslant 0$ and any vector $c={c}_{0}^{n-1}\in\mathbb{R}^{n}$ , let $\|c\|$ denote the Euclidean norm of $c$ and $\|c\|_{\infty}=\max_{0\leqslant i\leqslant n-1}|c_{i}|$ denote its sup-norm.

We denote by $(\mathsf{X}^{\mathbb{Z}_{+}},\mathscr{X}^{\otimes\mathbb{Z}_{+}},(\mathscr{F}_{k})_{k\geqslant 0})$ the canonical filtered space, $\{X_{n}\}_{n=0}^{\infty}$ the canonical process and $\theta:\mathsf{X}^{\mathbb{Z}_{+}}\to\mathsf{X}^{\mathbb{Z}_{+}}$ the shift operator on the canonical space defined, for any $x=(x_{n})_{n\geqslant 0}\in\mathsf{X}^{\mathbb{Z}_{+}}$ by $\theta(x)\in\mathsf{X}^{\mathbb{Z}_{+}}$ , where, for any $n\geqslant 0$ , $\theta(x)_{n}=x_{n+1}$ . Set $\theta_{1}=\theta$ and for $n\in\mathbb{N}^{*}$ , define inductively, $\theta_{n}=\theta_{n-1}\circ\theta$ . We also need to define $\theta_{\infty}$ . To this aim, fix an arbitrary $x^{*}\in\mathsf{X}$ , we define $\theta_{\infty}:\mathsf{X}^{\mathbb{N}}\to\mathsf{X}^{\mathbb{N}}$ such that for $z=\{z_{k},\;k\in\mathbb{N}\}\in\mathsf{X}^{\mathbb{N}}$ , $\theta_{\infty}z\in\mathsf{X}^{\mathbb{N}}$ is the constant sequence $(\theta_{\infty}z)_{k}=x^{*}$ for all $k\in\mathbb{N}$ .

Let $P$ be a Markov kernel on $\mathsf{X}\times\mathscr{X}$ . For any probability measure $\xi$ on $(\mathsf{X},\mathscr{X})$ , denote by $\mathbb{P}_{\xi}$ the unique probability under which $(X_{n})_{n\geqslant 0}$ is a Markov chain with Markov kernel $P$ and initial distribution $\xi$ and let $\mathbb{E}_{\xi}$ denote the expectation under the distribution $\mathbb{P}_{\xi}$ . Recall that $\mathscr{F}_{n}$ denotes the $\sigma$ -algebra generated by $X_{0},\ldots,X_{n}$ . For any $x\in\mathsf{X}$ , let $\delta_{x}$ denote the Dirac mass at point $x$ . With some abuse of notation, we also denote $\mathbb{P}_{x}$ (resp. $\mathbb{E}_{x}$ ) instead of $\mathbb{P}_{\delta_{x}}$ (resp. $\mathbb{E}_{\delta_{x}}$ ).

For any $\mathsf{B}\in\mathscr{X}$ and any integer $i\geqslant 0$ , let

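$$\tau_{\mathsf{B}}^{i}=\inf\{n\geqslant i:X_{n}\in\mathsf{B}\}=i+\tau_{\mathsf{B}}^{0}\circ\theta^{i}\quad\text{and}\quad\sigma_{\mathsf{B}}=\tau_{\mathsf{B}}^{1}=1+\tau_{\mathsf{B}}^{0}\circ\theta\;.$$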
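The hitting times $\tau_{\mathsf{B}}^{i}$ and their relation to the shift can be illustrated on a finite trajectory. This is only a sketch: the helper `hitting_time`, the sample path and the set $\mathsf{B}=\{0\}$ are hypothetical, and list slicing `path[i:]` plays the role of $\theta^{i}$.

```python
def hitting_time(path, in_B, i=0):
    """tau_B^i = inf{n >= i : X_n in B} on a finite path.
    Returns None when B is not visited from time i onwards."""
    for n in range(i, len(path)):
        if in_B(path[n]):
            return n
    return None


def in_B(x):
    # Hypothetical target set B = {0}
    return x == 0


path = [3, 0, 2, 2, 0, 1, 0]

# Direct definition of tau_B^i for every starting index i
taus = [hitting_time(path, in_B, i) for i in range(len(path))]

# Same quantity through the shift: tau_B^i = i + tau_B^0 applied to theta^i(path)
shifted = [i + hitting_time(path[i:], in_B) for i in range(len(path))]

# sigma_B = tau_B^1 = 1 + tau_B^0 applied to theta(path)
sigma_B = 1 + hitting_time(path[1:], in_B)
```

Both computations give the same list of times, and `sigma_B` coincides with `taus[1]`, which is the first return time to $\mathsf{B}$ at or after time 1.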

For $c={c}_{0}^{n-1}\in\mathbb{R}_{+}^{n}$, we denote by $\mathbb{BD}(\mathsf{X}^{n},c)$ the set of measurable functions $f:\mathsf{X}^{n}\to\mathbb{R}$ such that for all $x=(x_{0},\dots,x_{n-1})$ and $y=(y_{0},\dots,y_{n-1})$, $|f(x)-f(y)|\leqslant\sum_{i=0}^{n-1}c_{i}\mathbbm{1}_{\{x_{i}\neq y_{i}\}}$. The main result is established under the following conditions.

H1

The Markov kernel $P$ is irreducible and aperiodic, with unique invariant probability $\pi$ .

H2

There exist a non-empty set $\mathsf{C}\in\mathscr{X}$ and two real numbers $u>1$ and $M>0$ such that

$$\sup_{x\in\mathsf{C}}\mathbb{E}_{x}[u^{\sigma_{\mathsf{C}}}]\leqslant M\;.$$

H3

There exist $r\in(0,1)$ and $L\geqslant 1$ such that, for any $x$ in the set $\mathsf{C}$ of H2 and any $n\geqslant 0$ ,

$$\mathrm{d}_{\mathrm{TV}}(\delta_{x}P^{n},\pi)\leqslant Lr^{n}\;,$$

where $\pi$ is the unique invariant measure granted in H1.
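As an illustration of H3, the sketch below checks the geometric bound numerically for a two-state chain. The transition probabilities `a` and `b`, the candidate constants `L = 1` and `r = |1 - a - b|`, and the normalisation of the total variation distance to values in $[0,1]$ are all assumptions made for this example.

```python
def step(dist, P):
    """One step of the chain: row vector dist multiplied by the matrix P."""
    k = len(P)
    return [sum(dist[x] * P[x][y] for x in range(k)) for y in range(k)]


def tv_distance(mu, nu):
    """Total variation distance, normalised to take values in [0, 1]."""
    return 0.5 * sum(abs(m - v) for m, v in zip(mu, nu))


a, b = 0.3, 0.1                       # hypothetical transition probabilities
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]       # invariant distribution of P
r = abs(1 - a - b)                    # second eigenvalue of P in absolute value
L = 1.0


def h3_holds(x, horizon=30):
    """Check d_TV(delta_x P^n, pi) <= L r^n for n = 0, ..., horizon - 1."""
    dist = [1.0 if s == x else 0.0 for s in range(len(P))]
    for n in range(horizon):
        if tv_distance(dist, pi) > L * r ** n + 1e-12:
            return False
        dist = step(dist, P)
    return True


holds_from_0 = h3_holds(0)
holds_from_1 = h3_holds(1)
invariant_gap = tv_distance(step(pi, P), pi)
```

On a finite state space the chain is uniformly ergodic, so here the bound holds from every starting point, in line with the case $\mathsf{C}=\mathsf{X}$.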

When the Markov kernel $P$ is uniformly ergodic, then H3 holds with $\mathsf{C}=\mathsf{X}$ . The following Lemma is a coupling result that replaces [2, Lemma 6]. It is instrumental in the sequel.

Lemma 2.

For any probability measures $\xi$ and $\xi^{\prime}$ on $(\mathsf{X},\mathscr{X})$ , any $n\geqslant 1$ , any $c\in\mathbb{R}_{+}^{n}$ and any $h\in\mathbb{B}\mathbb{D}(\mathsf{X}^{n},c)$ ,

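$$|\mathbb{E}_{\xi}[h({X}_{0}^{n-1})]-\mathbb{E}_{\xi^{\prime}}[h({X}_{0}^{n-1})]|\leqslant 2\sum_{i=0}^{n-1}c_{i}\,\mathrm{d}_{\mathrm{TV}}(\xi P^{i},\xi^{\prime}P^{i})\;.$$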

Remark.

It is possible to avoid the factor $2$ in the inequality of Lemma 2 under additional technical conditions, for example, when there exists a maximal coupling for $(\mathbb{P}_{\xi},\mathbb{P}_{\xi^{\prime}})$; see [3, Lemma 23.2.1].

Proof.

Fix an arbitrary $x^{*}\in\mathsf{X}$ . For $i\in\{1,\ldots,n-1\}$ , we set $\bar{h}_{i}({x}_{i}^{n-1})=h(x^{*},\dots,x^{*},{x}_{i}^{n-1})$ . By convention, we set $\bar{h}_{n}$ the constant function $\bar{h}_{n}=h(x^{*},\ldots,x^{*})$ and $\bar{h}_{0}=h$ . With these notations, we have the decomposition

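$$h({x}_{0}^{n-1})=\sum_{i=0}^{n-1}\{\bar{h}_{i}({x}_{i}^{n-1})-\bar{h}_{i+1}({x}_{i+1}^{n-1})\}+\bar{h}_{n}\;.$$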

For all $i\in\{0,\ldots,n-1\}$ and all $x_{i}\in\mathsf{X}$ , let

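$$\bar{w}_{i}(x_{i})=\int\{h(x^{*},\dots,x^{*},{x}_{i}^{n-1})-h(x^{*},\dots,x^{*},{x}_{i+1}^{n-1})\}\prod_{\ell=i+1}^{n-1}P(x_{\ell-1},\mathrm{d}x_{\ell})\;.$$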

It is easily seen that ${\mathbb{E}}\left[\left.\{\bar{h}_{i}({X}_{i}^{n-1})-\bar{h}_{i+1}({X}_{i+1}^{n-1})\}\,\right|\mathscr{F}_{i}\right]=\bar{w}_{i}(X_{i})$ , $\mathbb{P}_{\xi}-a.s.$ , which implies that

$$\mathbb{E}_{\xi}[h({X}_{0}^{n-1})]=\sum_{i=0}^{n-1}\xi P^{i}\bar{w}_{i}+\bar{h}_{n}\;.$$

Since $h\in\mathbb{BD}(\mathsf{X}^{n},c)$ , (3) shows that $|\bar{w}_{i}|_{\infty}\leq c_{i}$ . Therefore,

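$$|\mathbb{E}_{\xi}[h({X}_{0}^{n-1})]-\mathbb{E}_{\xi^{\prime}}[h({X}_{0}^{n-1})]|\leqslant\sum_{i=0}^{n-1}|\xi P^{i}\bar{w}_{i}-\xi^{\prime}P^{i}\bar{w}_{i}|\leqslant 2\sum_{i=0}^{n-1}c_{i}\,\mathrm{d}_{\mathrm{TV}}(\xi P^{i},\xi^{\prime}P^{i})\;.$$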

∎
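The coupling lemma can be checked numerically on a small example by enumerating all paths. The sketch below is a sanity check under stated assumptions, not part of the proof: the two-state kernel, the vector `c`, the weighted-sum function `h` and the two Dirac initial laws are hypothetical choices, and the total variation distance is normalised to take values in $[0,1]$.

```python
import itertools


def step(dist, P):
    """One step of the chain: row vector dist multiplied by the matrix P."""
    k = len(P)
    return [sum(dist[x] * P[x][y] for x in range(k)) for y in range(k)]


def tv_distance(mu, nu):
    """Total variation distance, normalised to take values in [0, 1]."""
    return 0.5 * sum(abs(m - v) for m, v in zip(mu, nu))


def expectation(h, xi, P, n):
    """E_xi[h(X_0, ..., X_{n-1})], summing over all paths of a finite chain."""
    total = 0.0
    for path in itertools.product(range(len(P)), repeat=n):
        prob = xi[path[0]]
        for prev, nxt in zip(path, path[1:]):
            prob *= P[prev][nxt]
        total += prob * h(path)
    return total


def lemma_rhs(xi, xi2, P, c):
    """2 * sum_i c_i * d_TV(xi P^i, xi2 P^i)."""
    rhs, mu, nu = 0.0, list(xi), list(xi2)
    for ci in c:
        rhs += 2.0 * ci * tv_distance(mu, nu)
        mu, nu = step(mu, P), step(nu, P)
    return rhs


P = [[0.7, 0.3], [0.1, 0.9]]
c = [1.0, 0.5, 0.25]


def h(x):
    # On {0, 1}^3 this weighted sum belongs to BD(X^3, c)
    return sum(ck * xk for ck, xk in zip(c, x))


xi, xi2 = [1.0, 0.0], [0.0, 1.0]      # Dirac masses at states 0 and 1
lhs = abs(expectation(h, xi, P, 3) - expectation(h, xi2, P, 3))
rhs = lemma_rhs(xi, xi2, P, c)
```

In this example the left-hand side equals half of the right-hand side, which is consistent with the remark that the factor $2$ can sometimes be avoided.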

3 Main result

The main result of this paper is the following quantitative version of McDiarmid’s inequality for geometrically ergodic Markov chains.

Theorem 1.

Assume H1, H2, H3. Let $n\geqslant 1$ , $c\in\mathbb{R}^{n}$ and $f\in\mathbb{BD}(\mathsf{X}^{n},c)$ . Then, for all $x\in\mathsf{C}$ and $t>0$ ,

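$$\mathbb{P}_{x}\big(f({X}_{0}^{n-1})-\mathbb{E}_{x}[f({X}_{0}^{n-1})]>t\big)\leqslant\exp\bigg(-\frac{\beta t^{2}}{\|c\|^{2}}\bigg)\;,$$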

where $\beta$ is given by

$$\beta=\frac{(1-r\vee u^{-1/4})^{2}}{16L}\bigg(\frac{5}{\log u}+4ML\bigg)^{-1}\;.$$
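Since the point of the theorem is that $\beta$ is explicit, it can be evaluated directly from the constants of H2 and H3. The sketch below assumes that $1-r\vee u^{-1/4}$ is read as $1-\max(r,u^{-1/4})$, which matches the choice $\rho=r\vee u^{-1/4}$ made in the proof; the function names and the sample values are hypothetical.

```python
import math


def beta(r, u, L, M):
    """beta = (1 - max(r, u^(-1/4)))^2 / (16 L) * (5 / log(u) + 4 M L)^(-1)."""
    if not (0.0 < r < 1.0 and u > 1.0 and L >= 1.0 and M > 0.0):
        raise ValueError("expected 0 < r < 1, u > 1, L >= 1 and M > 0")
    rho = max(r, u ** -0.25)
    return (1.0 - rho) ** 2 / (16.0 * L) / (5.0 / math.log(u) + 4.0 * M * L)


def tail_bound(t, c, r, u, L, M):
    """Right-hand side exp(-beta t^2 / ||c||^2) of the theorem."""
    sq_norm = sum(ci * ci for ci in c)
    return math.exp(-beta(r, u, L, M) * t * t / sq_norm)


# Hypothetical constants: r = 1/2, u = 16 (so u^(-1/4) = 1/2), L = 1, M = 2
b = beta(0.5, 16.0, 1.0, 2.0)
p = tail_bound(10.0, [1.0] * 4, 0.5, 16.0, 1.0, 2.0)
```

A slower mixing rate (larger `r`), a weaker return-time moment (smaller `u`) or larger constants `L` and `M` all decrease $\beta$ and therefore weaken the tail bound.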

Proof of Theorem 1.

Fix $c\in\mathbb{R}^{n}$, $x\in\mathsf{C}$ and $f\in\mathbb{BD}(\mathsf{X}^{n},c)$. Following [2], we decompose $f({X}_{0}^{n-1})-\mathbb{E}_{x}[f({X}_{0}^{n-1})]$ into martingale increments by conditioning on the $\sigma$-algebras associated with the stopping times $\tau_{\mathsf{C}}^{i}$, $i=0,\ldots,n-1$. For any integer $i\in[0,n-1]$, define

$$G_{i}=\mathbb{E}_{x}\big[f({X}_{0}^{n-1})\,\big|\,\mathscr{F}_{\tau_{\mathsf{C}}^{i}}\big]\;.$$

As $\tau_{\mathsf{C}}^{0}=0$ $\mathbb{P}_{x}$ -a.s., it holds $\mathbb{E}_{x}[f({X}_{0}^{n-1})]=\mathbb{E}_{x}[f({X}_{0}^{n-1})|\mathscr{F}_{\tau_{\mathsf{C}}^{0}}]=G_{0}$ . Moreover, as $\tau_{\mathsf{C}}^{n-1}\geqslant n-1$ , it also holds $G_{n-1}=\mathbb{E}_{x}[f({X}_{0}^{n-1})|\mathscr{F}_{\tau_{\mathsf{C}}^{n-1}}]=f({X}_{0}^{n-1})$ . Therefore, the difference $f({X}_{0}^{n-1})-\mathbb{E}_{x}[f({X}_{0}^{n-1})]$ is decomposed into a sum of the martingale increments $G_{i+1}-G_{i}$ as follows

$$f({X}_{0}^{n-1})-\mathbb{E}_{x}[f({X}_{0}^{n-1})]=G_{n-1}-G_{0}=\sum_{i=0}^{n-2}(G_{i+1}-G_{i})\;.$$

The proof is now decomposed into three facts that aim at bounding the Laplace transform of $f({X}_{0}^{n-1})-\mathbb{E}_{x}[f({X}_{0}^{n-1})]$ .

Fact 1. For any $i\in\{1,\ldots,n-1\}$ ,

$$G_{i}-G_{i-1}=(G_{i}-G_{i-1})\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1\}}\;.$$

Proof of Fact 1.

By definition $\tau_{\mathsf{C}}^{i-1}\geqslant i-1$ and $\tau_{\mathsf{C}}^{i-1}>i-1$ if and only if $\tau_{\mathsf{C}}^{i-1}=\tau_{\mathsf{C}}^{i}$ . Therefore,

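$$G_{i}-G_{i-1}=(G_{i}-G_{i-1})\big(\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1\}}+\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=\tau_{\mathsf{C}}^{i}\}}\big)\;.$$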

To prove that $(G_{i}-G_{i-1})\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=\tau_{\mathsf{C}}^{i}\}}=0$ , we decompose according to the values of $\tau_{\mathsf{C}}^{i}$ :

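$$(G_{i}-G_{i-1})\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=\tau_{\mathsf{C}}^{i}\}}=\sum_{j\geqslant i}(G_{i}-G_{i-1})\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=\tau_{\mathsf{C}}^{i}=j\}}\;.$$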

Now, remark that, for any $i\geqslant 0$ ,

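$$G_{i}\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i}=j\}}=\begin{cases}\mathbb{E}_{x}\big[f({X}_{0}^{n-1})\,\big|\,\mathscr{F}_{j}\big]&\text{ if }j\leqslant n-2\;,\\ f({X}_{0}^{n-1})&\text{ if }j\geqslant n-1\;.\end{cases}$$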

Then, for any $j\geqslant i$ ,

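$$G_{i}\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i}=j\}}\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=\tau_{\mathsf{C}}^{i}\}}=G_{i-1}\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=j\}}\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=\tau_{\mathsf{C}}^{i}\}}=G_{i-1}\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i}=j\}}\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=\tau_{\mathsf{C}}^{i}\}}\;.$$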

This proves Fact 1. ∎

Fact 2 bounds the increments $G_{i}-G_{i-1}$. The proof relies on the following lemma, which is a consequence of the coupling result, Lemma 2. Define $g_{n-1}=g_{n-1,\pi}=f$ and, for any $i\in[0,n-2]$, let $g_{i}$ and $g_{i,\pi}$ denote the functions defined for any ${x}_{0}^{i}\in\mathsf{X}^{i+1}$ by

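$$g_{i}({x}_{0}^{i})=\mathbb{E}_{x_{i}}[f({x}_{0}^{i},{X}_{1}^{n-1-i})]\;,\qquad g_{i,\pi}({x}_{0}^{i})=\mathbb{E}_{\pi}[f({x}_{0}^{i},{X}_{1}^{n-1-i})]\;.$$

Lemma 3.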

Assume H1, H2, H3. For any $i\in\{0,\ldots,n-1\}$ and $({x}_{0}^{i-1},x_{i})$ in $\mathsf{X}^{i}\times\mathsf{C}$ ,

$$|g_{i}({x}_{0}^{i})-g_{i,\pi}({x}_{0}^{i})|\leqslant 2L\sum_{j=i+1}^{n-1}c_{j}r^{j-i}\;.\tag{8}$$

Proof.

Fix $i\in\{0,\ldots,n-1\}$ and ${x}_{0}^{i}\in\mathsf{X}^{i+1}$ . As $f\in\mathbb{BD}(\mathsf{X}^{n},c)$ , the function $\widetilde{f}_{i}:{y}_{1}^{n-1-i}\in\mathsf{X}^{n-1-i}\mapsto f({x}_{0}^{i},{y}_{1}^{n-1-i})\in\mathbb{R}$ satisfies

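$$|\widetilde{f}_{i}({y}_{1}^{n-1-i})-\widetilde{f}_{i}({z}_{1}^{n-1-i})|\leqslant\sum_{k=1}^{n-1-i}c_{i+k}\mathbbm{1}_{\{y_{k}\neq z_{k}\}}\;.$$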

Hence, $\widetilde{f}_{i}\in\mathbb{B}\mathbb{D}(\mathsf{X}^{n-1-i},c_{i+1:n-1})$ . Applying Lemma 2 to the function $h=\widetilde{f}_{i}$ yields

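$$|g_{i}({x}_{0}^{i})-g_{i,\pi}({x}_{0}^{i})|=|\mathbb{E}_{x_{i}}[\widetilde{f}_{i}({X}_{1}^{n-1-i})]-\mathbb{E}_{\pi}[\widetilde{f}_{i}({X}_{1}^{n-1-i})]|\leqslant 2\sum_{j=i+1}^{n-1}c_{j}\,\mathrm{d}_{\mathrm{TV}}(\delta_{x_{i}}P^{j-i},\pi)\;.$$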

Inequality (8) follows from H3. ∎

Fact 2. Let $\rho$ be such that $r\leqslant\rho<1$ and let $i\in\{1,\ldots,n-1\}$. Then,

[TABLE]

where $C_{1}=5L/(1-r)$ and $C_{2}=16L^{2}/(1-\rho)$.

Proof of Fact 2.

For any integer $i\in\{1,\ldots,n\}$ , let

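$$G_{i,1}=\mathbb{E}_{x}\big[f({X}_{0}^{n-1})\,\big|\,\mathscr{F}_{\tau_{\mathsf{C}}^{i-1}}\big]\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1\}}\;,\qquad G_{i,2}=\mathbb{E}_{x}\big[f({X}_{0}^{n-1})\,\big|\,\mathscr{F}_{\tau_{\mathsf{C}}^{i}}\big]\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1\}}\;.$$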

From Fact 1, $G_{i}-G_{i-1}=G_{i,2}-G_{i,1}$. By the Markov property, for any $i\in\{0,\ldots,n-1\}$ and $x\in\mathsf{X}$,

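$$\mathbb{E}_{x}\big[f({X}_{0}^{n-1})\,\big|\,\mathscr{F}_{i}\big]=g_{i}({X}_{0}^{i})\;,\quad\mathbb{P}_{x}\text{-a.s.}$$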

Now, let $R_{i,1}=g_{i-1}({X}_{0}^{i-1})\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1\}}-g_{i-1,\pi}({X}_{0}^{i-1})\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1\}}$ . We have

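$$G_{i,1}=g_{i-1}({X}_{0}^{i-1})\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1\}}=g_{i-1,\pi}({X}_{0}^{i-1})\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1\}}+R_{i,1}\;.$$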

Moreover, as $\tau_{\mathsf{C}}^{i}\geqslant i$ , by (6),

[TABLE]

Let $R_{i,2}=\sum_{j=i}^{n-2}(g_{j}({X}_{0}^{j})-g_{j,\pi}({X}_{0}^{j}))\mathbbm{1}_{\{\tau_{\mathsf{C}}^{i-1}=i-1,\tau_{\mathsf{C}}^{i}=j\}}$ . From (11) and (12),

[TABLE]

We bound separately all the terms in this decomposition. First, as $\pi$ is invariant and $f\in\mathbb{BD}(\mathsf{X}^{n},c)$ , for any $j\in\{i+1,\ldots,n-1\}$ and any ${x}_{0}^{j}\in\mathsf{X}^{j+1}$ ,

[TABLE]

Hence,

[TABLE]

To bound $|R_{i,1}|$ and $|R_{i,2}|$ in (13), we use Lemma 3. First, (8) directly yields

[TABLE]

Moreover, as $\{\tau_{\mathsf{C}}^{i}=j\}\subset\{X_{j}\in\mathsf{C}\}$ , (8) also yields

[TABLE]

Therefore,

[TABLE]

Plugging (14), (15) and (16) in (13) yields

[TABLE]

Both (9) and (10) follow from (17) by bounding separately the $3$ terms in the right-hand side of this inequality. Let us first establish (9). Since $r<1$ ,

[TABLE]

Moreover,

[TABLE]

As $r<1\leqslant\sigma_{\mathsf{C}}\circ\theta^{i-1}$ , plugging these upper bounds in (17) shows

[TABLE]

This proves (9). We use slightly different controls to prove (10) from (17). As $r\leqslant\rho<1$ , $\rho^{-\sigma_{\mathsf{C}}\circ\theta^{i-1}}\geqslant 1$ , and

[TABLE]

Moreover,

[TABLE]

As $\tau_{\mathsf{C}}^{i}\geqslant i$ and $i-\tau_{\mathsf{C}}^{i}=1-\sigma_{\mathsf{C}}\circ\theta^{i-1}$ ,

[TABLE]

In addition,

[TABLE]

Plugging (18), (19) and (20) in (17) and applying Cauchy-Schwarz inequality shows

[TABLE]

This proves (10) and thus Fact 2. ∎

Fact 3. Assume H1, H2, H3. For any $x\in\mathsf{C}$,

[TABLE]

where $C_{3}=4L\left(5/\log u+4ML\right)/(1-r\vee u^{-1/4})^{2}$ .

Proof of Fact 3.

For any $t\in\mathbb{R}$ , $\mathrm{e}^{t}\leqslant 1+t+t^{2}\mathrm{e}^{|t|}$ . Hence, as $\mathbb{E}_{x}[G_{i+1}-G_{i}|\mathscr{F}_{\tau_{\mathsf{C}}^{i}}]=0$ , for any $i\geqslant 0$ , we have

[TABLE]

By Fact 2,

[TABLE]

Now, by the Markov property,

[TABLE]

Hence,

[TABLE]

Let $\rho=r\vee u^{-1/4}$, $\varepsilon=\log u/(2C_{1})$ and assume first that $\|c\|_{\infty}\leqslant\varepsilon$. By H2,

[TABLE]

Hence,

[TABLE]

By induction, it follows that

[TABLE]

Fix $\widetilde{x}$ in $\mathsf{X}$ and let $\widetilde{f}:\mathsf{X}^{n}\rightarrow\mathbb{R}$ be defined, for any $x_{0:n-1}$ in $\mathsf{X}^{n}$ , by

[TABLE]

As $f$ belongs to $\mathbb{BD}\left(\mathsf{X}^{n},c\right)$ , $\widetilde{f}$ belongs to $\mathbb{BD}\left(\mathsf{X}^{n},\widetilde{c}\right)$ , where

[TABLE]

Since $\|\widetilde{c}\|_{\infty}<\varepsilon$ and $\|\widetilde{c}\|\leqslant\|c\|$ , $\widetilde{f}$ satisfies

[TABLE]

Furthermore, by definition of $\widetilde{f}$ and since $f$ is in $\mathbb{BD}(\mathsf{X}^{n},c)$ , for any $x\in\mathsf{X}^{n}$ ,

[TABLE]

This implies

[TABLE]

This shows Fact 3 since

[TABLE]

∎

Fact 3 proves that there exists a constant $C=2C_{3}$ such that, for any $c\in\mathbb{R}^{n}$ , $f\in\mathbb{BD}(\mathsf{X}^{n},c)$ and $x\in\mathsf{C}$ ,

[TABLE]

Let $f\in\mathbb{BD}(\mathsf{X}^{n},c)$ and $x\in\mathsf{C}$. For any $s>0$, $sf\in\mathbb{BD}(\mathsf{X}^{n},sc)$. Hence, from (24), for any $s,t>0$,

[TABLE]

Choosing $s=t/(C\|c\|^{2})$ proves Theorem 1 with

[TABLE]

∎
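The displayed formulas of this last step are not available in the text above, so the following is only a sketch of the standard Chernoff argument. It assumes that (24) has the form $\mathbb{E}_{x}[\exp(f({X}_{0}^{n-1})-\mathbb{E}_{x}[f({X}_{0}^{n-1})])]\leqslant\exp(C\|c\|^{2}/2)$; this form is inferred, not quoted, and it is chosen because it is the one for which $s=t/(C\|c\|^{2})$ and $C=2C_{3}$ reproduce the constant $\beta$ of Theorem 1.

```latex
\begin{align*}
\mathbb{P}_{x}\big(f({X}_{0}^{n-1})-\mathbb{E}_{x}[f({X}_{0}^{n-1})]>t\big)
  &\leqslant \mathrm{e}^{-st}\,
     \mathbb{E}_{x}\Big[\exp\big(sf({X}_{0}^{n-1})-\mathbb{E}_{x}[sf({X}_{0}^{n-1})]\big)\Big]
     && \text{(Markov's inequality)} \\
  &\leqslant \exp\Big(-st+\tfrac{1}{2}Cs^{2}\|c\|^{2}\Big)
     && \text{(assumed form of (24) applied to } sf\in\mathbb{BD}(\mathsf{X}^{n},sc)\text{)} \\
  &= \exp\Big(-\frac{t^{2}}{2C\|c\|^{2}}\Big)
     && \text{(with } s=t/(C\|c\|^{2})\text{)} .
\end{align*}
% Under this assumption, with C = 2 C_3 and
% C_3 = 4L (5/\log u + 4ML) / (1 - r \vee u^{-1/4})^2:
\[
\beta=\frac{1}{2C}=\frac{1}{4C_{3}}
     =\frac{(1-r\vee u^{-1/4})^{2}}{16L}\Big(\frac{5}{\log u}+4ML\Big)^{-1}.
\]
```

The resulting value agrees with the expression of $\beta$ stated in Theorem 1, which supports the assumed form but does not replace the original display.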

Bibliography

[1] R. Adamczak. A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electron. J. Probab., 13: no. 34, 1000–1034, 2008.
[2] J. Dedecker and S. Gouëzel. Subgaussian concentration inequalities for geometrically ergodic Markov chains. Electron. Commun. Probab., 20: no. 64, 12, 2015.
[3] R. Douc, E. Moulines, P. Priouret, and P. Soulier. Markov chains. Springer, 2018.
[4] S. Ghosal and A. van der Vaart. Fundamentals of Nonparametric Bayesian Inference. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2017.
[5] S. Le Corff, M. Lerasle, and E. Vernet. A Bayesian nonparametric approach for generalized Bradley-Terry models in random environment. arXiv:1808.08104, 2018.
[6] C. McDiarmid. On the method of bounded differences. In Surveys in combinatorics, 1989 (Norwich, 1989), volume 141 of London Math. Soc. Lecture Note Ser., pages 148–188. Cambridge Univ. Press, Cambridge, 1989.
[7] D. Paulin. Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electronic Journal of Probability, 20:1–32, 2015.
[8] G. O. Roberts and J. S. Rosenthal. General state space Markov chains and MCMC algorithms. Probab. Surv., 1:20–71, 2004.
