A quantitative McDiarmid's inequality for geometrically ergodic Markov chains
Antoine Havet, Matthieu Lerasle, Eric Moulines, Elodie Vernet

TL;DR
This paper states and proves a quantitative version of McDiarmid's inequality for geometrically ergodic Markov chains, in which the constant of the bound depends explicitly on the quantities characterizing the mixing of the chain.
Contribution
It introduces a modified coupling argument to extend the bounded difference inequality to all geometrically ergodic Markov chains, filling a gap in existing methods.
Findings
Provides a new quantitative bound for Markov chains
Extends McDiarmid's inequality to a broader class of chains
Improves the theoretical tools for analyzing dependent data
Abstract
We state and prove a quantitative version of the bounded difference inequality for geometrically ergodic Markov chains. Our proof uses the same martingale decomposition as [2] but, compared to this paper, the exact coupling argument is modified to fill a gap between the strongly aperiodic case and the general aperiodic case.
Keywords: Concentration inequalities; Markov chains; Geometric ergodicity; Coupling.
AMS MSC 2010: 60J05; 60E15.
1 Introduction
The purpose of this note is to establish a quantitative version of McDiarmid’s inequality for geometrically ergodic Markov chains. Let $X = (X_1, \ldots, X_n)$ denote independent random variables taking values in a measurable space $(\mathsf{X}, \mathcal{X})$ and $\gamma = (\gamma_1, \ldots, \gamma_n)$ denote a vector of non-negative real numbers. A function $f : \mathsf{X}^n \to \mathbb{R}$ satisfies the bounded difference inequality if for all $x \in \mathsf{X}^n$ and $y \in \mathsf{X}^n$, we have
$$|f(x) - f(y)| \le \sum_{i=1}^{n} \gamma_i \mathbb{1}_{\{x_i \ne y_i\}}. \tag{1}$$
The bounded difference inequality, first established in [6], shows that for all $t > 0$,
$$\mathbb{P}\big(|f(X) - \mathbb{E}[f(X)]| > t\big) \le 2 \exp\left(-\frac{2 t^2}{\|\gamma\|^2}\right),$$
where $\|\gamma\|^2 = \sum_{i=1}^{n} \gamma_i^2$. Several attempts have been made to extend this result to Markov chains. In [1], the concentration of particular functionals, namely suprema of additive functionals over a class of centered functions, is established. The concentration of general functionals (satisfying (1)) of geometrically ergodic Markov chains was established in [2], where it is also proved that geometric ergodicity is a necessary assumption. However, the result in [2] is not quantitative. It states that, for every geometrically recurrent set, there exists a constant, depending on this set, such that for every initial point in the set and every $t > 0$,
[TABLE]
where, for any initial point $x$, $\mathbb{P}_x$ is the distribution of the Markov chain starting from $x$ (see the precise definition below). In many applications, it is necessary to know explicitly how this constant depends on the recurrent set. In particular, this problem arises when establishing posterior concentration rates of Bayesian non-parametric estimators; see for example [9, 4] for recent accounts of this theory. To extend these results to Markovian settings, the result of [2] cannot be applied directly: a quantitative version of (2) is required, in which the dependence of the constant on the quantities characterizing the mixing of the Markov chain is explicit; see for example [10, 5].
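As a numerical sanity check of the classical bound for independent variables recalled at the start of this section, the following minimal sketch compares an empirical tail frequency with the bound $2\exp(-2t^2/\|\gamma\|^2)$. The choice of functional (the sample mean of uniform variables on $[0,1]$, for which changing one coordinate moves the mean by at most $1/n$, so $\gamma_i = 1/n$), the function names and the parameter values are assumptions made for illustration; they do not come from the paper, and the sketch does not cover the Markovian setting.

```python
import math
import random


def sample_mean(xs):
    """The functional f: the empirical mean of the coordinates."""
    return sum(xs) / len(xs)


def mcdiarmid_bound(t, gammas):
    """Two-sided bound 2 * exp(-2 t^2 / ||gamma||^2) for independent variables."""
    sq_norm = sum(g * g for g in gammas)
    return 2.0 * math.exp(-2.0 * t * t / sq_norm)


def empirical_tail(n, t, n_trials, seed=0):
    """Monte Carlo estimate of P(|f(X) - E f(X)| > t) for X_i i.i.d. Uniform[0, 1]."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_trials):
        xs = [rng.random() for _ in range(n)]
        # E f(X) = 0.5 for uniform variables on [0, 1].
        if abs(sample_mean(xs) - 0.5) > t:
            exceed += 1
    return exceed / n_trials


# Illustrative parameters (assumptions, not values from the paper).
n, t = 50, 0.1
gammas = [1.0 / n] * n  # bounded-difference constants of the sample mean
bound = mcdiarmid_bound(t, gammas)
freq = empirical_tail(n, t, n_trials=20000)
print(f"empirical tail = {freq:.4f}, McDiarmid bound = {bound:.4f}")
```

In this toy setting the empirical frequency is expected to sit well below the bound, which holds for every function satisfying (1) and is therefore not tight for a specific one.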
A quantitative version of McDiarmid’s inequality for Markov chains was established in [7], where the constant depends explicitly on the mixing time of the chain. The existence of finite mixing times requires uniform ergodicity of the chain (see for example [8, Section 3.3]), an assumption that typically fails when the chain takes values in a general state space. In this note, we prove an extension of McDiarmid’s inequality to geometrically ergodic Markov chains. Our proof is based on [2], but avoids the use of [2, Lemma 6], which requires the construction of an exact coupling. An exact coupling can be built in the strongly aperiodic case, but there is a gap in the general aperiodic case.
The remainder of the paper is organized as follows. Section 2 formally introduces the notation and the assumptions of the main result, which is stated and proved in Section 3.
2 Notations and assumptions
Let be a measurable space. We denote by the total variation distance between probability measures. For any sequence and any non-negative integers and , with , let . For any and any vector , let denote the Euclidean norm of and denote its sup-norm.
We denote by the canonical filtered space, the canonical process and the shift operator on the canonical space defined, for any by , where, for any , . Set and for , define inductively, . We also need to define . To this aim, fix an arbitrary , we define such that for , is the constant sequence for all .
Let be a Markov kernel on . For any probability measure on , denote by the unique probability under which is a Markov chain with Markov kernel and initial distribution and let denote the expectation under the distribution . Recall that denotes the -algebra generated by . For any , let denote the Dirac mass at point . With some abuse of notation, we also denote (resp. ) instead of (resp. ).
For any and any integer , let
[TABLE]
For , we denote by the set of measurable functions such that for all and , The main result is established under the following conditions.
- H1
The Markov kernel is irreducible and aperiodic, with unique invariant probability .
- H2
There exist a non-empty set and two real numbers and such that
[TABLE]
- H3
There exist and such that, for any in the set of H2 and any ,
[TABLE]
where is the unique invariant measure granted in H1.
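As a concrete illustration of the geometric convergence in total variation required by H3, here is a minimal sketch on a toy two-state chain. The kernel entries `a` and `b`, the function names, and the convention that total variation is half the $\ell_1$ distance are assumptions made for this illustration and are not taken from the paper. For this particular chain, the distance to the invariant law from state 0 decays exactly like $\pi(1)\,|1-a-b|^n$.

```python
def step(dist, kernel):
    """One step of the chain: row vector `dist` times the matrix `kernel`."""
    m = len(kernel)
    return [sum(dist[i] * kernel[i][j] for i in range(m)) for j in range(m)]


def tv_distance(mu, nu):
    """Total variation distance, taken here as half the l1 distance (assumed convention)."""
    return 0.5 * sum(abs(x - y) for x, y in zip(mu, nu))


def tv_to_stationarity(kernel, stationary, start, n_steps):
    """Distances between the law after n steps from `start` and `stationary`, n = 0..n_steps."""
    dist = [1.0 if i == start else 0.0 for i in range(len(kernel))]
    distances = []
    for _ in range(n_steps + 1):
        distances.append(tv_distance(dist, stationary))
        dist = step(dist, kernel)
    return distances


# Toy two-state kernel (hypothetical values chosen for illustration).
a, b = 0.3, 0.2
kernel = [[1 - a, a], [b, 1 - b]]
stationary = [b / (a + b), a / (a + b)]  # invariant law of the two-state chain
rho = abs(1 - a - b)                     # geometric rate for this chain

tvs = tv_to_stationarity(kernel, stationary, start=0, n_steps=10)
for n, tv in enumerate(tvs):
    print(n, tv, stationary[1] * rho ** n)
```

On a finite state space such a bound holds uniformly over starting points; the interest of H3 lies in general state spaces, where the bound is only required for starting points in the set of H2.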
When the Markov kernel is uniformly ergodic, then H3 holds with . The following Lemma is a coupling result that replaces [2, Lemma 6]. It is instrumental in the sequel.
Lemma 2.
For any probability measures and on , any , any and any ,
[TABLE]
Remark.
It is possible to avoid the factor in the bound of Lemma 2 under additional technical conditions, for example when there exists a maximal coupling for ; see [3, Lemma 23.2.1].
Proof.
Fix an arbitrary . For , we set . By convention, we set the constant function and . With these notations, we have the decomposition
[TABLE]
For all and all , let
[TABLE]
It is easily seen that , , which implies that
[TABLE]
Since , (3) shows that . Therefore,
[TABLE]
∎
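The remark following Lemma 2 mentions maximal couplings. As a minimal sketch of that notion in the simplest possible setting, the code below samples a pair $(X, Y)$ with prescribed marginals `p` and `q` on a finite set such that $\mathbb{P}(X \ne Y)$ equals the total variation distance between `p` and `q`. The finite state space, the two distributions and all function names are assumptions made for illustration; this is not the coupling of chain trajectories used in the paper or in [3].

```python
import random


def total_variation(p, q):
    """Total variation distance, taken here as half the l1 distance (assumed convention)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def sample_maximal_coupling(p, q, rng):
    """Draw (x, y) with x ~ p and y ~ q such that P(x != y) = TV(p, q)."""
    overlap = [min(a, b) for a, b in zip(p, q)]
    alpha = sum(overlap)  # equals 1 - TV(p, q)
    states = range(len(p))
    if rng.random() < alpha:
        # With probability alpha, both coordinates share one draw from the overlap.
        x = rng.choices(states, weights=overlap)[0]
        return x, x
    # Otherwise draw from the residual parts, whose supports are disjoint.
    res_p = [a - o for a, o in zip(p, overlap)]
    res_q = [b - o for b, o in zip(q, overlap)]
    x = rng.choices(states, weights=res_p)[0]
    y = rng.choices(states, weights=res_q)[0]
    return x, y


# Hypothetical distributions on {0, 1, 2}.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
rng = random.Random(1)
pairs = [sample_maximal_coupling(p, q, rng) for _ in range(20000)]
mismatch_rate = sum(x != y for x, y in pairs) / len(pairs)
print(f"P(X != Y) ~ {mismatch_rate:.3f}, TV(p, q) = {total_variation(p, q):.3f}")
```

No coupling of `p` and `q` can make the two coordinates differ with probability smaller than the total variation distance, which is why a coupling attaining it is called maximal.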
3 Main result
The main result of this paper is the following quantitative version of McDiarmid’s inequality for geometrically ergodic Markov chains.
Theorem 1.
Assume H1, H2, H3. Let , and . Then, for all and ,
[TABLE]
where is given by
[TABLE]
Proof of Theorem 1.
Fix , and . Following [2], we decompose into martingale increments by conditioning on the stopping times , . For any integer , define
[TABLE]
As -a.s., it holds . Moreover, as , it also holds . Therefore, the difference is decomposed into a sum of the martingale increments as follows
[TABLE]
The proof is now decomposed into three facts that aim at bounding the Laplace transform of .
Fact 1. For any ,
[TABLE]
Proof of Fact 1.
By definition and if and only if . Therefore,
[TABLE]
To prove that , we decompose according to the values of :
[TABLE]
Now, remark that, for any ,
[TABLE]
Then, for any ,
[TABLE]
This proves Fact 1. ∎
Fact 2 bounds the increments . The proof relies on the following lemma, which is a consequence of the coupling result of Lemma 2. Define and, for any , let and denote the functions defined for any by
[TABLE]
Lemma 3.
Assume H1, H2, H3. For any and in ,
[TABLE]
Proof.
Fix and . As , the function satisfies
[TABLE]
Hence, . Applying Lemma 2 to the function yields
[TABLE]
Inequality (8) follows from H3. ∎
Fact 2. Let such that and . Then,
[TABLE]
where, and .
Proof of Fact 2.
For any integer , let
[TABLE]
From Fact 1, . By the Markov property, for any and ,
[TABLE]
Now, let . We have
[TABLE]
Moreover, as , by (6),
[TABLE]
[TABLE]
We bound separately all the terms in this decomposition. First, as is invariant and , for any and any ,
[TABLE]
Hence,
[TABLE]
To bound and in (13), we use Lemma 3. First, (8) directly yields
[TABLE]
Moreover, as , (8) also yields
[TABLE]
Therefore,
[TABLE]
Plugging (14), (15) and (16) in (13) yields
[TABLE]
Both (9) and (10) follow from (17) by bounding separately the terms in the right-hand side of this inequality. Let us first establish (9). Since ,
[TABLE]
Moreover,
[TABLE]
As , plugging these upper bounds in (17) shows
[TABLE]
This proves (9). We use slightly different controls to prove (10) from (17). As , , and
[TABLE]
Moreover,
[TABLE]
As and ,
[TABLE]
In addition,
[TABLE]
Plugging (18), (19) and (20) in (17) and applying the Cauchy-Schwarz inequality shows
[TABLE]
This proves (10) and thus Fact 2. ∎
Fact 3. Assume H1, H2, H3. For any ,
[TABLE]
where .
Proof of Fact 3.
For any , . Hence, as , for any , we have
[TABLE]
By Fact 2,
[TABLE]
Now, by the Markov property,
[TABLE]
Hence,
[TABLE]
Let , and assume first that . By H,
[TABLE]
Hence,
[TABLE]
By induction, it follows that
[TABLE]
Fix in and let be defined, for any in , by
[TABLE]
As belongs to , belongs to , where
[TABLE]
Since and , satisfies
[TABLE]
Furthermore, by definition of and since is in , for any ,
[TABLE]
This implies
[TABLE]
This shows Fact 3 since
[TABLE]
∎
Fact 3 proves that there exists a constant such that, for any , and ,
[TABLE]
Let and . For any , . Hence, from (24), for any ,
[TABLE]
Choosing proves Theorem 1 with
[TABLE]
∎
References
- [1] R. Adamczak. A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electron. J. Probab., 13:no. 34, 1000–1034, 2008.
- [2] J. Dedecker and S. Gouëzel. Subgaussian concentration inequalities for geometrically ergodic Markov chains. Electron. Commun. Probab., 20:no. 64, 12, 2015.
- [3] R. Douc, E. Moulines, P. Priouret, and P. Soulier. Markov chains. Springer, 2018.
- [4] S. Ghosal and A. van der Vaart. Fundamentals of Nonparametric Bayesian Inference. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2017.
- [5] S. Le Corff, M. Lerasle, and E. Vernet. A Bayesian nonparametric approach for generalized Bradley-Terry models in random environment. arXiv:1808.08104, 2018.
- [6] C. McDiarmid. On the method of bounded differences. In Surveys in combinatorics, 1989 (Norwich, 1989), volume 141 of London Math. Soc. Lecture Note Ser., pages 148–188. Cambridge Univ. Press, Cambridge, 1989.
- [7] D. Paulin. Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electron. J. Probab., 20:1–32, 2015.
- [8] G. O. Roberts and J. S. Rosenthal. General state space Markov chains and MCMC algorithms. Probab. Surv., 1:20–71, 2004.
