A Hoeffding's inequality for uniformly ergodic diffusion process
Michael C.H. Choi, Evelyn Li

TL;DR
This paper extends Hoeffding's inequality to continuous-time uniformly ergodic diffusion processes, providing a new probabilistic bound useful in stochastic process analysis and applications.
Contribution
It introduces a Hoeffding's inequality for diffusion processes, bridging a gap between discrete-time Markov chains and continuous-time diffusions.
Findings
Derived a Hoeffding's inequality for diffusion processes.
Illustrated the results with examples involving Jacobi diffusion and Ornstein-Uhlenbeck process.
Provided bounds for large deviation probabilities in continuous-time settings.
Abstract
In this note, we present a version of Hoeffding's inequality in a continuous-time setting, where the data stream comes from a uniformly ergodic diffusion process. Similar to the well-studied case of Hoeffding's inequality for discrete-time uniformly ergodic Markov chains, the proof relies on techniques ranging from martingale theory to the classical Hoeffding's lemma, as well as the notion of the deviation kernel of a diffusion process. We present two examples to illustrate our results. In the first example we consider the large deviation probability of the occupation time of the Jacobi diffusion, a popular process used in modelling exchange rates in mathematical finance, while in the second example we look at the exponential functional of a finite interval analogue of the Ornstein-Uhlenbeck process introduced by Kessler and Sørensen (1999).
Institute for Data and Decision Analytics, The Chinese University of Hong Kong, Shenzhen, Guangdong, 518172, P.R. China
School of Mathematics, Sun Yat-Sen University, Guangzhou, Guangdong, 510275, P.R. China
AMS 2010 subject classifications: 60F10, 60J10, 60G44
Keywords: Diffusion process; Hoeffding’s inequality; large deviations
1. Introduction and main results
The seminal work of Hoeffding (1963), which gives a bound on the large deviation probability of a sum of bounded random variables, has now become one of the classical tools in probability theory. In particular, it has far-reaching applications in statistics and machine learning, see for instance Devroye et al. (1996) and the references therein. Hoeffding’s inequality has since been refined and extended to various settings. For example, motivated by applications in Markov decision processes and reinforcement learning, Glynn and Ormoneit (2002) derive a Hoeffding’s inequality for uniformly ergodic Markov chains, while Boucher (2009) presents another method to prove Hoeffding’s inequality in terms of the Drazin inverse of a Markov chain.
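As a point of reference for the classical i.i.d. setting, the following minimal sketch evaluates the standard two-sided Hoeffding bound 2 exp(-2 n eps^2 / (hi - lo)^2) for the mean of n independent variables taking values in [lo, hi], and compares it with the exact tail of a fair coin. The function names are illustrative choices and do not come from the paper.

```python
import math


def hoeffding_tail_bound(n: int, eps: float, lo: float, hi: float) -> float:
    """Two-sided Hoeffding bound for the mean of n i.i.d. variables in [lo, hi]."""
    return 2.0 * math.exp(-2.0 * n * eps**2 / (hi - lo) ** 2)


def fair_coin_tail(n: int, eps: float) -> float:
    """Exact P(|S_n / n - 1/2| >= eps) for S_n ~ Binomial(n, 1/2)."""
    total = 0
    for k in range(n + 1):
        # Small tolerance so that boundary cases survive floating-point rounding.
        if abs(k / n - 0.5) >= eps - 1e-12:
            total += math.comb(n, k)
    return total / 2**n


if __name__ == "__main__":
    n, eps = 100, 0.1
    print("exact tail     :", fair_coin_tail(n, eps))
    print("Hoeffding bound:", hoeffding_tail_bound(n, eps, 0.0, 1.0))
```

The bound is distribution-free, so it is typically far from tight for any particular law; the comparison only illustrates that the exact tail sits below it.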
Inspired by the work cited above, we aim to extend Hoeffding’s inequality to the setting of diffusion processes. In contrast to the classical setting, we assume that the data stream arrives continuously from a uniformly ergodic diffusion process. The major difficulty of the analysis is twofold. First, as we have a continuous data stream instead of discrete data points, the previous analysis does not carry over to this setting easily. Second, the dependency within the data stream complicates the situation. To overcome these difficulties, we employ classical martingale techniques for diffusion processes as well as the notion of the deviation kernel. Compared with the existing literature on concentration inequalities for diffusion processes (Galtchouk and Pergamenshchikov (2007)), we argue that our proof is conceptually simpler since it uses techniques similar to those of the discrete-time Markov chain case (Glynn and Ormoneit (2002); Boucher (2009)). In addition, as we shall see in Corollary 1.1 below, it is readily applicable as long as we have the relevant eigenvalue information of the generator of the diffusion.
To this end, we fix our notation and introduce the tools we need for our main result Theorem 1.1 below. Let be a filtered probability space satisfying the usual conditions. Suppose that we have an ergodic diffusion process on state space with transition kernel , transition density and stationary distribution . We write and to be the conditional probability and expectation when the process is initialized at . is characterized by the infinitesimal generator , which acts on the space of twice differentiable functions and is defined to be
[TABLE]
where and are respectively known as the drift and diffusion coefficient of . A tool that we will use in the main result below is the deviation kernel of , which is defined as
[TABLE]
where is the projection kernel with density for all . It is well-known that the function solves the Poisson equation , see e.g. Glynn and Meyn (1996). For bounded function , we define the supremum norm to be . We also write to be the induced operator norm of on the space of bounded functions. For further references on , we refer readers to the work of Cheng and Mao (2015); Whitt (1992); Mao (2002). On one-dimensional state space , we now recall two fundamental notions associated with the diffusion , namely the scale function and the speed function . For , these functions are defined by
[TABLE]
where is a fixed and arbitrary reference point. Their respective densities are given by
[TABLE]
In this note, we are primarily interested in uniformly ergodic diffusions, that is, the class of ergodic diffusions whose convergence to equilibrium in total variation distance is uniformly bounded: for and some constants , ,
[TABLE]
where is the total variation distance between and . We write to be the first hitting time of and
[TABLE]
to be the average hitting time of . While verifying uniform ergodicity can be quite difficult, it turns out that, according to (Cheng and Mao, 2015, Theorem 2.2), uniform ergodicity for diffusion on with reflecting boundary at [math] is equivalent to a few readily checkable conditions on , and :
Proposition 1.1 (Necessary and sufficient conditions for uniform ergodicity, Cheng and Mao (2015)).
Given an ergodic diffusion on with reflecting boundary at [math], the following statements are equivalent:
- (1) is uniformly ergodic;
- (2) ;
- (3) ;
- (4) and
[TABLE]
where is the essential spectrum of and are the non-zero eigenvalues of .
At times it may be easier to check item (2), as it depends on and through and , while at other times, when eigenvalue information is available, it may be more convenient to check item (4). As a simple illustration of item (2), we consider the class of diffusions with and , where is a parameter. This class was first studied in Mao (2002). It is easy to see that and . As a result, item (2) now reads
[TABLE]
and so this class of diffusions with is uniformly ergodic. For an illustration of item (4), we refer the reader to Corollary 1.1, where we discuss the Jacobi process. In view of Proposition 1.1, for uniformly ergodic we have
[TABLE]
where the first inequality follows from (Choi, 2018, Theorem ). In other words, for a uniformly ergodic diffusion the induced operator norm is finite. This term appears in our version of Hoeffding’s inequality, Theorem 1.1 below.
With the above notation, we are now ready to state our main result. It follows from the classical ergodic theorem that for a bounded function , the time average converges almost surely to the space average as , see e.g. (Bhattacharya and Waymire, 2009, Theorem ). In our main result below, we present a non-asymptotic probabilistic error bound for this convergence:
Theorem 1.1.
Suppose that is a uniformly ergodic diffusion and is a bounded function. For any , , ,
[TABLE]
where is the deviation kernel of the process .
Remark 1.1 (On the assumption of bounded ).
As usual in the Hoeffding’s inequality literature, our main result Theorem 1.1 requires the function to be bounded. This assumption is crucial when we apply the classical Hoeffding’s lemma (Devroye et al., 1996, Lemma ) to a certain martingale difference sequence in (2.6) and (2.7) below, since the lemma only holds when the random variable of interest is bounded. Although there is an extension of Hoeffding’s lemma to non-negative random variables with finite mean (Bentkus (2008)), this result is difficult to apply in our setting, as one needs to find random variables that stochastically dominate the martingale difference sequence. We leave the question of extending the main result to unbounded as future work.
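To make the role of boundedness concrete, here is a small numerical check of the classical Hoeffding's lemma in its standard form: if E[X] = 0 and a <= X <= b, then E[exp(theta X)] <= exp(theta^2 (b - a)^2 / 8). The sketch uses a two-point distribution on {a, b} purely as a convenient test case; the helper names are hypothetical and the example is not taken from the paper.

```python
import math


def two_point_mgf(theta: float, a: float, b: float) -> float:
    """E[exp(theta * X)] for the mean-zero law on {a, b}, with a < 0 < b."""
    p_b = -a / (b - a)  # P(X = b), chosen so that E[X] = 0
    p_a = b / (b - a)   # P(X = a)
    return p_a * math.exp(theta * a) + p_b * math.exp(theta * b)


def hoeffding_lemma_bound(theta: float, a: float, b: float) -> float:
    """Right-hand side of Hoeffding's lemma: exp(theta^2 (b - a)^2 / 8)."""
    return math.exp(theta**2 * (b - a) ** 2 / 8.0)


if __name__ == "__main__":
    for theta in (-3.0, -1.0, 0.1, 1.0, 3.0):
        lhs = two_point_mgf(theta, -1.0, 2.0)
        rhs = hoeffding_lemma_bound(theta, -1.0, 2.0)
        print(f"theta={theta:5.1f}  mgf={lhs:.4f}  bound={rhs:.4f}")
```

The right-hand side depends on the variable only through the width b - a, which is why the lemma has no direct analogue once the range is unbounded.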
As our first example to illustrate our main result Theorem 1.1, we investigate the Jacobi process on the state space . The generator of this process is given by
[TABLE]
where are parameters of and are assumed to take on values such that and , i.e. and . With these choices of parameters, the stationary distribution of Jacobi process is the Beta distribution with parameters and , where its density is governed by , and denotes the Beta function. According to (Albanese and Kuznetsov, 2009, Appendix ) and (Forman and Sørensen, 2008, Section ) is ergodic with and
[TABLE]
In view of Proposition 1.1 item (4), is thus uniformly ergodic. One major motivation for studying this process stems from its use in financial modelling, where a more general form of the Jacobi process has been employed to model exchange rates in a target zone, see Larsen and Sørensen (2007) and the references therein. In these models, one is often interested in the long-run average of the occupation time of the process in a certain region , say the occupation time of the exchange rate above or below a threshold. Unfortunately, distributional information on the functional is often inaccessible, where is the indicator function of the set . In practice, one may resort to the space average as a natural approximation of the quantity of interest , where the former is often easier to access than the latter. Our main result in Theorem 1.1 can then be used to give non-asymptotic probabilistic error bounds on such an approximation. Another situation where Theorem 1.1 is useful is the construction of confidence intervals for the functional : a confidence band can be built directly from these large deviation probabilities. With these motivations in mind, we now apply Theorem 1.1 to the Jacobi process with , which gives:
Corollary 1.1.
Suppose that is the Jacobi process which is uniformly ergodic with generator given by (1.1) and parameters . For any , and measurable subset , we have
[TABLE]
where
[TABLE]
is the average hitting time of the Jacobi process.
As our first remark, we note that the upper bound in (1.2) can be quite loose since it does not depend on the size of . Such a bound indeed holds as long as we have in Theorem 1.1. In addition, we see that this upper bound depends only on the parameters and through but not on .
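The approximation of an occupation-time average by a space average can be explored numerically. The sketch below is an illustration under stated assumptions, not the paper's setup: it assumes the parametrization dX = -theta (X - mu) dt + sqrt(2 theta a X (1 - X)) dW, whose stationary law is Beta(mu / a, (1 - mu) / a), takes mu = 0.5 and a = 0.25 so that the stationary law is Beta(2, 2), and uses a plain Euler-Maruyama scheme with clipping to [0, 1]. The region is taken to be [0, c]. All function names and default values are hypothetical, and the code does not evaluate the bound of Corollary 1.1.

```python
import numpy as np


def beta22_cdf(x: float) -> float:
    """CDF of the Beta(2, 2) law: integral of 6 u (1 - u) over [0, x]."""
    return 3.0 * x**2 - 2.0 * x**3


def jacobi_occupation_fraction(c, T=50.0, dt=0.01, n_paths=200,
                               theta=1.0, mu=0.5, a=0.25, seed=0):
    """Euler-Maruyama estimate of (1/T) * integral_0^T 1{X_s <= c} ds.

    Assumed SDE: dX = -theta (X - mu) dt + sqrt(2 theta a X (1 - X)) dW,
    whose stationary law is Beta(mu / a, (1 - mu) / a).
    Returns one time average per simulated path.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, mu)               # start every path at the mean
    n_steps = int(round(T / dt))
    occupied = np.zeros(n_paths)
    for _ in range(n_steps):
        occupied += (x <= c)               # count steps spent in [0, c]
        noise = rng.standard_normal(n_paths) * np.sqrt(dt)
        x = x - theta * (x - mu) * dt + np.sqrt(2.0 * theta * a * x * (1.0 - x)) * noise
        x = np.clip(x, 0.0, 1.0)           # crude fix to keep the scheme in [0, 1]
    return occupied / n_steps


if __name__ == "__main__":
    c = 0.3
    fractions = jacobi_occupation_fraction(c)
    print("mean time average :", fractions.mean())
    print("space average pi(A):", beta22_cdf(c))
```

The spread of the per-path time averages around the space average is the quantity that a Hoeffding-type bound controls; the simulation only gives an empirical impression of it.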
In our second example, we introduce the finite interval analogue of the Ornstein-Uhlenbeck process first studied by Kessler and Sørensen (1999) on the state space , where we take the drift to be , the diffusion coefficient to be and to be a parameter. That is, the generator is written as
[TABLE]
According to (Forman and Sørensen, 2008, Section ) is ergodic with and
[TABLE]
In view of Proposition 1.1 item (4), is thus uniformly ergodic for any . Specializing into the case , we see that the stationary distribution has density given by
[TABLE]
For , if we take with in Theorem 1.1, the time integral becomes
[TABLE]
the exponential functional associated with . Distributional information on exponential functionals is often difficult to obtain, see for instance the book Yor (2001). One may approximate such a functional by its space average , and our results give a probabilistic error bound on this approximation. Theorem 1.1 now reads
Corollary 1.2.
Suppose that is the finite interval analogue of the Ornstein-Uhlenbeck process which is uniformly ergodic with generator given by (1.3) and parameter . For any , and , we have
[TABLE]
where
[TABLE]
is the average hitting time of .
For further concrete examples of uniformly ergodic diffusions with explicit eigenvalue information, we refer interested readers to the work of Kessler and Sørensen (1999); Forman and Sørensen (2008).
The rest of the paper is organized as follows. In Section 2, we first present the proof of the main result, Theorem 1.1, and then detail the proofs of Corollary 1.1 and Corollary 1.2.
2. Proof of the main results
2.1. Proof of Theorem 1.1
Suppose without loss of generality that the mean of with respect to is zero, that is, . To begin with, it follows readily from the induced operator norm of that we have
[TABLE]
where we recall is the solution to the Poisson equation. Now, for the large deviation probability, we see that
[TABLE]
In the above equation, (2.2) comes from Markov’s inequality, which holds for any , while (2.3) follows from the Poisson equation . Now, we explicitly construct a martingale that is useful in our analysis, namely
[TABLE]
Then by a classical result in (Bhattacharya and Waymire, 2009, Chapter Theorem ), we see that is a mean zero -martingale, where we again recall is the filtration of . Using (2.1) and (2.4), the tail bound in (2.3) is further upper bounded by
[TABLE]
Now, we proceed to examine the bound for . In order to use the classical Hoeffding’s lemma for bounded random variables (Devroye et al., 1996, Lemma ), we write as
[TABLE]
As a result, to bound the martingale it suffices to bound the martingale differences . Using the definition of in (2.4), these bounds are given by, for ,
[TABLE]
where we use the Poisson equation in the first inequality and (2.1) in the second inequality. Similarly,
[TABLE]
It follows from double expectation, (2.5), (2.6) and (2.7) that the upper bound in (2.3) becomes
[TABLE]
where the first and second inequalities follow from repeated applications of Hoeffding’s lemma (Devroye et al., 1996, Lemma ). Finally, collecting the above results, the tail bound is given by
[TABLE]
which is minimized at where
[TABLE]
The desired result follows by substituting into (2.8).
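The final optimisation step follows the usual Chernoff pattern. As a generic illustration only, suppose the tail bound has the form of an exponential of a quadratic in the free parameter; the symbols A and B below are placeholders introduced here and are not the paper's constants.

```latex
% Generic Chernoff-type optimisation; A, B > 0 are placeholders, not the paper's constants.
\[
  h(\theta) = \exp\!\left(-\theta A + \theta^{2} B\right), \qquad \theta > 0 .
\]
\[
  \frac{d}{d\theta}\log h(\theta) = -A + 2\theta B = 0
  \;\Longrightarrow\;
  \theta^{*} = \frac{A}{2B},
  \qquad
  h(\theta^{*}) = \exp\!\left(-\frac{A^{2}}{4B}\right).
\]
```

Since log h is a convex quadratic in theta, the stationary point is the global minimiser, and it is admissible precisely when A > 0.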
2.2. Proof of Corollary 1.1
The desired result follows from taking in Theorem 1.1 and using the following bound on the induced operator norm of the deviation kernel :
[TABLE]
see e.g. (Choi, 2018, Theorem ). As for the expression of the average hitting time , the eigentime identity Cheng and Mao (2015) gives
[TABLE]
where are the eigenvalues of which are given by, for ,
[TABLE]
see e.g. (Albanese and Kuznetsov, 2009, Appendix ).
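The eigentime identity expresses the average hitting time as the sum of reciprocals of the non-zero eigenvalues, so it can be evaluated numerically once the eigenvalues are known. The sketch below assumes the parametrization dX = -theta (X - mu) dt + sqrt(2 theta a X (1 - X)) dW for the Jacobi process, which may differ from the one in (1.1). Under this assumption, applying the generator to x^n gives the leading coefficient -theta n (1 + a (n - 1)), so lambda_n = theta n (1 + a (n - 1)). The function names are hypothetical.

```python
def jacobi_eigenvalue(n: int, theta: float, a: float) -> float:
    """lambda_n = theta * n * (1 + a (n - 1)) for the assumed parametrization."""
    return theta * n * (1.0 + a * (n - 1))


def truncated_eigentime(theta: float, a: float, n_terms: int) -> float:
    """Partial sum of 1 / lambda_n over n = 1, ..., n_terms."""
    return sum(1.0 / jacobi_eigenvalue(n, theta, a) for n in range(1, n_terms + 1))


if __name__ == "__main__":
    # With a = 1/2 the eigenvalues are theta * n (n + 1) / 2 and the series
    # telescopes, so the partial sums approach 2 / theta.
    for n_terms in (10, 100, 1000):
        print(n_terms, truncated_eigentime(theta=1.0, a=0.5, n_terms=n_terms))
```

Because lambda_n grows quadratically in n, the series converges, which is consistent with the summability condition in item (4) of Proposition 1.1.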
2.3. Proof of Corollary 1.2
The desired result follows from taking in Theorem 1.1, and using
[TABLE]
as well as the following bound on the induced operator norm of the deviation kernel :
[TABLE]
where again the first equality follows from Cheng and Mao (2015) with being given in (1.4) with parameter .
Acknowledgement
We thank the anonymous referee for constructive comments that improved the presentation of the manuscript. This work is partially supported by the Chinese University of Hong Kong, Shenzhen grant PF01001143.
- Albanese and Kuznetsov (2009): C. Albanese and A. Kuznetsov. Transformations of Markov processes and classification scheme for solvable driftless diffusions. Markov Process. Related Fields, 15(4):563–574, 2009.
- Bentkus (2008): V. Bentkus. An extension of the Hoeffding inequality to unbounded random variables. Lith. Math. J., 48(2):137–157, 2008.
- Bhattacharya and Waymire (2009): R. N. Bhattacharya and E. C. Waymire. Stochastic processes with applications, volume 61. SIAM, 2009.
- Boucher (2009): T. R. Boucher. A Hoeffding inequality for Markov chains using a generalized inverse. Statist. Probab. Lett., 79(8):1105–1107, 2009.
- Cheng and Mao (2015): L.-J. Cheng and Y.-H. Mao. Eigentime identity for one-dimensional diffusion processes. J. Appl. Probab., 52(1):224–237, 2015.
- Choi (2018): M. C. Choi. A scale function approach for Stein’s method of one-dimensional diffusion. Submitted, 2018.
- Devroye et al. (1996): L. Devroye, L. Györfi, and G. Lugosi. A probabilistic theory of pattern recognition, volume 31 of Applications of Mathematics (New York). Springer-Verlag, New York, 1996.
- Forman and Sørensen (2008): J. L. Forman and M. Sørensen. The Pearson diffusions: a class of statistically tractable diffusion processes. Scand. J. Statist., 35(3):438–465, 2008.
