Asymptotic Exponentiality of the First Exit Time of the Shiryaev-Roberts Diffusion with Constant Positive Drift
Aleksey S. Polunchenko

TL;DR
This paper proves that the first exit time of a Shiryaev-Roberts diffusion with positive drift becomes exponentially distributed as the boundary goes to infinity, with an explicit analytical approach.
Contribution
It provides an explicit analytical proof that the standardized first exit time converges to an exponential distribution as the boundary size increases.
Findings
The moment generating function converges to that of an exponential distribution.
Explicit analytical expression for the first exit time's MGF is derived.
The result extends understanding of quickest change-point detection methods.
Abstract
We consider the first exit time of a Shiryaev-Roberts diffusion with constant positive drift from the interval where . We show that the moment generating function (Laplace transform) of a suitably standardized version of the first exit time converges to that of the unit-mean exponential distribution as . The proof is explicit in that the moment generating function of the first exit time is first expressed analytically and in a closed form, and then the desired limit as is evaluated directly. The result is of importance in the area of quickest change-point detection, and its discrete-time counterpart has been previously established - although in a different manner - by Pollak and Tartakovsky (2009).
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Statistical Process Monitoring · Bayesian Methods and Mixture Models · Statistical Distribution Estimation and Applications
\WithSuffix
[7]G^ #1,#2_#3,#4(#5 #6| #7)
\WarningFilterrefcheckUnused label ‘sec:intro’
Asymptotic Exponentiality of the First Exit Time of the Shiryaev–Roberts Diffusion with Constant Positive Drift
**Aleksey S. Polunchenko
**Department of Mathematical Sciences, State University of New York at Binghamton,
Binghamton, New York, USA
000Address correspondence to A. S. Polunchenko, Department of Mathematical Sciences, State University of New York (SUNY) at Binghamton, 4400 Vestal Parkway East, Binghamton, NY 13902–6000, USA; Tel: +1 (607) 777–6906; Fax: +1 (607) 777–2450; E-mail: [email protected].
Abstract: We consider the first exit time of a Shiryaev–Roberts diffusion with constant positive drift from the interval where . We show that the moment generating function (Laplace transform) of a suitably standardized version of the first exit time converges to that of the unit-mean exponential distribution as . The proof is explicit in that the moment generating function of the first exit time is first expressed analytically and in a closed form, and then the desired limit as is evaluated directly. The result is of importance in the area of quickest change-point detection, and its discrete-time counterpart has been previously established—although in a different manner—by Pollak & Tartakovsky (2009).
Keywords: First exit times; Generalized Shiryaev–Roberts procedure; Laplace transform; Markov diffusion; Moment generating function; Quickest change-point detection; Whittaker functions.
Subject Classifications: 62L10; 60G40; 60J60.
. ntroduction
This work centers around the so-called Generalized Shiryaev–Roberts (GSR) stochastic process, a time-homogeneous Markov diffusion well-known in the area of quickest change-point detection. See, e.g., Shiryaev, (1961, 1963, 1978, 2002, 2011), Pollak and Siegmund, (1985), Feinberg and Shiryaev, (2006), Burnaev et al., (2009), Polunchenko and Sokolov, (2016), and Polunchenko, (2016, 2017). More specifically, for a reason to be made clear shortly, the case of interest is that of the GSR process with a constant positive drift. Formally, we shall deal with the solution of the stochastic differential equation (SDE)
[TABLE]
where is a given coefficient (whose meaning is explained below), and is standard Brownian motion (i.e., , , and ); the initial value is often referred to as the headstart. The process governed by (1.1) is a GSR process with unit drift and headstart ; the unit drift can be trivially adjusted to any other constant positive level. The main contribution of this work concerns the distribution of the first exit time of from the interval with given, i.e., the stopping time:
[TABLE]
where is a preset level. Correspondingly, the process and its characteristics pose interest only up to the point of “extinction” at time instance , i.e., conditional on for a given .
Just as does the GSR diffusion —whether with constant positive drift or with a more general affine drift—the stopping time , too, plays a major role in the theory of quickest change-point detection: it is the Run Length of the so-called Generalized Shiryaev–Roberts (GSR) change-point detection procedure, set up to react to a possible shift in the drift of standard Brownian motion monitored “live”. Parameter present in the right-hand side of SDE (1.1) is the anticipated magnitude of the possible change in the drift. More concretely, equation (1.1) describes the dynamics of the GSR statistic in the pre-change regime, i.e., under the assumption that the drift has not yet “kicked in”, so that the observed Brownian motion is still “driftless”. Hence the stopping time given by (1.2) is the GSR procedure’s Run Length to false alarm: at time instance the GSR procedure sounds a false alarm, i.e., falsely declares the Brownian motion under surveillance as having gained a drift of size . The first moment of , i.e., , is known in the change-point detection literature as the Average Run Length (ARL) to false alarm, and it is a popular metric of the “cost” of triggering a false alarm. Obviously increases with , and, in particular, letting explode is the same as letting explode, and vice versa.
The GSR procedure was proposed by Moustakides et al., (2011) as a headstarted (hence, more general) version of the classical quasi-Bayesian Shiryaev–Roberts (SR) procedure that emerged from the independent work of Shiryaev (1961; 1963) and that of Roberts (1966). The interest in the GSR procedure (and its variations) is due to its recently discovered strong optimality properties. See, e.g., Burnaev, (2009), Feinberg and Shiryaev, (2006), Burnaev et al., (2009), Pollak and Tartakovsky, (2009), Polunchenko and Tartakovsky, (2010), Tartakovsky and Polunchenko, (2010), Vexler and Gurevich, (2011), and Tartakovsky et al., (2012).
We are now in a position to describe the specific contribution of this work. It is shown in the sequel that a suitably standardized version of the stopping time given by (1.2) is asymptotically, as , exponentially distributed with unit mean, for any headstart . Put another way, the GSR procedure’s Run Length to false alarm, properly scaled, is asymptotically, as the ARL to false alarm level explodes (i.e., as ), unit-mean exponential. More specifically, it is shown in the sequel that, as GSR procedure’s ARL to false alarm level gets large, the moment generating function (mgf) or the Laplace transform of a properly scaled version of the GSR stopping time converges to that of the unit-mean exponential distribution. This implies convergence in distribution. The proof is explicit in that the mgf is first found analytically and in a closed form, and then the desired limit is shown directly to evaluate to the mgf of the unit-mean exponential distribution. It is also of note that the unit-drift assumption imposed on is essential, for it makes a (positive) recurrent process with all the ensuing consequences which ultimately “add up” to the desired asymptotic exponentiality of .
The discrete-time analogue of our result has been previously established—in an entirely different fashion—by Pollak and Tartakovsky (2009); see also Tartakovsky et al., (2008) and Yakir, (1998, 1995). As a matter of fact, Pollak and Tartakovsky (2009) proved the result not only for the GSR procedure, but for an entire class of Markov stopping times, which includes the GSR procedure as well as Page’s (1954) celebrated Cumulative Sum (CUSUM) “inspection” scheme. More importantly, Pollak and Tartakovsky (2009) also illustrated the importance of the result in the context of sequential change-point detection. Specifically, they argued that if the stopping time of a change-point detection procedure is asymptotically exponential under the no-change hypothesis, it is reasonable to expect it to be approximately exponentially distributed (under the no-change hypothesis) whenever the ARL to false alarm is large. Consequently, since the exponential distribution is fully characterized but its mean alone, the ARL to false alarm can be seen as indeed being an exhaustive metric of the false alarm risk. See, e.g., Tartakovsky, (2008) for a more detailed discussion of this issue. Moreover, Pollak and Tartakovsky (2009) also argued that the asymptotic exponentiality (in the pre-change regime) can be used for the evaluation of the change-point detection procedure’s local false alarm probabilities. As pointed out by Tartakovsky, (2005) these probabilities are of importance in a variety of applications. All these considerations obviously apply to the continuous-time setting considered in this work as well.
The rest of the paper is three sections. The first one, Section 2, is the paper’s main section, for this is where we formally state and then prove our main result. The second one, Section 3, is where we offer a short numerical study to complement and confirm our theoretical contribution experimentally. The third one, Section 4, is where we make a few concluding remarks and draw a line under the entire paper.
. he Main Result
We first formally introduce the main object of study of this work. Let
[TABLE]
denote the mgf (Laplace transform) of the stopping time given by (1.2). We are interested in the asymptotic behavior of as . To that end, an important fact about is that, for any , and , it can actually be expressed analytically and in closed form through the spectral characteristics of the second-order differential operator
[TABLE]
i.e., the infinitesimal generator of the GSR diffusion governed by the SDE (1.1). More concretely, the operator is restricted to the state space of , i.e., the interval , , and the relevant spectral characteristics of are the solutions and of the Sturm–Liouville problem \big{[}\mathscr{D}\circ u\big{]}(x,\lambda)=\lambda\,u(x,\lambda) or explicitly
[TABLE]
subject to the boundary conditions
[TABLE]
which, translated into classical Feller’s (1952) boundary classification, cast as an entrance boundary for , and as an absorbing boundary for , i.e., the process is “killed” once it hits the right end of the interval ; in “differential equations speak”, the former condition is a Neumann-type boundary condition, while the latter condition is a Dirichlet-type boundary condition. It is apparent that the spectrum of the operator is dependent on , and from now on, wherever necessary, we shall emphasize this dependence via the notation . Equation (2.3) subject to the boundary conditions (2.4) is a Sturm–Liouville problem, and it has recently received a renewed burst of attention in the literature on mathematical finance and quickest change-point detection. See, e.g., Linetsky, (2004, 2007), Collet et al., (2013), and notably Polunchenko, (2016, 2017). The work of Polunchenko (2017) will be referenced repeatedly throughout the sequel, following, for convenience, Polunchenko’s (2017) original notation.
We now turn to the work of Linetsky, (2007) and recall a general result from the interface between stochastic processes and Sturm–Liouville operator theory (theory of second-order self-adjoint differential operators); see also (Itô and McKean, , 1974, Chapter 4, Section 4.6) and (Borodin and Salminen, , 2002, Chapter II, Section 1.10). Let be a one-dimensional, time-homogeneous, regular Markov diffusion whose state space is some interval , where , and such that is fixed. If is generated by the SDE where the diffusion coefficient is continuous and strictly positive inside and the drift coefficient is continuous on , then the Laplace transform of the nonnegative random variable with is given by
[TABLE]
where , and and are two fundamental solutions of the equation
[TABLE]
subject to appropriate boundary conditions. Specifically, these fundamental solutions can be made unique (up to a multiplicative constant factor dependent on but independent of ) by requiring that be an increasing function of subject to a boundary condition at , while be a decreasing function of , subject a boundary condition at .
To translate the above to our specific problem (2.3)–(2.4) it suffices to note that equation (2.3) can be easily converted to an equation of the form (2.6) by means of the integrating factor method. As a matter of fact, for our operator given by (2.2), it has already been established, e.g., by Polunchenko and Sokolov, (2016) and by Polunchenko, (2016), that
[TABLE]
where
[TABLE]
and where and denote the so-called Whittaker and functions, respectively. The Whittaker functions are defined as the two fundamental solutions of the classical Whittaker (1904) equation
[TABLE]
where is the unknown function of , and are specified parameters; see, e.g., (Buchholz, , 1969, Chapter I). The Whittaker functions are typically considered in the cut plane to ensure they are not multi-valued. For an extensive study of these functions and various properties thereof, see, e.g., Slater, (1960) and Buchholz, (1969).
At this point, in view of (2.5) and (2.7), we can conclude at once that
[TABLE]
where is as in (2.8). Parenthetically, we remark that, apparently, this result, though relatively simple to obtain, was not previously known to the change-point detection community. It is also of note that the Laplace transform (2.9) can be inverted to yield the (pre-change) distribution of the GSR stopping time , and the inversion has already been performed by Polunchenko, (2017).
To proceed, observe that , by definition (1.2), almost surely explodes as . Hence, it shouldn’t come as a surprise that, for any fixed and , the limit of as is zero. Heuristically, this can be seen directly from the definition (2.1). More formally, one can appeal to the small-argument asymptotic behavior of the Whittaker function
[TABLE]
where here and onward denotes the Gamma function (see, e.g., Abramowitz and Stegun, 1964, Chapter 6), to first get
[TABLE]
so that
[TABLE]
because for , and then conclude from the formula (2.9) for the mgf that the latter does, in fact, go to zero as .
However, as previously noted by Pollak and Tartakovsky, (2009), there is a way to rescale so as to get it to converge to a meaningful random variable as ; see also Tartakovsky et al., (2008). We now explain the idea.
Let
[TABLE]
denote the GSR statistic’s so-called quasi-stationary cumulative distribution function (cdf) and density, respectively. This time-invariant probability measure is independent of the GSR statistic’s headstart , and its existence can be inferred, e.g., from the work of Cattiaux et al., (2009); see also (Collet et al., , 2013, Section 7.8.2). Exact closed-form formulae for both and have been recently obtained by Polunchenko, (2017). If the GSR statistic is started off a random point sampled from its quasi-stationary distribution, i.e., if , then the statistical characteristics of the GSR statistic will be time-invariant, until the statistic hits the threshold . Since the probability of hitting will be time-invariant as well, the distribution of the GSR stopping time will be exponential. More formally, define as the solution of the SDE with , and let with and . The stopping time is known in the quickest change-point detection literature as the randomized Shiryaev–Roberts–Pollak detection procedure, and it was originally proposed (for the discrete-time version of the problem) and first investigated by Pollak, (1985); it was also recently studied by Burnaev et al., (2009). Specifically, since is level for all , one can conclude that is exponentially distributed with parameter , so that the product is unit-mean exponential. As noted by Pollak and Tartakovsky, (2009), intuitively, the large- behavior of for each fixed headstart is similar to that of . Hence, it stands to reason that is approximately unit-mean exponential, whenever is large. Put another way, the right scaling factor for is .
The constant is the largest (nonpositive) eigenvalue of the operator given by (2.2). For this kind of an operator it is known from the general Sturm–Liouville theory (see, e.g., Fulton et al., 1999) that its spectrum is purely discrete, simple, located to the left of the origin (i.e., nonpositive), and is determined entirely by the Dirichlet condition (2.4), i.e., from the equation with fixed. More concretely, from (2.4) and (2.7) it can be readily seen that is the largest (nonpositive) solution of the equation
[TABLE]
where is fixed and is as in (2.8). This equation was previously analyzed by Polunchenko, (2016, 2017), who, in particular, obtained order-one, order-two, and order-three asymptotic “large-” approximations to . As an aside, we note that due to the discrete and simple nature of the spectrum of the operator the range of values of in the above formula (2.9) for the mgf can be extended from to where is largest (nonpositive) eigenvalue of .
The main contribution of this work can now be succinctly put as follows.
Theorem 2.1**.**
* for any fixed and ; recall that here is the largest (nonpositive) solution of equation (2.11).*
The plan for the remainder of this section is to prove this theorem. To that end, in view of (2.3), the problem essentially is to show that
[TABLE]
where is as in (2.8) and is the largest (nonpositive) solution of equation (2.11).
The above limit can be evaluated by treating the numerator and the denominator separately. The key observation for either part is that as , i.e., is a monotonically increasing function of , converging to 0 from below as ; see Polunchenko, (2017) for a proof. An immediate implication of this circumstance is that since , then from (2.8) we also have , and therefore
[TABLE]
because which is a special case of (Buchholz, , 1969, Identity (28a), p. 23) asserting that . Hence, the numerator of the fraction under the limit (2.12) goes to unity as .
It remains to take care of the denominator of the fraction under the limit (2.12), i.e., to show that
[TABLE]
which is a more delicate problem. Specifically, the problem is that not only the argument of the Whittaker function is dependent on , but also its second index which goes to as because . As a result, the above small-argument asymptotics (2.10) of the Whittaker function is not “fine” enough and needs to be improved.
To that end, let us again turn to the work of Polunchenko, (2017) where the function was expanded into a Taylor series with respect to around zero up to the third order for any fixed ; it is noteworthy that is an entire function of for any fixed and . The expansion involves the following two special functions:
- •
The exponential integral
[TABLE]
see, e.g., (Abramowitz and Stegun, , 1964, Chapter 5); and
- •
Meijer’s (1936) celebrated -function defined as the Mellin-Barnes integral
[TABLE]
where denotes the imaginary unit, i.e., , the integers , , , and are such that and , and the contour of integration is closed in an appropriate way to ensure the convergence of the integral. It is also required that no difference be an integer. The -function is a very general function, and includes, as special cases, not only all elementary functions, but a number of special functions as well. An extensive list of special cases of the Meijer -function can be found, e.g., in the classical special functions handbook of Prudnikov et al., (1990), which also includes a summary of the function’s basic properties. We will need the following particular case of the Meijer -function:
[TABLE]
where is the exponential integral defined in (2.15); see (Polunchenko, , 2017, Appendix A) for a proof.
We are now in a position present the third-order Taylor expansion obtained by Polunchenko, (2017) for the function with fixed.
Theorem 2.2** (Polunchenko, 2017).**
For any it holds true that
[TABLE]
where is as in (2.8), and
[TABLE]
with and given by (2.15) and by (2.16), respectively.
This theorem readily gives the expansion
[TABLE]
which, as we shall see shortly, is more “fine” than necessary to pass and prove (2.14). To do so, we first recall yet another result of Polunchenko, (2017), viz. the double inequality
[TABLE]
where is the parameter of the SDE (1.1). Hence so that
[TABLE]
The only issue is that the functions and given, respectively, by (2.17) and (2.16), both go to infinity as goes to zero. However, in view of (Abramowitz and Stegun, , 1964, Inequality 5.1.20, p. 229) which states that
[TABLE]
from (2.15), (2.17) and (2.16) it can be seen that
[TABLE]
i.e., the functions and both go to infinity as slower than goes to zero as .
At this point Theorem 2.1, which is our main result, is straightforward to prove: it is merely a matter of using (2.19) and (2.20) in (2.18) to obtain (2.14), and then combining it with (2.13) to get (2.12), which in view of (2.9) is precisely the desired result.
To draw a line under this section, we remark that the formula (2.9) for the mgf of the GSR stopping time can also be put to a more classical use, viz. to compute for , i.e., to determine the actual moments of the GSR stopping time (under the no-change hypothesis). Specifically, since from definition (2.1) it is evident that
[TABLE]
and because formula (2.9) expresses explicitly as a quotient of two Whittaker functions, getting the -th moment of the GSR stopping time essentially comes down to finding the derivatives, up through the -th order inclusive, of the Whittaker function with respect to the second index . More concretely, from (2.9) it is direct to see that the required derivatives are of the following form:
[TABLE]
where , , and is as in (2.8). Since the first three (for , and ) of these derivatives are essentially given by Theorem 2.2 due to Polunchenko, (2017), computing , , and , i.e., the first three moments of the GSR stopping time, is a matter of elementary algebra. The answer is:
[TABLE]
and
[TABLE]
where , , and the functions and are given, respectively, by (2.17) and (2.16). The first moment formula is well-known in quickest change-point detection, and was obtained—in an entirely different fashion—by Shiryaev, (1961, 1963) and many others. However, the second and third moment formulae appear to be new results. The fourth and higher moments can be found in a similar fashion, but since the formulae are far more cumbersome, they will be presented elsewhere.
. umerical Results
To get a better sense as to how fast, as , the random variable becomes unit-mean exponentially-distributed, we now offer a short numerical study where we assess the proximity of to for various values of , , , and . Specifically, since from Theorem 2.1 we can deduce that for any , or equivalently that for any , it is reasonable to expect the function to be close to for any , provided, however, that is sufficiently large. It is the proximity of to the line across a range of values of that we shall use to judge how close the distribution of is to unit-mean exponential. The evaluation of as a function of for any , , and is not a problem at all, because the survival function of the GSR stopping time given by (1.2) was recently found analytically and in a closed-form by Polunchenko, (2016) who also developed a Mathematica script to evaluate and each to within hundreds of decimal places of accuracy and for any , , and .
To get started, let us first set , which, in practice, would be considered low, so that the asymptotic exponentiality might not be quite in effect yet. Figures 1 show the obtained results for , , , and . Specifically, Figure 1(a) shows as a function of , while Figure 1(b) shows the corresponding absolute error . For convenience, Figure 1(a) also includes the line which is to converge to as . An eye examination of these figures suggests that the distribution of nearly unit-mean exponential, even though is as low as . The agreement with the asymptotic exponential distribution is even better when is larger.
Let us now keep at but increase the GSR statistic’s headstart to . Since is only , setting to half that is bringing much closer to , thereby aiding the former to hit the latter sooner (on average). Put another way, increasing the headstart is, in some sense, akin to lowering the threshold . As a result, the asymptotic exponentiality might not “kick in” as fast. This is exactly what we see in Figures 2(a) and 2(b) which show the obtained results for , , , and . Again, Figure 2(a) also includes the line , but this time around the deviation of from is noticeable with a naked eye. This is evidence that the asymptotic exponentiality isn’t quite there yet, and it is a direct consequence of the higher headstart value .
However, if we keep at but increase to , the distribution will get better aligned with the limiting exponential distribution, as can be seen from Figures 3 which show the results for , , , and . Looking at Figures 3(a) and 3(b) we see that is fairly close to being a unit-mean exponential random variable.
. oncluding Remarks
As was mentioned in the introduction, the obtained result, namely Theorem 2.1 which we proved explicitly, is the continuous-time equivalent of a similar result obtained earlier by Pollak and Tartakovsky (2009) in the discrete-time setting; see also Tartakovsky et al., (2008). On a practical level, we were able to confirm experimentally that the GSR stopping time is approximately exponential even if the detection threshold is fairly low. Pollak and Tartakovsky (2009) made the same observation in the discrete-time setting. Although we already elaborated in the introduction on the significance of our findings to theoretical change-point detection, it is also worth adding that some of the new special functions identities utilized in the paper may prove useful in other areas as well, e.g., in stochastic processes, stochastic differential equations, mathematical physics, and mathematical finance, where special functions arise quite often.
cknowledgement
The author’s effort was partially supported by the Simons Foundation via a Collaboration Grant in Mathematics under Award # 304574.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1(1)
- 2Abramowitz and Stegun, (1964) Abramowitz, M. and Stegun, I. (1964). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables , 10th edition, Washington, DC: United States Department of Commerce, National Bureau of Standards.
- 3Borodin and Salminen, (2002) Borodin, A.N. and Salminen, P. (2002) Handbook of Brownian Motion—Facts and Formulae , 2nd edition, Basel: Birkhäuser.
- 4Buchholz, (1969) Buchholz, H. (1969). The Confluent Hypergeometric Function , New York: Springer. Translated from German into English by H. Lichtblau and K. Wetzel.
- 5Burnaev, (2009) Burnaev, E. V. (2009). On a Nonrandomized Change-Point Detection Method Second-Order Optimal in the Minimax Brownian Motion Problem, in Proceedings of X All-Russia Symposium on Applied and Industrial Mathematics (Fall open session) , October 1–8, Sochi, Russia (in Russian).
- 6Burnaev et al., (2009) Burnaev, E. V., Feinberg, E. A., and Shiryaev, A. N. (2009). On Asymptotic Optimality of the Second Order in the Minimax Quickest Detection Problem of Drift Change for Brownian Motion, Theory of Probability and Its Applications 53: 519–536.
- 7Cattiaux et al., (2009) Cattiaux, P., Collet, P., Lambert, A., Martínez, S., Méléard, S., and Martín, J. S. (2009). Quasi-Stationary Distributions and Diffusion Models in Population Dynamics, Annals of Probability 37: 1926–1969.
- 8Collet et al., (2013) Collet, P., Martínez, S., and Martín, J. S. (2013). Quasi-Stationary Distributions: Markov Chains, Diffusions and Dynamical Systems , New York: Springer.
