Controlling the time discretization bias for the supremum of Brownian Motion
Krzysztof Bisewski, Daan Crommelin, Michel Mandjes

TL;DR
This paper investigates the bias from time discretization in estimating Brownian motion crossing probabilities, proposing threshold-dependent grids that improve efficiency and reduce bias compared to equidistant grids.
Contribution
It introduces a novel threshold-dependent discretization method that reduces the number of grid points needed, making bias correction more efficient and broadly applicable.
Findings
Threshold-dependent grids require fewer points, independent of threshold b.
The proposed algorithm is strongly efficient for estimating crossing probabilities.
Empirical results show significant performance improvements over equidistant grids.
Abstract
We consider the bias arising from time discretization when estimating the threshold crossing probability , with a standard Brownian Motion. We prove that if the discretization is equidistant, then to reach a given target value of the relative bias, the number of grid points has to grow quadratically in , as grows. When considering non-equidistant discretizations (with threshold-dependent grid points), we can substantially improve on this: we show that for such grids the required number of grid points is independent of , and in addition we point out how they can be used to construct a strongly efficient algorithm for the estimation of . Finally, we show how to apply the resulting algorithm for a broad class of stochastic processes; it is empirically shown that the threshold-dependent grid significantly…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Controlling the time discretization bias
for the supremum of Brownian Motion
Krzysztof Bisewski111Email: [email protected]
Centrum Wiskunde & Informatica, Amsterdam
Daan Crommelin
Centrum Wiskunde & Informatica, Amsterdam
Korteweg de Vries Institute for Mathematics, University of Amsterdam
Michel Mandjes
Korteweg de Vries Institute for Mathematics, University of Amsterdam
(September 13, 2017 )
Controlling the time discretization bias
for the supremum of Brownian Motion
Krzysztof Bisewski111Email: [email protected]
Centrum Wiskunde & Informatica, Amsterdam
Daan Crommelin
Centrum Wiskunde & Informatica, Amsterdam
Korteweg de Vries Institute for Mathematics, University of Amsterdam
Michel Mandjes
Korteweg de Vries Institute for Mathematics, University of Amsterdam
(September 13, 2017 )
Abstract
We consider the bias arising from time discretization when estimating the threshold crossing probability , with a standard Brownian Motion. We prove that if the discretization is equidistant, then to reach a given target value of the relative bias, the number of grid points has to grow quadratically in , as grows. When considering non-equidistant discretizations (with threshold-dependent grid points), we can substantially improve on this: we show that for such grids the required number of grid points is independent of , and in addition we point out how they can be used to construct a strongly efficient algorithm for the estimation of . Finally, we show how to apply the resulting algorithm for a broad class of stochastic processes; it is empirically shown that the threshold-dependent grid significantly outperforms its equidistant counterpart.
{textblock}
0.80.5,0.5 Submitted on March 17, 2017
1 Introduction
Extreme values of random processes play a prominent role in a broad range of practical problems. It is often of interest to find the tail of the distribution of the supremum of a continuous-time stochastic process over a finite time interval. In this paper the focus is on the level crossing probability
[TABLE]
For many classes of processes, such as the Gaussian processes [Adler, 1990], typically no explicit expressions for are available, with Brownian Motion and the Ornstein-Uhlenbeck process being notable exceptions. When an explicit expression for is unavailable one usually resorts to using high-dimensional numerical integration and simulation-based methods, see e.g. [Genz and Bretz, 2009] for further reading.
For most of the available numerical methods, the underlying continuous-time process needs to be discretized in time. One chooses a certain finite grid and then approximates with w_{T}(b):=\mathbb{P}\big{(}\sup_{t\in T}X_{t}>b\big{)}. We note that this always leads to an underestimation, i.e., . We quantify this underestimation by , the relative discretization bias222As , both and tend to [math], so that the absolute bias is not a meaningful accuracy measure.. Typically is chosen to be an equidistant grid and in that case, can be reduced only by changing the grid size . The finer the grid, the smaller the bias, but also, the larger the computational effort to estimate . The main drawback of using equidistant grids is that typically, to reach a given target value of the discretization bias, the grid size has to grow with the threshold . In that case, for large , the appropriate grid size can become so large that the computation is not feasible. Two central questions arise from these observations: How fast does have to grow in ? Furthermore, can we identify a more efficient family of grids?
In this paper we address these issues for standard Brownian Motion. Although in this case can be computed explicitly, there are no available expressions for . We conduct a thorough study of the influence of the choice of the grid on the corresponding relative bias. Furthermore, we argue that exploring the case of standard Brownian Motion is a first step towards finding efficient grids for a more general class of processes. We demonstrate numerically how our analysis of efficient grids for Brownian Motion leads to a useful procedure to determine efficient grids for a broad range of other processes.
The contributions of this paper are the following. (i) The first finding can be seen as a negative result: we show that to uniformly control333In this context uniform control means that for a fixed , we have that for all ; the grid can change in . the relative bias, the size of the equidistant grid must grow at least quadratically in ; see Theorem 1 in Section 3. (ii) The second finding is that we can do much better by using a threshold-dependent family of grids, meaning that grid points change their location with (but the number of points does not increase). The discretization bias induced by this particular family of grids is uniformly controlled without having to increase the number of grid points; see Theorem 2 in Section 4. According to the best of the authors’ knowledge, this is the first result which shows that a careful choice of the grid can drastically increase the accuracy of the discrete estimator of . Using threshold-dependent grids makes it feasible to estimate with moderate grid sizes even for very high thresholds , which would be impossible to estimate using equidistant grids. In particular, in Section 5 we present a strongly efficient algorithm for the estimation of that relies on threshold-dependent grids. (iii) In the third place, we point out how the ideas underlying our threshold-dependent grid can be used for a broad class of stochastic processes (including Gaussian processes, such as fractional Brownian Motion, and Lévy processes); it is empirically shown that the threshold-dependent grid significantly outperforms its equidistant counterpart.
An efficient grid (both small in size and inducing a small discretization bias) is particularly relevant for situations with large . In this respect, the work presented here connects to the rare event simulation literature. As approaches infinity, decays exponentially to [math] and standard simulation-based methods like Crude Monte Carlo to estimate become extremely time consuming. We emphasize that rare event simulation methods commonly aim to control the sampling error, not the bias due to the discretization. [Adler et al., 2012] develop an algorithm that is strongly efficient (with bounded relative sampling error) for estimation of (rather than ). We will show that combining their algorithm with the use of threshold-dependent grids provides a strongly efficient algorithm for estimation of .
A topic closely related to ours concerns the quantification of the difference between the supremum of the stochastic process taken over and the supremum taken over a finite grid , i.e.
[TABLE]
There are several results in the literature that study the behavior of for standard Brownian Motion. [Asmussen et al., 1995] shown that for the equidistant grids , has a tight, non-degenerate weak limit, as and [Janssen and Van Leeuwaarden, 2009] derived an expansion for . For random grids , where are i.i.d. uniform samples on , independent of the Brownian Motion , [Calvin and Glynn, 1997] establish the weak limit of . Finally, [Calvin, 1997] proposed a class of adaptive grids, meaning that the consecutive grid-points are chosen based on ; given any , an adaptive grid is provided such that has a weak limit.
In our study we do not focus on the difference between the values of the maxima of the discrete and continuous-time Brownian Motion, but rather on the , i.e., the relative difference between the probabilities that these maxima lie above a certain fixed threshold.
There are several approaches to tackle the discretization bias available in the literature. Arguably, the most widely applicable method is Multilevel Monte Carlo (MLMC) [Giles, 2008]. It can be applied together with any numerical method that relies on discretization. The idea is to use several different levels of discretization and spend less computational effort (draw less samples) at the finest levels of discretization. MLMC effectively reduces the computational effort, and the time saved can be used to produce even finer levels of discretization. It could be interesting to explore the combination of MLMC method together with the idea of threshold-dependent grids but further exploiting this procedure lies beyond the scope of this article.
One of the methods that aims to directly decrease the bias induced by equidistant grids is continuity correction. Since the discrete-time approximation is always smaller than , one could slightly lower the threshold to compensate for the underestimation. [Broadie et al., 1997], using the machinery developed in [Siegmund, 1985], proposed a way of lowering the threshold which improves the rate of convergence of the relative bias from , cf. Proposition 1, to , as the number of grid points grows large. However, in the non-Brownian case, it remains a non-trivial problem how much should be decreased. In fact, there is no direct way of making sure whether lowering decreases the absolute relative bias, as lowering by too much leads to overcompensation and thus to an estimate that is larger than . By contrast, it is straightforward to compare the bias induced by two different grids — the larger the discrete estimator , the smaller the relative bias.
There are also several simulation-based algorithms that do not rely on pre-discretization. [Li and Liu, 2015] propose a strongly efficient algorithm for estimation of for a large class of Gaussian processes (most prominently, processes with constant variance function). However, when the underlying process has a unique point of maximal variance (such as Brownian Motion), the algorithm requires the simulation of a random time from a density , which becomes a rare event simulation problem when is large. While for an arbitrary process, the random discretization proposed in the algorithm requires a computational effort cubic in the number of grid points (in order to simulate a discrete Gaussian path), pre-discretization requires only quadratic effort; see the discussion in Section 5.
This paper is organized as follows. Section 2 provides definitions, preliminaries, and develops a general intuition. In Section 3 we introduce useful upper and lower bounds for the discretization bias (see Lemma 1) and show that the number of points on the equidistant grid has to grow quadratically in the threshold in order to uniformly control the discretization bias. In Section 4, as an alternative to equidistant grids, we study threshold-dependent grids, which control the relative bias with a constant grid size, independently of . The proofs of all lemmas and a proposition are postponed to Section 8. In Section 5 we present an algorithm by [Adler et al., 2012], that we use throughout the paper for producing the numerical results; combining this algorithm with the use of threshold-dependent grids yields a strongly efficient algorithm for estimation of , see Corollary 1. In Section 6 we apply threshold-dependent grids developed in previous section to stochastic processes other than Brownian Motion: Brownian Motion with jumps, Ornstein-Uhlenbeck process and fractional Brownian Motion. Lastly, in Section 7 we present concluding remarks and discuss some ideas for future research of optimal grids. In the appendices we collect various technical results used throughout the paper.
2 Preliminary results
Let be a standard Brownian Motion on the time interval with . We consider the probability of crossing a positive threshold , that is
[TABLE]
For a standard Brownian Motion, an explicit formula for the threshold-crossing probability (1) is known, namely , which follows directly using the reflection principle (see e.g. [Mörters and Peres, 2010]). Given a finite grid we define a discrete-time approximation of :
[TABLE]
where is a finite subset of the interval , ordered such that . As we are mostly interested in choosing the grid efficiently, we define the following performance measure.
Definition 1**.**
Let be a finite grid on , then
[TABLE]
is called the relative bias induced by the grid .
The second representation of relative bias in Definition 1 is especially intuitive. It means that the relative bias is the probability that stays below on the grid , given that its supremum over is greater than . Notice that any grid which includes the endpoint will induce a relative bias no greater than . Indeed, if , then and thus
[TABLE]
Our objective is to accurately estimate using discrete approximations , in a computationally efficient manner. Brownian Motion has continuous paths and thus it is always possible for a given to find a fine enough grid to bound the bias up to a desired accuracy. However, the computational cost of estimating grows in the grid size and thus it might be infeasible to numerically compute for large grids.
At this point, we emphasize that we are not as much interested in the behaviour of for a fixed or a fixed but rather in asymptotic regimes in which and/or approach infinity. For every we allow to use a different grid so it seems natural to treat the grid as a function of threshold. For every we define a collection of grids of all possible sizes , where has elements, and we denote . For a given family of grids we are interested in behavior of as or tend to infinity. The most straightforward choice for the family of grids is the following.
Definition 2**.**
The family , where with is called the equidistant family of grids.
Notice that the location of grid points on the equidistant grid is independent of . Since the distance between neighboring points is equal to , and since Brownian paths are continuous, it follows that , as for any fixed . It has been established in [Asmussen et al., 1995] that for , equidistant grid, the difference between the continuous-time and discrete-time supremum is of order . More precisely, the sequence has a tight and non-degenerate weak limit.
Proposition 1**.**
Let denote standard Brownian Motion and be the equidistant family of grids from Definition 2 with . For any threshold there exist positive constants such that
[TABLE]
The proof of the Proposition 1 is given in Section 8. The proof we give strongly resembles the proof of Theorem 1 below in Section 3, but we remark that it is also possible to derive it using the tools developed in [Broadie et al., 1997].
Proposition 1 states that decays like , when grows large for a fixed but it does not describe the behavior of the relative bias when varies. In Theorem 1 in the following section, we derive an upper bound for for and simultaneously.
Figure 1 shows the evolution of the relative bias for four different thresholds against the size of the equidistant grid. Even though all four graphs show the decay, the graphs rise up with growing threshold. In particular, for thresholds and respectively and points are needed to arrive at around relative bias. It indicates that, as grows increasingly many grid-points are needed to arrive at the target relative bias. Using the threshold-dependent grid that we develop in Section 4 one can arrive at relative bias using approximately grid-points, independently of the value of the threshold. This amounts to a substantial improvement of the computational efficiency.
In some cases, the equidistant family of grids is the best possible choice, in the sense that other grid families require at least equally fast asymptotic growth of as increases, in order to control the relative bias. [Adler et al., 2012] prove that for centered, homogeneous and twice continuously differentiable (in a mean squared sense) Gaussian processes, has to grow linearly in to uniformly control the relative bias. Moreover, if grows sublinearly in , then the relative bias of any family of grids (not necessarily equidistant) tends to its maximal value, as approaches infinity. It is noted, however, that Brownian Motion does not belong to the family of Gaussian processes for which the result of [Adler et al., 2012] applies.
In the following two sections we analyze the asymptotic behavior of the relative bias for two families of grids. We prove that the equidistant grid requires quadratic growth of in (see Theorem 1 in Section 3). As an alternative, we develop the threshold-dependent family of grids, for which we prove that the relative bias can be made arbitrarily small, uniformly in for fixed (see Theorem 2 in Section 4). We obtain a uniform rate of convergence in and also provide a closed-form expression for the threshold-dependent family of grids (see Definition (9) in Section 4).
3 Equidistant family of grids for Brownian Motion
This section is devoted to analyzing the asymptotic behavior of the relative bias for the equidistant family of grids. The methodology developed in this section will be used later to prove Theorem 2; in particular, the crucial part of the proof concerns bounds for the relative bias induced by an arbitrary finite grid, developed in Lemma 1.
The following theorem describes the asymptotic behaviour of the relative bias, under the equidistant family of grids.
Theorem 1**.**
Let denote standard Brownian Motion and be the equidistant family of grids from Definition 2 with .
- (a)
Let be any positive, real number. There exist positive constants , independent of and such that
[TABLE]
for all , and
[TABLE]
for all . 2. (b)
Let be such that . Then, as ,
[TABLE]
Part (a) of Theorem 1 states that , so that in order to bound uniformly in it suffices to take . The second part of the Theorem 1 states that if then , meaning that the relative bias cannot be bounded by an arbitrarily small number. Together, the two parts entail that the growth is sufficient and there is no better (slower) growth which would guarantee a uniformly bounded relative bias.
The crucial part of the proof of Theorem 1 is the method of bounding the relative bias. Since no explicit expressions for or are known (even if is an equidistant grid) we develop a general upper bound for in the following lemma, in which we use the quantities
[TABLE]
Notice that in this definition of and we allow grid points to change their location with . In the present section, which is on equidistant grids, the grid points obviously do not depend on , but in later sections they do.
Lemma 1**.**
Let , where , and let . The following lower and upper bounds for apply:
[TABLE]
with
[TABLE]
A short proof of Lemma 1 is included in Section 8. The bounds consist of elements of two types: , the probability that stays negative at times , and , the probability that hits for the first time in the interval given that its supremum over is greater than .
For a general grid , the probabilities are difficult to control. However, when is equidistant (thus independent of ), then also the probabilities are independent of ; we emphasize this independence by writing instead of throughout this section. As a result, there exists a tight asymptotic bound for them (see Lemma 2 below); we were inspired to look into such quantities while reading [Mörters and Peres, 2010, Section 5]. The probabilities are controlled using a mean value theorem, see Appendix B.V.
Lemma 2**.**
There exist constants such that:
[TABLE]
for all .
In fact, the assertion in Lemma 2 is true for any symmetric random walk; see [Feller, 1971, Theorem 4 in Section XII.7, and Lemma 1 in Section XII.8]. Before proving Theorem 1 we present one more lemma.
Lemma 3**.**
Let be such that and let . Then the upper bound developed in Lemma 1 is an increasing function of .
An important implication of Lemma 3 is that for any we have that uniformly for all , which completely covers the statement on the situation that in part (a) of Theorem 1. The proof of Lemma 3 is given in Section 8.
Proof of Theorem 1(a).
Thanks to Lemma 3 it suffices to prove the first part of Theorem 1(a), i.e. we assume that . Without loss of generality we put . Exploiting the upper bound developed in Lemma 1 we decompose the sum into three parts, which we treat separately:
[TABLE]
Using the definition of the equidistant grid and the scaling property of Brownian motion we can see that a_{j}=\mathbb{P}\big{(}B_{t_{j}-t_{j-1}}<0,\ldots,B_{t_{n}-t_{j-1}}<0\big{)}=\mathbb{P}\big{(}B_{1}<0,\ldots,B_{n-j+1}<0\big{)} and the bound in Lemma 2 yields for all . Since all , we thus have a straightforward bound for the first term in (3):
[TABLE]
The second term we bound in the following fashion, relying on the upper bound that we have for (stated in Result B.V),
[TABLE]
where comes from Lemma 2 and is a positive constant, independent of and . To arrive at (4) we use that for all relevant . In the transition from (4) to (5) we use the definition of the Riemann sum for the function
[TABLE]
note that, since is an increasing function of when (see Result B.VI in the Appendix), the Riemann sum in (4) underestimates the integral, i.e., .
Lastly, since we have
[TABLE]
where is a positive constant independent of and . Since this results in
[TABLE]
Combining the above bounds,
[TABLE]
where is a positive constant independent of and . This concludes the proof. ∎
Proof of Theorem 1(b).
Without loss of generality we can assume as . Similar to the proof of Lemma 1 in Section 8 we obtain:
[TABLE]
Dividing both sides of the inequality by yields an elementary lower bound on :
[TABLE]
where denotes the standard normal cdf, and we use the fact that . In our case and , so that due to the monotonicity of
[TABLE]
Taking the limit on both sides of inequality (7) yields:
[TABLE]
where denotes the standard normal pdf, we use result B.II in (8), and the last equality is a consequence of the assumption that . ∎
In this section we have proven that in order to uniformly control the relative bias, the size of the equidistant grid must grow at least quadratically in , as approaches infinity. In the next section we present a threshold-dependent grid, which yields a uniform bound on the relative bias using a grid of given size. In other words, in order to control the relative bias with increasing , instead of adding more and more points to the grid, it suffices to suitably shift their location.
4 Threshold-dependent grids for Brownian Motion
In this section we prove the main result of the paper. We explicitly present a threshold-dependent family of grids which uniformly (in ) bounds the relative bias.
Before we introduce the result, we give some intuition why it is possible to control the relative bias as grows, without increasing . Firstly, for any given , we have that
[TABLE]
as . Therefore,
[TABLE]
It means that with growing , the ‘hitting of the threshold’ occurs closer and closer to time . It indicates that the grid points should be gradually shifted towards the point , as is increasing. Moreover, the result in Theorem 1 indicates how fast the points should be shifted. It states that for the family of equidistant grids, the uniform bound on the bias is achieved if the number of grid points grows quadratically in . Equivalently, the distances between neighboring points are decreasing proportionally to . It turns out that this is indeed the pace at which the points should be shifted towards .
In the following result, and denote the standard normal cdf and its inverse, respectively.
Theorem 2**.**
Let be a standard Brownian Motion. Fix and let be a family of grids such that ; here for , and
[TABLE]
for . Denote . There exists a positive , independent of and , such that
[TABLE]
for all .
We emphasize that the bound for the relative bias developed above does not depend on the threshold and thus holds uniformly, for all . Figure 2 shows the comparison between the relative bias of the equidistant and the threshold-dependent grid, both of size . The bias induced by the threshold-dependent grid remains uniformly bounded (by circa ), while the former tends to , the worst possible relative bias, cf. Theorem 1, part (b).
Notice that for small , in Theorem 2 is identical to the equidistant family of grids. In fact this is exactly the setting of the second part of the Theorem 1(a). The real contribution of Theorem 2 is the regime when . The grid defined in (9) is the unique solution to the set of equations
[TABLE]
for all and . To see this, we sum up the first equations in (10) and obtain an explicit equation for :
[TABLE]
Since for Brownian Motion it holds that
[TABLE]
and in particular , Eqn. (11) can be equivalently expressed as
[TABLE]
or, in terms of the cdf ,
[TABLE]
Finally, after taking the inverse from both sides of the equation above we see that satisfies (9). Figure 3 shows the placement of the grid-points on the grid , as defined in (9), for increasing . In fact, one can prove that
[TABLE]
and thus
[TABLE]
for large . It means that the points of the grid (9) are clustered around , with distances between the points proportional to . Here we see an important connection with Theorem 1(a), where the distances between grid-points decrease at the same pace, as already mentioned in the opening paragraph of this section.
For , the points of the threshold-dependent grid (9) do not coincide with the equidistant grid, entailing that we can not directly use Lemma 2 to control the terms of type in the upper bound developed in Lemma 1 in Section 3. The following lemma resolves this issue.
Lemma 4**.**
Let . Then
[TABLE]
for any , where
[TABLE]
A proof of this lemma is provided in Section 8. Lemma 2 applied to the upper bound in Lemma 4 yields a simple upper bound for \mathbb{P}\big{(}B_{t_{1}}>0,\ldots,B_{t_{n}}>0\big{)} for any choice of . In our case, after applying Lemma 1 we have to control probabilities of the type \mathbb{P}\big{(}B_{t_{j}-t_{j-1}}<0,\ldots,B_{t_{n}-t_{j-1}}<0\big{)}, and thus we need a lower bound on
[TABLE]
which we give in the following lemma.
Lemma 5**.**
For the grid in (9), for , and we have:
[TABLE]
and when additionally we have
[TABLE]
Lemma 5 is proven in Section 8. The lower bound in part (a) of Lemma 5 is in fact
[TABLE]
With these lemmas we can prove Theorem 2.
Proof of Theorem 2.
Part (a) of Theorem 1 states that for any choice of there exists positive such that for and thus also . Without the loss of generality, from now on we assume that . Fix and denote for notational simplicity. After combining the general upper bound from Lemma 1 with the equivalent definition (10) of the threshold-dependent grid (9) we obtain
[TABLE]
observe that in our setting Moreover, Lemma 4 yields (recalling the definition of )
[TABLE]
where
[TABLE]
Combining Lemma 2 with Lemma 5 gives
[TABLE]
with a constant that is independent of and . Notice that does not depend on . For we thus obtain
[TABLE]
where is a constant, independent from and , that might differ from line to line. In (14) we use the inequality and in (15) we use the convergence of the Riemann sum to the integral. This concludes the proof of Theorem 2. ∎
Remark 1**.**
For the purpose of showing that for any confidence level and bias , see also (16), the ‘equiprobable’ grid (as defined through (9)) requires a computational effort that is bounded in , it suffices that the decay of the upper bound for in Theorem 2 is of order ; see Corollary 1 in Section 5. As an aside we remark that we hypothesize that this decay is actually of order . This is supported by numerical experiments; see Figure 4 where plots of versus are shown for the threshold-dependent grid (9). The step we expect to be ‘loose’, in obtaining the bound of Theorem 2, is the one corresponding to Lemma 4. We conjecture that Lemma 4 is valid with
[TABLE]
(i.e., without the square root), which suffices to yield the decay of . **
5 Numerical algorithm for estimation of
As mentioned in the introduction, the family of threshold-dependent grids (9) can be used to construct a strongly efficient algorithm for estimation of , see Corollary 1 below. In this paper, by ‘strongly efficient’ we mean that for any given accuracy and confidence level the computational time of an estimator for that satisfies
[TABLE]
is bounded independently of the threshold .
In all numerical experiments throughout this paper we used an algorithm developed by [Adler et al., 2012], see Algorithm 1 below. Although it is applicable for estimation of quantities such as , where is normally distributed with an arbitrary positive-definite covariance matrix, we present their algorithm for the specific case of Brownian Motion, as considered in this paper.
Algorithm 1** ([Adler et al., 2012]).**
Choose a threshold and a finite grid . The estimator , computed according to the following algorithm, is an unbiased estimator of .
Generate a random time on the grid, i.e. , according to the law
[TABLE] 2. 2.
Generate under the condition . 3. 3.
Generate a discrete path of the Brownian Motion conditioned on the pair generated in the previous steps. 4. 4.
Compute
[TABLE]
[Adler et al., 2012] prove that the Algorithm 1 gives an unbiased estimator of (not of ) and that for a fixed (independent of ), the relative variance , as . The authors also propose an estimator for , which relies on a random discretization. However, with growing , one needs increasingly many random grid-points in order to control the relative bias, therefore the continuous-time algorithm is not strongly efficient. In order to reduce the sampling error one generates multiple replicas of the estimator and takes their average. Since every replica is based on a different grid, one must repeatedly calculate the Cholesky decomposition (whose computational time is cubic in the number of grid-points) in order to sample discrete Gaussian paths in Step 3 of Algorithm 1. Choosing a predefined grid speeds up this computation, as in that case the Cholesky decomposition has to be performed only once, making its computational cost negligible.
Combining the threshold-dependent grids as proposed in Section 4 with Algorithm 1 yields a strongly efficient estimator for which is given in the corollary below.
Corollary 1** (Strongly efficient algorithm for the estimation of ).**
Fix an accuracy and a confidence level . Choose a grid from the family of grids defined in such that for all (this is possible due to the result in Theorem 2). Let be i.i.d copies of the estimator from Algorithm 1, with
[TABLE]
Then
[TABLE]
satisfies
[TABLE]
and the computational effort to simulate is bounded independently of .
Proof.
First notice that since is uniformly bounded in (see Theorem 2), so that is fixed independently of , it follows that can be computed in bounded time, independently of . It remains to prove that satisfies the strong efficiency property (17). Note that is an unbiased estimator of , not of . The relative variance of with respect to can be bounded independently of for an arbitrary choice of the grid in terms of the grid size ,
[TABLE]
Due to Chebyshev’s inequality,
[TABLE]
This concludes the proof. ∎
We conclude this section by a remark on the simulation of the conditioned Brownian Motion in Step 3 of Algorithm 1. The naïve method would be to construct the covariance matrix of the conditioned process, calculate the Cholesky decomposition of that matrix (cubic in the number of grid points) and then simulate the process in a standard manner. Notice that this step must be repeated for every replica and thus its computational cost scales with the number of samples. The following algorithm, which can be found e.g. in [Doucet, 2010], requires only a single calculation of the Cholesky decomposition for all replicas.
Algorithm 2** ([Doucet, 2010]).**
Let , where and , be normally distributed with mean and covariance matrix ,
[TABLE]
The following algorithm generates a sample :
Sample 2. 2.
Compute .
Note that the computational effort to produce the conditioned Gaussian random variable in Step 2 of Algorithm 2 is linear in the dimension . Thus, this algorithm significantly reduces the computation time of Step 3 of Algorithm 1 when that step is repeated for each replica.
6 Efficient grids for a broad class of stochastic processes
In this section we discuss how the idea of threshold-dependent grids can be applied to stochastic processes other than Brownian Motion. We let be a real-valued stochastic process and . For simplicity we here assume that is continuous and strictly increasing so that (but situations in which can be dealt with similarly, see also the discussion in Section 7).
As argued in the previous sections, it is efficient to let the position of the grid points depend on . We constructed for Brownian Motion a grid by finding such that
[TABLE]
cf. (11). An inherent problem is that the class of processes for which the distribution of is known is very limited, so that the approach does not seem to be useful for relevant stochastic processes other than Brownian motion. We saw, however, that for Brownian Motion the satisfying (18) also solve
[TABLE]
cf. (12). The idea now is to use the level-dependent (or: ‘equiprobable’) grid (19) for general real-valued processes. The major advantage of the grid (19) is that to calculate the position of the grid points the sole prerequisite is that the process’ marginals are known (rather than the distribution of ). In addition, even if the marginal distributions of are not available, but the asymptotics of (as ) are, then a good approximation of this grid can be found. (In the sequel we write, for brevity, instead of ) and instead of )
We now provide the rationale behind the grid (19). Let be a grid such that . Evidently, by the union bound,
[TABLE]
Now notice that if the grid is such that for
[TABLE]
then it does not make sense to include the point for large . Property (20) clearly compromises the performance of equidistant grids as Considering however the grid points of the threshold-dependent grid, as defined by (19), these will by design not experience (20).
To assess the performance of the above threshold-dependent grid (19), we introduce a measure of performance closely related to the relative bias. Note that when no formulas for are available, nor it is known how to reliably approximate , we cannot determine the exact value of the relative bias. We now make the following two observations. (1) As for any choice of , the larger is, the better; if for grids , then also . (2) The crude lower bound provides us with a useful benchmark. Combining these two thoughts motivates the following performance measure of a grid :
[TABLE]
Notice that for any such that we have
[TABLE]
What is more, for any two grids we have if and only if ; this means that the bigger the is, the better. As our main aim is to efficiently approximate using discrete-time approximations , we see that if then there is little gain from using over a deterministic estimator .
In a series of examples we compare induced by (i) the threshold-dependent (equiprobable) grid and (ii) the equidistant grid of the same size; we consistently use grid points. In all cases is a continuous, strictly increasing function (so that ). The most important conclusion is that the experiments below uniformly indicate that the equiprobable grid outperforms the equidistant one, not only in the asymptotic regime, as threshold grows large, but already for moderate values of . This shows how the ideas the we developed earlier this paper, that have provable optimality properties for Brownian motion, lead to an efficient estimation procedure for a much broader class of stochastic processes. In all examples, we observe that induced by the equidistant grid converges to , thus the corresponding is asymptotically equivalent to , as .
Example 1** (Brownian Motion with jumps).**
Let be a Brownian Motion with jumps, i.e.
[TABLE]
where is a standard Brownian Motion and is a standard Poisson process with intensity .
Even though there are no closed-form expressions for , it is still possible to generate exact samples from (see [Dębicki and Mandjes, 2015, Section 10.1]). We can use this to construct an unbiased estimator of and thus can estimate the relative bias of the tested grids. The results in Figure 5 show the substantial gain achieved by the level-dependent grid. The graphs look similar to those of Brownian Motion, which is indicative of the threshold-dependent grid having a uniformly bounded relative bias.**
Example 2** (Ornstein-Uhlenbeck Process).**
Let be an Ornstein-Uhlenbeck process, i.e., a strong solution to the following SDE: with ,
[TABLE]
Then is a zero-mean Markovian Gaussian process with covariance function
[TABLE]
The exact value of is known only in terms of special functions, see [Alili et al., 2005] and it is not straightforwardly evaluated. However, the exact asymptotics of , as grows large, are known:
[TABLE]
where is a positive constant independent of , see e.g. [Dębicki and Mandjes, 2003, Theorem 5.1] or the original theorem by [Piterbarg and Prisyazhnyuk, 1978]; this explains why for the level-dependent grid goes to a constant in Figure 6. Again the equidistant grid is significantly outperformed by the threshold-dependent grid.**
Example 3** (Fractional Brownian Motion).**
Let be a fractional Brownian Motion (fBM) with a Hurst parameter , that is a zero-mean Gaussian process with the covariance function
[TABLE]
Observe that fBM with Hurst parameter is a standard Brownian Motion. For any we have (strictly increasing variance in time) and thus .
The exact value of the probability for remains unknown. However, like in Example 2, the exact asymptotics of are known:
[TABLE]
where is a constant only depending on ; we again refer to [Dębicki and Mandjes, 2003, Theorem 5.1] or the original theorem by [Piterbarg and Prisyazhnyuk, 1978]. We apply threshold-dependent grids in these two different asymptotic regimes for and , see the results in Figure 7. Again the threshold-dependent grid performs considerably better. In case the above asymptotic result explains why for the level-dependent grid keeps increasing ( behaves as the increasing function ). In case , again using the asymptotic result, as grows large, both for the equidistant grid and for the threshold-dependent grid (equivalently, the relative bias vanishes for both as ). Note however that with the threshold-dependent grid, tends to 1 slower than with the equidistant grid, as can be seen in Figure 7 (right panel), showing the more favorable performance of the threshold-dependent grid. **
7 Concluding remarks and discussion
In this paper we have demonstrated that the errors due to time discretization when estimating threshold-crossing probabilities can be significantly reduced by using other grids than the commonly used equidistant grid. We have analyzed this in considerable detail for the case of standard Brownian Motion. In particular, we have shown that in order to control the error as grows large, it suffices to properly shift the grid points instead of refining the grid with more and more points. At the same time, controlling the error using equidistant grids requires quadratic growth of the number of grid points, as grows large.
Numerical estimation is evidently not needed for Brownian Motion due to the availability of analytical results. Our paper however indicates that the underlying ideas can be used to construct efficient grids for a broad class of stochastic processes (notably, Lévy processes and Gaussian processes, such as fractional Brownian Motion). The results presented in this paper are intended to develop valuable insight and useful heuristics for tackling the estimation of tail probabilities of these more general classes of processes. We have demonstrated such heuristics for several processes in Section 6. There, we presented a procedure, that is empirically shown to work well for stochastic process of which the marginal distributions are known:
- (i)
Identify
[TABLE]
in case is a zero-mean Gaussian process, is a point of maximal variance, i.e., . As argued, for many key models we have that
- (ii)
Construct a grid clustered around it, such that solves (19), for .
As we pointed out, even if the marginal distribution of is not available but only the corresponding asymptotics, as , this procedure can be applied. It is also noted that it is straightforward to compare two different grids: the larger the value of , the closer it is to the target quantity .
A natural question that arises in relation to Theorem 2 is whether we can find a grid that is even better than the one defined in (9). Constructing an optimal n-grid , i.e. a grid of size that minimizes the relative bias for a given , remains elusive. However we have been able to find an explicit formula for an optimal 2-grid, namely , with
[TABLE]
where . For comparison, the threshold-dependent grid defined in (9) yields , hence the grid (9) is not minimizing the bias (although the difference with the optimal 2-grid is small). Additionally, we were able to prove that for an optimal n-grid, , the limits must exist, and are all finite and pairwise distinct. As a result we were able to numerically calculate the limit . Finding optimal grids for larger remains an open problem. We note, however, that with the threshold-dependent grid we can bound the relative bias uniformly in (see Theorem 2) and in this sense the grid (9) is already (asymptotically) optimal.
8 Proofs of Lemmas 1, 3, 4, 5 and Proposition 1
Proof of Proposition 1.
In part (a) of Theorem 1 it has been proven already that . Thus, when is fixed it is straightforward that the upper bound in the assertion of the theorem holds.
The lower bound developed in Lemma 1 reads . Since we have for the equidistant grid and all and are non-negative, we may use the weaker inequality
[TABLE]
In the following we use Lemma 2 for a lower bound on terms and Result B.V for a lower bound on .
[TABLE]
where is a positive constant independent of (but dependent on ) that may vary from line to line. To arrive at (22) we use the convergence of the Riemann sum, noting that is fixed and that the function
[TABLE]
is integrable on . This concludes the proof. ∎
Proof of Lemma 1.
Notice that the events and are equivalent. We thus find
[TABLE]
To prove the upper bound we use the fact that is a non-increasing function of (see Appendix A, Transformation T2), so that
[TABLE]
Dividing both sides of the inequality by gives . To prove the lower bound we use Result B.IV from the Appendix, so as to obtain
[TABLE]
Dividing both sides of the inequality by leads to \beta_{T}(b)\geq\underline{\beta}_{T}(b) and concludes the proof. ∎
Proof of Lemma 3.
Recall the definitions of and , and . Notice that if we put , then by the scaling property of Brownian Motion
[TABLE]
and thus (since the s are independent of , we abbreviate ).
Assume that for any there exists such that
[TABLE]
Since the weights must satisfy we have \sum_{j=1}^{n}\big{(}w_{j}(b_{2})-w_{j}(b_{1})\big{)}=0 and thus
[TABLE]
Finally,
[TABLE]
For the remainder of the proof we prove the existence of satisfying (24). Let be the first hitting time of level and let be the density of given that , i.e.,
[TABLE]
where , , and denotes the density of a standard normal random variable. We will prove that for any there exists such that:
[TABLE]
Then the weights
[TABLE]
are decreasing for all , and increasing for all . If is not an integer, it is not known whether increases or not, but for sure there exists satisfying (24). For the remainder we prove the existence of satisfying (25). For :
[TABLE]
Note that and (for example due to the Result B.VII in the Appendix) and that is strictly decreasing, hence has exactly one zero and for and for . The observation that concludes the proof. ∎
Proof of Lemma 4.
Let . We transform the grid with Transformations T1–T3, see Appendix A, in such a way that after all transformations we end up with .
Using Transformation T2, translate the grid to the right by , i.e., put
[TABLE] 2. 2.
Put , and . While do:
- •
Put .
- •
Using Transformation T3, contract the grid after time by a factor , where is defined by . Formally, we put
[TABLE]
Notice that after this operation .
- •
Put . 3. 3.
Using Transformation T1, delete all the points such that .
Now we prove that the algorithm is well-defined, more precisely, we confirm that all ’s exist. First, see that is well-defined. By induction, assume that is well-defined and prove that is well-defined as well. Notice that after the th loop in Step 2 of the algorithm, the distances between the points shrunk at most by a factor compared with the initial maximal distance . Moreover, we observe that
[TABLE]
We prove by induction that for all . Obviously . Assume that . After multiplying inequality (26) by we obtain
[TABLE]
which ends the inductive proof. Next, in order to show that is well defined for it suffices to prove that the endpoint , after the th loop of Step 2, is greater than . We prove a stronger statement, namely that the endpoint after being shrunk by a factor is still greater than , i.e. . By the definition of , satisfies the inequality , thus
[TABLE]
which concludes the proof that is well-defined. As all transformations used in steps 1-3 satisfy (30) we have
[TABLE]
We finish the proof by observing that \mathbb{P}\big{(}B_{h}>0,\ldots,B_{Nh}>0\big{)}=\mathbb{P}\big{(}B_{1}>0,\ldots,B_{N}>0\big{)}, due to the scaling property of Brownian Motion. ∎
Proof of Lemma 5.
Notice that the grid points defined in (9) depend only on the threshold and the ratio . We are able to extend the definition of to ,
[TABLE]
such that . Equivalently, can be defined as the unique solution to
[TABLE]
This extension makes it possible to inspect the derivative of with respect to the ratio . Using the extension function of , we aim to prove the more general statement that for ,
[TABLE]
Moreover, using the definition (27) we may substitute
[TABLE]
and arrive at another equivalent form of inequality (28):
[TABLE]
which is Result B.IX in the Appendix. For part (b) see that the density of the first hitting time,
[TABLE]
is an increasing function on the interval and thus part (b) follows from the second definition of the grid points in (10). ∎
Appendix A Grid transformations
Let with . We introduce three grid transformations, i.e. operations satisfying
[TABLE]
- (T1)
Deleting. For any
[TABLE] 2. (T2)
Translation to the right of the whole sequence. For any
[TABLE] 3. (T3)
Contraction of time after some point. For any and :
[TABLE]
Proof that Transformations T1-T3 satisfy (30).
Assertion T1 is straightforward to verify. Observe for T2 that
[TABLE]
and for T3 that
[TABLE]
∎
Appendix B Miscellaneous results
Let denote the standard normal cumulative distribution function and the standard normal density function. Below we list various results that we use. Results B.I–B.III are standard, and not proven here.
- (B.I)
For :
[TABLE] 2. (B.II)
As ,
[TABLE] 3. (B.III)
[Szarek and Werner, 1999]. For :
[TABLE] 4. (B.IV)
Let , then:
[TABLE] 5. (B.V)
Let , where , and , then:
[TABLE]
and
[TABLE]
for . 6. (B.VI)
Let such that
[TABLE]
Then is an increasing function of , when . 7. (B.VII)
Let such that
[TABLE]
Then is a strictly decreasing function. 8. (B.VIII)
Let such that
[TABLE]
Then is a strictly increasing function. 9. (B.IX)
Let be such that
[TABLE]
Then is continuous and increasing.
B.1 Proofs of results B.IV–B.IX
Proof of B.IV.
The proof is very similar to the proofs from Appendix A. Note that
[TABLE]
which concludes the proof. ∎
Proof of B.V.
Using the mean value theorem and monotonicity of on the negative half-line, we have for . Furthermore,
[TABLE]
Thus, for and , after substituting , the above combined with the inequality B.III yield:
[TABLE]
The proof of the second inequality is analogous. ∎
Proof of B.VI.
It suffices to prove that for . See that
[TABLE]
Note that has at most one root when thus for . Moreover, when , then (for ) thus is strictly decreasing for . From the observation that and we conclude that is nonnegative on the interval for and thus , when . ∎
Proof of B.VII.
We have that
[TABLE]
thus iff , which is equivalent to Result B.I. ∎
Proof of B.VIII.
See that
[TABLE]
thus iff , which is an implication of the lower bound from result B.III. ∎
Proof of B.IX.
It is easy to see that . To see that we expand \log\big{(}\Phi(-b/\sqrt{t})\big{)} in a series around and obtain
[TABLE]
Thus
[TABLE]
To prove that is increasing we study the first derivative. For :
[TABLE]
Due to Result B.VII we have the lower bound
[TABLE]
and thus the numerator of the fraction in (33) can be bounded from below by the function defined as below:
[TABLE]
Notice that implies which is exactly what we want to establish. For the remainder of the proof we show that is non-negative. Since and , it suffices to show that is monotone (non-increasing). We study the first derivative
[TABLE]
where the last inequality is a consequence of the application of Result B.VIII, that is
[TABLE]
is an increasing function of . ∎
Acknowledgments. The authors would like to thank Ankush Agarwal and Johan van Leeuwaarden for useful discussions and suggestions. In addition, the reviewers’ reports helped improving the quality of our work considerably. This work is part of the research programme ‘Rare Event Simulation for Climate Extremes’ with grant number 657.014.033, which is (partly) funded by the Netherlands Organisation for Scientific Research (NWO). Michel Mandjes’ research is partly funded by the NWO Gravitation Programme NETWORKS, grant number 024.002.003.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[Adler, 1990] Adler, R. (1990). An Introduction to Continuity, Extrema, and Related Topics for General Gaussian Processes . IMS Lecture Series. Institute of Mathematical Statistics.
- 2[Adler et al., 2012] Adler, R. J., Blanchet, J. H., and Liu, J. (2012). Efficient Monte Carlo for high excursions of Gaussian random fields. The Annals of Applied Probability , 22(3):1167–1214.
- 3[Alili et al., 2005] Alili, L., Patie, P., and Pedersen, J. L. (2005). Representations of the first hitting time density of an Ornstein-Uhlenbeck process. Stochastic Models , 21(4):967–980.
- 4[Asmussen et al., 1995] Asmussen, S., Glynn, P., and Pitman, J. (1995). Discretization error in simulation of one-dimensional reflecting Brownian motion. The Annals of Applied Probability , 5:875–896.
- 5[Broadie et al., 1997] Broadie, M., Glasserman, P., and Kou, S. (1997). A continuity correction for discrete barrier options. Mathematical Finance , 7(4):325–349.
- 6[Calvin, 1997] Calvin, J. M. (1997). Average performance of a class of adaptive algorithms for global optimization. The Annals of Applied Probability , 7:711–730.
- 7[Calvin and Glynn, 1997] Calvin, J. M. and Glynn, P. W. (1997). Average case behavior of random search for the maximum. Journal of Applied Probability , 34(3):632–642.
- 8[Dębicki and Mandjes, 2003] Dębicki, K. and Mandjes, M. (2003). Exact overflow asymptotics for queues with many gaussian inputs. Journal of Applied Probability , 40(3):704–720.
