Estimation of Stopping Times for Stopped Self-Similar Random Processes
Viktor Schulmann

TL;DR
This paper develops a non-parametric method to estimate the distribution of an unknown stopping time for self-similar processes, extending previous work on Brownian motion to broader classes like Bessel processes, with proven convergence rates and asymptotic properties.
Contribution
It introduces a new estimator for the stopping time distribution of self-similar processes, generalizing Mellin transform techniques beyond Brownian motion.
Findings
Derived the minimax convergence rate for the estimator.
Established asymptotic normality for Bessel processes.
Extended estimation methods to a wider class of self-similar processes.
Abstract
Let $X$ be a known process and $T$ an unknown random time independent of $X$. Our goal is to derive the distribution of $T$ based on an iid sample of $X_T$. Belomestny and Schoenmakers (2015) propose a solution based on the Mellin transform in the case where $X$ is a Brownian motion. Applying their technique, we construct a non-parametric estimator for the density of $T$ for a self-similar one-dimensional process $X$. We calculate the minimax convergence rate of our estimator in some examples, with a particular focus on Bessel processes, where we also show asymptotic normality.
TU Dortmund
AMS Numbers: 62G07, 62G20, 60G18, 60G40.
Keywords: estimation of stopping times, multiplicative deconvolution, Mellin transform, self-similar process, Bessel process
1 Introduction
Belomestny and Schoenmakers (2015) considered the problem of recovering the distribution of an independent random time $T$ based on iid samples from a one-dimensional Brownian motion observed at time $T$. Comte and Genon-Catalot (2015) had already considered this problem for Poisson processes. Here we use the method of Belomestny and Schoenmakers (2015) and derive corresponding results for self-similar processes, with a particular focus on Bessel processes. As a consequence, we extend results from Belomestny and Schoenmakers (2015) to multi-dimensional Brownian motion. This is accomplished by considering the two-norm of the multi-dimensional Brownian motion, which reduces the problem to the case of a Bessel process; this is a one-dimensional process and can be treated similarly to one-dimensional Brownian motion. More specifically, we give a non-parametric estimator for the density of $T$. We show consistency of this estimator with respect to the risk and derive a polynomial convergence rate for sufficiently smooth densities. Moreover, we show that this rate is optimal in the minimax sense. The constructed estimator is also shown to be asymptotically normal.
The paper is organized as follows. In Section 2 we recapitulate the Mellin transform, which is our main tool throughout this paper. Using this transform, we construct our estimator in Section 3 by solving a multiplicative deconvolution problem that is related to the original problem through self-similarity of the underlying process. The use of the Mellin transform in multiplicative deconvolution problems, proposed by Belomestny and Schoenmakers (2015), differs from the standard approach, which applies a log-transformation to reduce the problem to an additive deconvolution problem that is usually addressed by kernel density deconvolution. In Section 4 we give bounds on the bias and variance of the estimator in the general self-similar case. In the following two sections we focus on Bessel processes: we give the convergence rates of our estimator for this case (Section 5) and show its asymptotic normality (Section 6). Section 7 is devoted to two further examples of self-similar processes where our method yields consistent estimators, together with their convergence rates. In Section 8 we show optimality in the minimax sense of all previously obtained rates. Some numerical examples are given in Section 9. Finally, we collect some of the longer proofs in Section 10.
2 Mellin Transform
In this section we recapitulate some properties of the Mellin transform from Butzer and Jansche (1997). This integral transform will be our main tool in estimation procedures of the next sections. For define the space
[TABLE]
If is the density function of an -valued random variable, then we have at least . Moreover, if is locally integrable on with
f(x)=\left\{\begin{array}[]{ll}\mathcal{O}(x^{-a}),&\text{for}~{}~{}x\rightarrow 0\\ \mathcal{O}(x^{-b}),&\text{for}~{}~{}x\rightarrow\infty\end{array}\right.,
then holds.
Definition 1.
For a density function of a random variable define
[TABLE]
as the Mellin transform of (or of ) in with .
If , then is well defined and holomorphic on the strip according to Butzer and Jansche (1997).
Example 2.
(i) Consider gamma densities
[TABLE]
for . For all with we have
[TABLE]
(ii) Consider for , the densities
[TABLE]
For elementary calculus shows that
[TABLE]
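The gamma case of Example 2 can be checked numerically. The sketch below assumes the standard convention $\mathcal{M}[f](s)=\int_0^\infty x^{s-1}f(x)\,dx$ and the gamma density $f(x)=x^{r-1}e^{-x}/\Gamma(r)$, for which the transform is $\Gamma(s+r-1)/\Gamma(r)$; the displayed formulas of the paper are not reproduced here, so this parametrization is an assumption. The function names `gamma_density` and `mellin_transform` are illustrative only.

```python
import cmath
import math


def gamma_density(x, r):
    """Gamma density x^(r-1) e^(-x) / Gamma(r) on (0, inf) (assumed parametrization)."""
    return x ** (r - 1.0) * math.exp(-x) / math.gamma(r)


def mellin_transform(f, s, lo=-40.0, hi=6.0, steps=20000):
    """Approximate int_0^inf x^(s-1) f(x) dx.

    Uses the substitution x = e^u, which turns the integral into
    int e^(s*u) f(e^u) du, and then the trapezoid rule on [lo, hi].
    """
    h = (hi - lo) / steps
    total = 0j
    for k in range(steps + 1):
        u = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(s * u) * f(math.exp(u))
    return total * h


def f(x):
    # Gamma density with shape r = 2, used for the checks below.
    return gamma_density(x, 2.0)


# Real argument: compare with Gamma(s + r - 1) / Gamma(r).
numeric = mellin_transform(f, 1.5).real
closed_form = math.gamma(1.5 + 2.0 - 1.0) / math.gamma(2.0)

# Complex argument: the functional equation Gamma(z + 1) = z Gamma(z)
# implies M(s + 1) = (s + r - 1) M(s) on the vertical line Re(s) = 1.5.
s = 1.5 + 2.0j
lhs = mellin_transform(f, s + 1.0)
rhs = (s + 2.0 - 1.0) * mellin_transform(f, s)
```

The complex check avoids needing a complex gamma function: it only uses the recursion that the closed form must satisfy.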
Similar to the well-known relation of the classical Fourier transform to sums of independent random variables, the Mellin transform behaves multiplicatively with respect to products of independent random variables:
Theorem 3.
Let and be independent -valued random variables with densities and , and Mellin transforms and for . Then the product has a density , and
[TABLE]
for all with .
In the setting of Theorem 3 it is easy to see that is identical to
[TABLE]
for all with . The function is called Mellin convolution of and .
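The multiplicative property of Theorem 3 can be illustrated by simulation. The sketch below assumes the convention $\mathcal{M}[f_X](s)=\mathbb{E}[X^{s-1}]$ and uses two independent gamma variables chosen purely for illustration; `empirical_mellin` is a hypothetical helper name.

```python
import math
import random


def empirical_mellin(sample, s):
    """Empirical Mellin transform: sample mean of x^(s-1) (real s only here)."""
    return sum(x ** (s - 1.0) for x in sample) / len(sample)


random.seed(1)
n = 100_000
xs = [random.gammavariate(2.0, 1.0) for _ in range(n)]  # X ~ Gamma(shape 2)
ys = [random.gammavariate(3.0, 1.0) for _ in range(n)]  # Y ~ Gamma(shape 3), independent of X

s = 1.5
mellin_of_product = empirical_mellin([x * y for x, y in zip(xs, ys)], s)
product_of_mellins = empirical_mellin(xs, s) * empirical_mellin(ys, s)

# Closed form under the assumed convention: E[X^(s-1)] = Gamma(s - 1 + shape) / Gamma(shape).
exact = (math.gamma(2.5) / math.gamma(2.0)) * (math.gamma(3.5) / math.gamma(3.0))
```

Both empirical quantities should agree with `exact` up to Monte Carlo error of order $n^{-1/2}$.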
For denote the space of holomorphic functions on by . The mapping , is injective. Given the Mellin transform of a function we can reconstruct :
Theorem 4.
For let . If
[TABLE]
then the inversion formula
[TABLE]
holds almost everywhere for .
Another important result in the theory of Mellin transforms is the Parseval formula for Mellin transforms (see (Bleistein and Handelsman, 1986, page 108) for the proof):
Theorem 5.
Let be measurable functions such that
[TABLE]
exists. Suppose that and are holomorphic on some vertical strip for . If there is a with
[TABLE]
then
[TABLE]
3 Construction of the Estimator
We consider a real-valued stochastic process with càdlàg paths which is self-similar with scaling parameter (for short, -ss), that is
[TABLE]
Here, denotes identity of all finite dimensional distributions. Let be a stopping time with density independent of . It is easy to see that the density of the random variable is given by
[TABLE]
where are the densities of (). In order to construct a non-parametric estimator for based on iid samples of we use the simple consequence of (5) that
[TABLE]
We take the absolute value on both sides and assume with and , so we can apply the Mellin transform on both sides of (7) and obtain
[TABLE]
for . Setting we conclude that
[TABLE]
If the Mellin inversion formula (Theorem 4) is applicable to , we may write
[TABLE]
for . Combining (8) and (9) we obtain the representation
[TABLE]
for . In order to obtain an estimator of based on (10) we would like to replace by its empirical counterpart
[TABLE]
However, this substitution may prevent the integral in (10) from converging. Thus, we introduce a sequence with (chosen later) in order to regularize our estimator. In view of (10) define
[TABLE]
for and as an estimator for .
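The chain of steps in this section can be written out as follows. This is a reconstruction under assumed conventions, namely $\mathcal{M}[f](s)=\int_0^\infty x^{s-1}f(x)\,dx$, self-similarity in the sense $(X_{at})_{t\ge 0}\overset{d}{=}(a^{H}X_t)_{t\ge 0}$, and observations $X^{(1)},\dots,X^{(n)}$ of $X_T$; the symbols $f_T$, $\gamma$, $U_n$ and the exact parametrization are assumptions and may differ from the paper's displays (5) to (11).

```latex
\begin{align*}
|X_T| &\overset{d}{=} T^{H}\,|X_1|
  && \text{(self-similarity and independence of $T$ and $X$)}\\
\mathbb{E}\,|X_T|^{s-1} &= \mathbb{E}\,T^{H(s-1)}\;\mathbb{E}\,|X_1|^{s-1}
  = \mathcal{M}[f_T]\bigl(H(s-1)+1\bigr)\,\mathcal{M}[|X_1|](s)\\
\mathcal{M}[f_T](z) &= \frac{\mathcal{M}[|X_T|]\bigl(\tfrac{z-1}{H}+1\bigr)}
                           {\mathcal{M}[|X_1|]\bigl(\tfrac{z-1}{H}+1\bigr)}
  && \text{(substitute $z = H(s-1)+1$)}\\
f_T(x) &= \frac{1}{2\pi}\int_{-\infty}^{\infty} x^{-\gamma-iv}\,
          \mathcal{M}[f_T](\gamma+iv)\,dv
  && \text{(Mellin inversion)}\\
\hat f_n(x) &= \frac{1}{2\pi}\int_{-U_n}^{U_n} x^{-\gamma-iv}\,
          \frac{\tfrac{1}{n}\sum_{k=1}^{n}\bigl|X^{(k)}\bigr|^{(\gamma+iv-1)/H}}
               {\mathcal{M}[|X_1|]\bigl(\tfrac{\gamma+iv-1}{H}+1\bigr)}\,dv
  && \text{(empirical transform, cut-off $U_n$)}
\end{align*}
```

The cut-off is needed because the denominator typically decays rapidly along vertical lines while the empirical numerator does not, so the untruncated integrand need not be integrable.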
4 Convergence Analysis
For the sake of brevity we introduce the notation for , if in the Landau notation. We write for , if and for . For and consider the class of densities
[TABLE]
where
[TABLE]
For the bias of the estimator (11) we have:
Theorem 6.
Let be -ss with càdlàg paths. Let be a stopping time independent of with density . If with and , then
[TABLE]
for all , .
Proof.
Let . By Fubini’s theorem and (8),
[TABLE]
We combine Theorem 4 with (13) to get
[TABLE]
Since implies for , (see Proposition 5 in Flajolet et al. (1995)), we have
[TABLE]
Moreover, (14) gives
[TABLE]
for all , which is our claim. ∎
Having established an upper bound on the bias of , we now do the same for the variance of our estimator.
Theorem 7.
Let be -ss with càdlàg paths. Let be a stopping time independent of with density . If with and , then
[TABLE]
for all and all .
Proof.
Let , . As
[TABLE]
for any bounded random function (continuous in ), we obtain
[TABLE]
In order to get a bound on we use the self-similarity of to get
[TABLE]
which (together with (16)) gives the desired bound on . ∎
5 Application to Bessel Processes
In this section we choose to be a Bessel process starting at $0$ with dimension . Note that the case leads to the absolute value of a one-dimensional Brownian motion and was already considered in Belomestny and Schoenmakers (2015). We refer to Revuz and Yor (1999) for detailed information about Bessel processes. It is well known that Bessel processes started at $0$ are $1/2$-ss and have continuous paths. Their marginal densities are given by:
[TABLE]
In Example 2(ii) we calculated . Looking at (11) we obtain
[TABLE]
as an estimator for the density of a stopping time for and , where are such that and are independent samples of . Our main result, Theorem 8, gives the convergence rates for (17).
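A minimal numerical sketch of such an estimator for a Bessel process of dimension $d$ started at $0$ is given below. It is derived under the assumptions $X_T\overset{d}{=}\sqrt{T}\,R$ with $R^2\sim\chi^2_d$, so that $\mathbb{E}[R^{2(z-1)}]=2^{z-1}\Gamma(d/2+z-1)/\Gamma(d/2)$, and the inversion line $\operatorname{Re} z=\gamma$. The function name, the default $\gamma=1$, the grid size and the cut-off value are illustrative choices, not the paper's (17) or (19).

```python
import math

import numpy as np
from scipy.special import loggamma


def bessel_stopping_density(samples, x, d, U, gamma=1.0, n_grid=301):
    """Mellin-type estimate of the density of T at x from samples of X_T.

    Assumes X is a Bessel process of dimension d started at 0 and that T is
    independent of X. U is the cut-off of the inversion integral.
    """
    log_obs = np.log(np.asarray(samples, dtype=float))
    v = np.linspace(-U, U, n_grid)
    z = gamma + 1j * v
    # Empirical Mellin transform of the observations: mean of X^(2(z-1)).
    emp = np.array([np.mean(np.exp(2.0 * (zk - 1.0) * log_obs)) for zk in z])
    # log of Gamma(d/2) / (2^(z-1) * Gamma(d/2 + z - 1)), computed stably.
    log_factor = (loggamma(d / 2.0)
                  - (z - 1.0) * math.log(2.0)
                  - loggamma(d / 2.0 + z - 1.0))
    integrand = np.exp(-z * math.log(x) + log_factor) * emp
    # Trapezoid rule on the uniform grid; the imaginary part cancels by symmetry.
    dv = v[1] - v[0]
    integral = dv * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return float(integral.real / (2.0 * math.pi))


# Illustration: T ~ Gamma(shape 2, scale 1), whose density is x * exp(-x).
rng = np.random.default_rng(0)
n, d = 50_000, 3
T = rng.gamma(shape=2.0, scale=1.0, size=n)
R = np.linalg.norm(rng.standard_normal((n, d)), axis=1)  # |N(0, I_d)|
X = np.sqrt(T) * R                                       # samples of X_T

estimate = bessel_stopping_density(X, x=1.0, d=d, U=3.0)
truth = 1.0 * math.exp(-1.0)
```

With $\gamma=1$ the empirical transform has modulus at most one, so the noise is amplified only through $1/|\Gamma(d/2+iv)|$, which grows roughly like $e^{\pi|v|/2}$; this is why the cut-off has to grow slowly with the sample size.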
Theorem 8.
If for some , and if there is a with and , then
[TABLE]
for some depending only on as well as . Moreover, taking
[TABLE]
in (18), one has the polynomial convergence rate
[TABLE]
Proof.
Let . We use the upper bound on the variance obtained in Theorem 7 with to get
[TABLE]
for some . By Example 2(ii) and Corollary 21(ii) we have
[TABLE]
for some constants and . Adding (22) and (12) gives
[TABLE]
for some . The choice (19) yields the rate (20). ∎
The class is fairly large. In particular, includes for all such well-known families of distributions as Gamma, Weibull, Beta, log-normal and inverse Gaussian. So, if belongs to one of those families, Theorem 8 is true for any . If , then we only require .
6 Asymptotic Normality for Bessel Processes
Note that the estimator (17) can be written as
[TABLE]
with
[TABLE]
Since is a sum of iid variables, we can show that (under mild assumptions on ) is asymptotically normal. In fact, we have:
Theorem 9.
Let for some . Suppose there is a such that , and for some and
[TABLE]
If we choose in (17) then we have
[TABLE]
for all , where
[TABLE]
with some given by (48).
We present the proof in Section 10.1. As we mentioned at the end of Section 5, we can often assume so that the choice of is only restricted by . If , then a suitable can always be found, for instance, any is valid. If additionally , then the statement is true for all .
It is possible to give a Berry-Esseen type error estimate for the convergence in (27). This is a new result even for dimension .
Theorem 10.
Let for some . Suppose there is a such that , , and (26) holds. Fix some . Denote by the distribution function of
[TABLE]
(where is defined by (17) and is given by (28)) and by the distribution function of the standard normal distribution. If we choose in (17) then we have
[TABLE]
for .
Proof.
Let and . Consider the representation (24) of . The Berry-Esseen theorem (see Gänssler and Stute (1977)) states
[TABLE]
We choose in Lemma 11 to get
[TABLE]
for . By Theorem 9 we have (28). Choose . Plugging (31) and (28) into (30) concludes the proof. ∎
Note that the signs of the powers and in (29) are ambiguous and depend on the relative positions of and . However, if then we only have the case and the power of the logarithm is positive.
The following observation about the absolute moments of is useful in the proof of Theorem 9 but also holds some insights in itself.
Lemma 11.
Let for some and with as . If there is a such that , and , then
[TABLE]
as for all . In particular, all absolute moments of exist for all greater than some .
Proof.
Case : By Jensen's inequality, Corollary 21(ii) and (8) (with and there) we have
[TABLE]
where and . The case follows similarly by applying Corollary 21(i) instead of (ii). ∎
For the special case and this result is mentioned in Belomestny and Schoenmakers (2015) but without an extensive proof which we provide here. Note that for the assumption is redundant. Moreover, we always have the smaller bound of the second case in (32).
7 Some Other Self-Similar Processes
7.1 Normally Distributed Processes
Let be -ss with càdlàg paths and standard normally distributed. As an example, consider fractional Brownian motion. This setting is easily generalized to the case where with by considering the process and modifying our observations to . Taking in Example 2(ii) we see that estimator (11) assumes the form
[TABLE]
for and . We can prove a convergence result for this estimator, similar to Theorem 8.
Theorem 12.
Let . Suppose for some . If there is some , then
[TABLE]
for and all . Taking
[TABLE]
we obtain the polynomial convergence rate
[TABLE]
for .
Proof.
The proof is analogous to that of Theorem 8 except for the upper bound on the variance, which in this case is
[TABLE]
for some . Combining this with the bound on the bias from Theorem 6 we obtain (34). Plugging (35) into (34) gives the rate (36). ∎
Taking in Theorem 12 we obtain the same rates as for Bessel processes (see Theorem 8). For smaller the rate is worse and for greater it is better. Note that we work with observations of rather than .
7.2 Gamma Distributed Processes
Let be -ss with càdlàg paths such that has Gamma density (1) with . We can easily generalize to the case , by considering the process and modifying our observations to . As an example, consider the so-called square of a Bessel process with dimension starting at $0$ (see (Revuz and Yor, 1999, Chapter XI, §1)). In view of Example 2(i), estimator (11) takes the form
[TABLE]
for and . We can prove a convergence result for this estimator that is similar to Theorems 8 and 12.
Theorem 13.
Let . Suppose for some . If there is some with , then
[TABLE]
for and all . If
[TABLE]
then
[TABLE]
for , where .
Proof.
In this case the upper bound on the variance becomes
[TABLE]
The rest is again analogous to the proof of Theorem 8. ∎
8 Optimality
The rates from Theorems 8, 12 and 13 are optimal in the minimax sense.
Theorem 14.
For all and there is such that
[TABLE]
for some , where infimum is over all estimators based on samples of with
(i) a Bessel process with dimension and ;
(ii) a -ss Gaussian process () and ;
(iii) a -ss Gamma distributed process () and .
See Section 10.2 for the proof of this theorem. A similar optimality result was obtained in Belomestny and Schoenmakers (2015) for the case where the absolute value of a one-dimensional Brownian motion is observed. Inequality (41) means that for each estimator that we may construct from our observations, there is a true density such that
[TABLE]
for some , i.e. it is impossible to construct an estimator with a convergence rate (w.r.t. -distance) faster than for all and all .
9 Simulation Study
In this section we test our estimator (17) on simulated data. Consider a Bessel process with dimension and a Gamma distributed stopping time , i.e. has the density
[TABLE]
In order to evaluate the estimator (17) we choose . Take the cut-off parameter (in accordance with (19)) and . To choose small appears counterintuitive at first because we showed in Theorem 8 that the convergence rate is better for large . However, in our examples the choice delivers the best results. This can be explained as follows: Our bound on the bias of estimator contains the constant (see (15)) as a factor. This constant is growing in and seems to make a crucial contribution to the overall error. We refer to Belomestny and Schoenmakers (2015) and Schulmann (2019) for an alternative choice of based purely on the data.
In order to test the performance of we compute it based on 100 independent samples of of size . In Figure 1 we see the resulting box-plots of the loss.
Let us now demonstrate the performance of our estimator for different distributions of . As examples we consider Exponential, Gamma, Inverse-Gaussian and Weibull distributions. To construct the estimate (17) we choose , , and as before. Figure 2 shows the densities of the six distributions and their 50 respective estimates based on 50 independent samples of of size .
10 Proofs
10.1 Proof of Theorem 9
We roughly follow the proof of an analogous result for the special case , found in Belomestny and Schoenmakers (2015). In contrast to Belomestny and Schoenmakers (2015), we do not restrict ourselves to the case in the proof and provide the specific form of for all .
Let . It suffices to show the Lyapunov condition, i.e. for a :
[TABLE]
The claim (27) follows from (42) with . Note that for by monotone convergence and (10) (if we choose there). So, (42) holds if we can prove that and
[TABLE]
In any case of Lemma 11 (for ) we have
[TABLE]
for all and some . Now we investigate the asymptotic behavior of . Looking at (25) we use Fubini’s theorem to obtain
[TABLE]
By Example 2(ii) we can estimate
[TABLE]
for some and further
[TABLE]
Our strategy now is to decompose the double integral defining into pieces that are easy to estimate. To that end let , where and define
[TABLE]
By Lemma 20 there are such that
[TABLE]
and such that
[TABLE]
With the help of these inequalities we deduce
[TABLE]
Similarly,
[TABLE]
and
[TABLE]
for some . Combine (45) and (46) to obtain
[TABLE]
Next, we examine the asymptotic behavior of the integral . To this end, we take advantage of Stirling’s formula (Lemma 19)
[TABLE]
for . Consider the integrand of . In the denominator it holds by means of the identity that
[TABLE]
for . On the set
[TABLE]
we define , with , to obtain
[TABLE]
Note that due to the choice of , we have and . We use the asymptotic decomposition
[TABLE]
to obtain
[TABLE]
Analogously, on the set
[TABLE]
we define , with , to obtain
[TABLE]
Hence, can be decomposed as follows:
[TABLE]
where
[TABLE]
with
[TABLE]
The integral in (47) allows a series representation via Lemma 22. In fact,
[TABLE]
uniformly in . Thus,
[TABLE]
holds with
[TABLE]
Summing up the auxiliary quantities introduced above we get
[TABLE]
and thus (28). If , then (28) and (44) imply (43) and hence the claim.
10.2 Proof of Theorem 14
The basic construction used in this proof is due to Belomestny and Schoenmakers (2016), where it is used in the context of an observed Brownian motion. Define the -divergence
[TABLE]
between two probability measures and with densities and . The following general result forms the basis for the subsequent steps (see Tsybakov (2008) for a proof).
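For orientation, a $\chi^2$-divergence of this kind can be evaluated numerically for two explicit densities. The sketch below assumes the common definition $\chi^2(P_1\,|\,P_0)=\int (p_1-p_0)^2/p_0$ (as in Tsybakov (2008)); the ordering of the two arguments in the paper's display is not recoverable here, so it is an assumption. The exponential densities are chosen only because the divergence has the closed form $(\lambda-1)^2/(2\lambda-1)$.

```python
import math


def chi2_divergence(p1, p0, lo, hi, steps=200_000):
    """Midpoint-rule approximation of int (p1 - p0)^2 / p0 over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += (p1(x) - p0(x)) ** 2 / p0(x)
    return total * h


lam = 1.5


def p0(x):
    return math.exp(-x)              # Exp(1) density


def p1(x):
    return lam * math.exp(-lam * x)  # Exp(lam) density


numeric = chi2_divergence(p1, p0, 0.0, 40.0)
closed_form = (lam - 1.0) ** 2 / (2.0 * lam - 1.0)
```

In lower-bound arguments of this type the two hypotheses are chosen so that this quantity is of order $1/n$ while the densities themselves stay well separated.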
Theorem 15.
Let be a family of probability measures indexed by a non-parametric class of densities . Suppose that are iid observations in model with . If there are such that
[TABLE]
and if
[TABLE]
holds for some independent of , then
[TABLE]
holds for some , where the infimum is over all estimators.
Let and . Define for
[TABLE]
The following lemma provides some properties of the functions and .
Lemma 16.
The function is a probability density on with Mellin transform
[TABLE]
The Mellin transform of the function is given by
[TABLE]
Proof.
Formula (53) can be found in Oberhettinger (2012) and (54) is shown in (Belomestny and Schoenmakers, 2016, Lemma 6.2). ∎
Set now for any and some ,
[TABLE]
for , where is defined by (4). The following lemma will help us verify condition (50).
Lemma 17.
For any and some not depending on the function is a probability density satisfying
[TABLE]
Moreover, and are in for all and .
Proof.
For (56) see (Belomestny and Schoenmakers, 2016, Lemma 6.3) where it is also shown that for small enough:
[TABLE]
It is easy to see that for all and (57) implies the same for . ∎
With a view to applying Theorem 15, let us consider the densities and of an observation associated with the hypotheses and , respectively. At this point we have to distinguish between the models discussed so far. We only present the proof for the Bessel case; parts (ii) and (iii) of Theorem 14 can be shown along the same lines.
Let and be two random variables with respective densities and . The density of the random variable , is obtained via (6):
[TABLE]
For the Mellin transform of we use self-similarity of and (3) to get
[TABLE]
for and .
Lemma 18.
For all and we have
[TABLE]
Proof.
Define . By the change of variables ,
[TABLE]
with
[TABLE]
For the next step let . We apply Theorem 5 and the rule and obtain
[TABLE]
for suitable , where . Due to (54), we can estimate
[TABLE]
with . Next, we use Lemma 20 in (60) to estimate the gamma terms, then plug in (61) and for some to obtain
[TABLE]
[TABLE]
where is the dominating term. This proves the lemma. ∎
Lemma 18 implies (51). With the choice
[TABLE]
Lemma 17 implies (50). The claim of Theorem 14(i) then follows from Theorem 15.
11 Appendix
For the proofs of Lemmas 19 and 20 we refer to Andrews et al. (1999).
Lemma 19.
For we have for
Lemma 20.
For all there are such that
[TABLE]
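The gamma-function estimates of this appendix rest on the exponential decay of $|\Gamma(a+iv)|$ along vertical lines. As a hedged illustration (the exact statements of Lemmas 19 and 20 are not reproduced here), the sketch below compares the leading Stirling term $\sqrt{2\pi}\,|v|^{a-1/2}e^{-\pi|v|/2}$ with two closed forms that follow from the reflection formula, $|\Gamma(\tfrac12+iv)|^2=\pi/\cosh(\pi v)$ and $|\Gamma(1+iv)|^2=\pi v/\sinh(\pi v)$.

```python
import math


def abs_gamma_half(v):
    """|Gamma(1/2 + i v)|, exact by the reflection formula."""
    return math.sqrt(math.pi / math.cosh(math.pi * v))


def abs_gamma_one(v):
    """|Gamma(1 + i v)| for v > 0, exact by the reflection formula."""
    return math.sqrt(math.pi * v / math.sinh(math.pi * v))


def stirling_modulus(a, v):
    """Leading-order Stirling approximation of |Gamma(a + i v)| for large |v|."""
    return math.sqrt(2.0 * math.pi) * abs(v) ** (a - 0.5) * math.exp(-math.pi * abs(v) / 2.0)


# Ratios of exact modulus to the Stirling term at a few heights v.
ratios_half = [abs_gamma_half(v) / stirling_modulus(0.5, v) for v in (2.0, 5.0, 10.0)]
ratios_one = [abs_gamma_one(v) / stirling_modulus(1.0, v) for v in (2.0, 5.0, 10.0)]
```

Because the modulus decays like $e^{-\pi|v|/2}$, dividing by a gamma factor in a Mellin-type estimator inflates the variance exponentially in the cut-off, which is the mechanism behind logarithmic choices of the cut-off and polynomial-in-$\log n$ rates.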
Corollary 21.
(i) For all and there is a such that
[TABLE]
(ii) For all and there are and with
[TABLE]
Proof.
Define . For Lemma 20 gives a such that
[TABLE]
which implies the claim with . The case follows similarly with and . ∎
Lemma 22.
Let . If is times continuously differentiable (), then we have the expansion
[TABLE]
Proof.
See (Erdélyi, 1956, page 47). ∎
Acknowledgement
The author was supported by the Deutsche Forschungsgemeinschaft (DFG) via RTG 2131 High-dimensional Phenomena in Probability – Fluctuations and Discontinuity.
References
- Andrews G, Askey R, Roy R (1999) Special Functions. Encyclopedia of Mathematics and its Applications, Cambridge University Press
- Belomestny D, Schoenmakers J (2015) Statistical Skorohod embedding problem: Optimality and asymptotic normality. Statist Probab Lett 104:169–180
- Belomestny D, Schoenmakers J (2016) Statistical inference for time-changed Lévy processes via Mellin transform approach. Stoch Process Their Appl 126(7):2092–2122
- Bleistein N, Handelsman R (1986) Asymptotic Expansions of Integrals. Dover Publications
- Butzer P, Jansche S (1997) A direct approach to the Mellin transform. J Fourier Anal Appl 3(4):325–376
- Comte F, Genon-Catalot V (2015) Adaptive Laguerre density estimation for mixed Poisson models. Electron J Stat 9:1113–1149
- Erdélyi A (1956) Asymptotic Expansions. Dover Books on Mathematics, Dover Publications
- Flajolet P, Gourdon X, Dumas P (1995) Mellin transforms and asymptotics: Harmonic sums. Theor Comput Sci 144:3–58
