AR(1) processes driven by second-chaos white noise: Berry-Esséen bounds for quadratic variation and parameter estimation
Soukaina Douissi, Khalifa Es-Sebaiy, Fatimah Alshahrani, Frederi G. Viens

TL;DR
This paper investigates the asymptotic properties of quadratic variation in AR(1) processes driven by second-chaos white noise, providing Berry-Esséen bounds and insights into parameter estimation.
Contribution
It introduces new bounds on convergence rates for AR(1) processes driven by second-chaos noise and applies these to improve understanding of mean-reversion estimation.
Findings
Established Berry-Esséen bounds for quadratic variation
Demonstrated convergence rates to normal law
Provided simulation validation of theoretical results
Abstract
In this paper, we study the asymptotic behavior of the quadratic variation for the class of AR(1) processes driven by white noise in the second Wiener chaos. Using tools from the analysis on Wiener space, we give an upper bound for the total-variation speed of convergence to the normal law, which we apply to study the estimation of the model's mean-reversion. Simulations are performed to illustrate the theoretical results.
| Mean | Std dev | Mean | Std dev | Mean | Std dev | Mean | Std dev | |
|---|---|---|---|---|---|---|---|---|
| 0.2178 | 0.0901 | 0.2887 | 0.0969 | 0.4946 | 0.0520 | 0.6962 | 0.0234 | |
| 0.1878 | 0.0866 | 0.2905 | 0.0616 | 0.4966 | 0.0413 | 0.6978 | 0.0270 | |
| 0.1630 | 0.0692 | 0.2928 | 0.0852 | 0.4974 | 0.0315 | 0.6987 | 0.0215 | |
Soukaina Douissi1, Khalifa Es-Sebaiy2, Fatimah Alshahrani3, Frederi G. Viens4.
1 Laboratory LIBMA, Faculty Semlalia, Cadi Ayyad University, 40000 Marrakech, Morocco.
2 Department of Mathematics, Faculty of Science, Kuwait University, Kuwait.
3 Department of Mathematical Science, Princess Nourah bint Abdulrahman University, Riyadh.
3,4 Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824.
Key words: Central limit theorem; Berry-Esséen; Malliavin calculus; parameter estimation; time series; Wiener chaos
2010 Mathematics Subject Classification: 60F05; 60H07; 62F12; 62M10
The first author is supported by the Fulbright joint supervision program for PhD students for the academic year 2018-2019 between Cadi Ayyad University and Michigan State University. The fourth author is partially supported by NSF awards DMS 1734183 and 1811779, and ONR award N00014-18-1-2192.
1 Introduction
The topic of statistical inference for stochastic processes has a long history, addressing a number of issues, though many difficult questions remain. At the same time, a number of application fields are anxious to see some practical progress in a selection of directions. Methodologies are sought which are not just statistically sound, but stand a good chance of being computationally implementable, if not nimble, to help practitioners make data-based decisions in stochastic problems with complex time evolutions. In this paper, which is motivated by parameter estimation within the above context, we propose a quantitatively sharp analysis in this direction, and we honor the scientific legacy of Prof. Larry Shepp.
Prof. Shepp is widely known for his seminal work on stochastic control, optimal stopping, and applications in areas such as investment finance. Often labeled as an applied probabilist by those working in that area, he had the merit, among many other qualities, of showing by example that research activity in this area could benefit from an appreciation for the mathematical aesthetics of constructing stochastic objects for their own sake. His papers also showed that one’s work is only as applied as one’s ability to calibrate a stochastic model to a realistic scenario. As obvious as this view may seem, it is nonetheless in short supply in some current circles, where model sophistication seems to replace all other imperatives. Instead, we believe some of the principles guiding applied probability research should include (i) statistical parsimony and robustness, (ii) feature discovery, and above all, (iii) real-world impact where mathematicians propose a real solution to a real problem. We think that Prof. Shepp would not have been shy about agreeing that his seminal and highly original works on optimal stopping and stochastic control [45], [SControl], including the invention [46] of the widely used Russian financial option, illustrate items (ii) and (iii) in this philosophy perfectly. This leaves the question of how to estimate model parameters needed to implement applied solutions. Prof. Shepp proved on many occasions that this concern was also high on his list of objectives in applied work; he proposed methods aligning with our stated principle (i) above. The best example is the work for which Shepp is most widely known outside of our stochastic circles: the mathematical foundation of the Computed Tomography (CT) scanner, and in particular, the basis [47] for its data analysis. Prof. Shepp is less well known for his direct interest in statistically motivated stochastic modeling; the posthumous paper [17] is an instance of this, on asymptotics of auto-regressive processes with normal noise (innovations).
Our paper honors this legacy by providing a detailed and mathematically rigorous stochastic analysis of some building blocks needed in the data analysis of a simple class of stochastic processes. Our paper’s originality is in working out detailed quantitative properties for auto-regressive processes with innovations in the second Wiener chaos. Our framework is parsimonious in the sense of being determined by a small number of parameters, while covering features of stationarity, mean-reversion, and heavier-than-normal tail weight. We focus on establishing rates of convergence in the central limit theorem for quadratic variations of these processes, which we are then able to transfer to similar rates for the model’s moments-based parameter estimation. This precision would allow practitioners to determine the validity and uncertainty quantification of our estimates in the realistic setting of moderate sample size. Careless use of a method of moments would ignore the potential for abusive conclusions in this heavy-tailed time-series setting.
The remainder of this introduction begins with an overview of the landscape of parameter estimation for stochastic processes related to ours. The few included references call for the reader to find additional references therein, for the sake of conciseness. We then introduce the specific model class used in this paper. It represents a continuation of the current literature’s motivation to calibrate stochastic models with features such as stochastic memory and path roughness. It constitutes a departure from the same literature’s focus on the framework of Gaussian noise.
1.1 Parameter estimation for stochastic processes: historical and recent context
Some of the early impetus in parameter estimation for stochastic processes was inspired by classical ideas from frequentist statistics, such as the theoretical and practical superiority of maximum likelihood estimation (MLE) over other, less constrained methodologies, in many contexts. We will not delve into the description of many such instances, citing only the seminal account [26], first published in Russian in 1974 (see references therein in Chapter 17, such as [34]). This was picked up two decades later in the context of processes driven by fractional Brownian motion, where it was shown that the martingale property used in earlier treatments was not a necessary ingredient to establish the properties of such MLEs: see in particular the treatment of processes with fractional noise in [24] and in [51]. It was also noticed that least-squares ideas, which led to MLEs in cases of white-noise driven processes, did not share this property in the case of processes driven by fractional noise: this was pointed out in the continuous-time based paper [22]. See also a more detailed account of this direction of work in [16] and references therein, including a discussion of the distinction between estimators based on continuous paths, and those using discrete sampling under in-fill and increasing-horizon asymptotics. These were applied particularly to various versions of the Ornstein-Uhlenbeck process, as examples of processes with stationary increments and an ability to choose other features such as path regularity and short or long memory.
The impracticality of computing MLEs for parameters of stochastic processes in these feature-rich contexts led the community to consider other methodologies, looking more closely at least squares and beyond. A popular approach is to work with incarnations of the method of moments. A full study in the case of general stationary Gaussian sequences, with application to in-fill asymptotics for the fractional Ornstein-Uhlenbeck process, is in [4]. This paper relates the relatively long history of those works where estimation of a memory or Hölder-regularity parameter uses moments-based objects, particularly quadratic variations. It also shows that the generalized method of moments can, in principle, provide a number of options to access vectors of parameters for discretely observed Gaussian processes in a practical way. This was also illustrated recently in [18], where the Malliavin calculus and its connection to Stein’s method was used to establish speeds of convergence in the central-limit theorems for quadratic-variations-based estimators for discretely observed processes. The Stein-Malliavin technical methodology employed in [18] is that which was introduced by Nourdin and Peccati in 2009, as described in their 2012 research monograph [32].
Other estimation methods are also proposed for general stationary time series, which we mention here, though they fall out of the scope of our paper, and they do not lead to the same precision as those based on the Stein-Malliavin method: see e.g. [53] and [54] for the Yule-Walker method and extensions. While the paper [52] establishes that essentially every continuous-time stationary process can be represented as the solution of a Langevin (Ornstein-Uhlenbeck-type) equation with an appropriate noise distribution, the two aforementioned follow-up papers [53, 54], which present an analog in discrete time, do not, however, connect the discrete and continuous frameworks via any asymptotic theory.
Following an initial push in [51], most of the recent papers mentioned above, and recent references therein, state an explicit effort to work with discretely observed processes. At least in the increasing-horizon case, the papers [18] and [13] had the merit of pointing out that many of the discretization techniques used to pass from continuous-path to discrete-observation based estimators, were inefficient, and it is preferable to work directly from the statistics of the discretely observed process. Our paper picks up this thread, and introduces a new direction of research which, to our knowledge, has not been approached by any authors: can the asymptotic normality of quadratic variations and related estimators, including very precise results on speeds of convergence, be obtained when the driving noise is not Gaussian?
The main underlying theoretical result we draw on is the optimal estimation of total-variation distances between chaos variables and the normal law, established in [31]. It was used for quadratic variations of stationary sequences in the Gaussian case in [29]. But when the Gaussian setting is abandoned, the result in [31] cannot be used directly. Instead, our paper makes a theoretical advance in the analysis on Wiener space, by drawing on a simple idea in the recent preprints [37] and [33]; our main result provides an example of a sum of chaos variables whose distance to the normal appears to be estimated optimally, whereas a standard use of the Schwarz inequality would result in a much weaker result. The precise location of the technique leading to this improvement is pointed out in the main body of our paper: see Theorem 8, particularly inequality (24) in its proof and the following brief discussion there, and Remark 9. This allows us to prove our Berry-Esséen-type speed of $n^{-1/2}$, rather than the slower speed that a direct use of the Schwarz inequality would have produced.
1.2 A stationary process with second-chaos noise, and related literature
Given our intent to address the new issue of noise distribution, and knowing that Berry-Esséen-type questions for models with mere Gaussian noise already present technical challenges, we choose to minimize the number of technical issues to address in this paper by focusing on the simplest possible stationary model class which does not restrict the marginal noise distribution within a family which is tractable using Wiener chaos analysis and tools from the Malliavin calculus. This is the auto-regressive model of order 1 (a.k.a. AR(1)) with independent noise terms, where the noise distribution is in the second Wiener chaos, i.e.
[TABLE]
where is an i.i.d. sequence in the second Wiener chaos, and and are constants. The complete description and construction of this process and of the noise sequence is given in Section 3, see (8).
As explained in Section 2, the second Wiener chaos is a linear space, and since the model (1) is linear, its solution, if any, lies in the same chaos. This points to a simple theoretical motivation and justification for studying the increasing horizon problem as opposed to the in-fill problem. We also include a practical motivation for doing so, further below in this section, coming from an environmental statistics question.
For the former motivation, note that the AR(1) specification (1), with essentially any square-integrable i.i.d. noise distribution, is known to converge weakly, after appropriate aggregation and scaling, to the so-called Ornstein-Uhlenbeck process (also known occasionally as the Vasicek process), which solves the stochastic differential equation
[TABLE]
where is a standard Wiener process (Gaussian Brownian motion), and the parameters , and are explicitly related to and . See for instance [48, Chapter 2], which covers the case of all square-integrable innovations; this paper assumes a piecewise linear interpolation in the normalization, which could be eliminated by switching to convergence in the Skorohod topology. A reference avoiding linear interpolation, with convergence in the Skorohod topology, is [10], where innovations are assumed to have four moments. In any case, this central limit theorem constrains the modeling of stationary/ergodic processes via diffusive differential formulation: under in-fill asymptotics with weakly dependent noise, the AR(1) specification cannot preserve any non-normal noise distribution in the limit. It is of course possible to interpolate the above process in a number of ways, to result in a continuous-time process whose discrete-time marginals are those specified via (1).
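As a hedged illustration of how the parameters of an AR(1) recursion and of an Ornstein-Uhlenbeck equation can be related, one may look at the Euler scheme with a time step $\Delta$. The symbols $\theta$, $\mu$, $\sigma$, $a_0$, $a_1$ and $\Delta$ below are notation assumed for this sketch only; they need not coincide with the notation of (1) and (2), and the Euler scheme is a swapped-in textbook device rather than the weak-convergence argument of the cited references.

```latex
% Assumed form of the Ornstein-Uhlenbeck equation (notation chosen for this sketch):
%   dX_t = \theta (\mu - X_t)\,dt + \sigma\,dW_t .
\begin{align*}
X_{(k+1)\Delta}
  &\approx X_{k\Delta} + \theta\,(\mu - X_{k\Delta})\,\Delta
     + \sigma\,\bigl(W_{(k+1)\Delta} - W_{k\Delta}\bigr) \\
  &= \underbrace{\theta\mu\Delta}_{a_0}
     + \underbrace{(1-\theta\Delta)}_{a_1}\, X_{k\Delta}
     + \varepsilon_{k+1},
  \qquad \varepsilon_{k+1} \sim \mathcal{N}(0,\sigma^2\Delta).
\end{align*}
```

In this sketch the innovations are Gaussian by construction, which is consistent with the remark above that a diffusive limit does not retain a non-normal noise law.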
However we believe it is difficult or impossible to give a linear diffusion-type stochastic differential equation, akin to (2), whose fixed-time-step marginals are as in (1) for an arbitrary noise distribution, while simultaneously describing what second-chaos process differential would need to replace in (2). The so-called Rosenblatt process (see [50]), the only known second-chaos continuous-time process with a stochastic calculus similar to ’s, gives an example of a viable alternative to (2) living in the second chaos. But this process is known to have only a long-memory version. Thus it cannot be a proxy for any continuous-time analogue of (1), since the noise there has no memory. Similar issues would presumably exist for other non-Gaussian AR(1) and related auto-regressive processes. A few have been studied recently in the literature. We mention [21, 55], which cover various noise structures similar to second-chaos noises, and here again, no asymptotic or interpolation theory is provided to relate to continuous time. There does exist a general treatment in [17] of asymptotics for all AR() processes: the limit processes are the so-called Continuous-AR() processes, which are Gaussian, and have -differentiable paths (a form of very long memory for ); that paper assumes normal innovations to keep technicalities to a minimum.
Another indication that finding such a proxy may fail comes in the specific case of the so-called Gumbel distribution for . This law is a popular distribution for extreme-value modeling. The fact that this law is in the second chaos is a classical result (as a weighted sum of exponentials, see [44]), though it does not appear to be widely known in the extreme-value community. The standard (mean-zero) Gumbel law can be represented as where is a standard exponential variable (chi-squared with two degrees of freedom, and are i.i.d. standard normals). The Gumbel law is known to give rise to a second-chaos version of an isonormal Gaussian process, known as the Gumbel noise measure (or Gumbel process); that stochastic measure obeys the same laws as the white-noise measure (including independence of increments which fails for the Rosenblatt noise), if one replaces the standard algebra of the reals by the max-plus algebra. This is explained in detail in the preprint [27]; also see references therein. By virtue of this change of algebra, stochastic differential specifications as in (2) cannot be defined using the Gumbel noise.
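The weighted-sum-of-exponentials description of the Gumbel law can be checked numerically. The sketch below assumes the classical representation of the mean-zero Gumbel law as the series $\sum_{k \geq 1} (E_k - 1)/k$ with i.i.d. standard exponentials $E_k = (Z_k^2 + Z_k'^2)/2$; whether this is exactly the normalization intended in the text is an assumption. The function name, the truncation level and the sample size are all hypothetical choices.

```python
import numpy as np

def centered_gumbel_second_chaos(n_samples, n_terms, rng):
    """Truncated series sum_{k <= n_terms} (E_k - 1) / k, with each standard
    exponential E_k built from two independent standard normals."""
    z1 = rng.standard_normal((n_samples, n_terms))
    z2 = rng.standard_normal((n_samples, n_terms))
    exp_vars = 0.5 * (z1**2 + z2**2)            # standard exponential variables
    weights = 1.0 / np.arange(1, n_terms + 1)   # weights 1/k of the series
    return (exp_vars - 1.0) @ weights

rng = np.random.default_rng(0)
n_terms = 100                                    # truncation level (assumption)
samples = centered_gumbel_second_chaos(20_000, n_terms, rng)

# Variance of the truncated series; it tends to pi^2 / 6 as n_terms grows.
truncated_var = np.sum(1.0 / np.arange(1, n_terms + 1) ** 2)
print("sample mean     :", samples.mean())
print("sample variance :", samples.var(), "vs truncated", truncated_var)
print("limit variance  :", np.pi**2 / 6)
```

The sample mean should be close to zero and the sample variance close to the truncated value, itself slightly below $\pi^2/6$, the variance of the Gumbel law.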
However, the discrete version of the Gumbel noise, an i.i.d. sequence with Gumbel marginals, is a good example of a noise type which can be used in the AR(1) process (1). This specific model, known as the AR(1) process with Gumbel noise (or innovations), presents a main motivation for our work. Recent references on this process, and on the closely related process where the marginals of are Gumbel-distributed, include [28] for a Bayesian study, [49] for applications to maxima in greenhouse gas concentration data, and [3] for AR processes in the broader extreme-value context. A survey on AR(1) models with different types of innovations and marginals, while not including the Gumbel, is in the unpublished manuscript [20]. The use of the Gumbel distribution for describing environmental time series, mainly when looking at extremes, is fairly widespread, but we do not cite this literature because it does not appear willing to acknowledge that time-series models driven by Gumbel innovations should be used, rather than using tools for i.i.d. Gumbel data. This literature, which is easy to find, is also entirely unaware that the Gumbel distribution is in the second Wiener chaos.
All these reasons give us ample cause to investigate the basic method-of-moments building blocks for determining parameters in stationary time series with second-chaos innovations. For the sake of concentrating on the core mathematical analysis towards this end, we focus on the asymptotics of quadratic variations for models in the class (1). The methodology developed in [18] can then be adapted to handle any method-of-moments-based estimators, at the cost of some additional effort. We provide examples of this in the latter sections of this paper. Our main result is that, for any second-chaos innovations in (1), the quadratic variation of has explicit normal asymptotics, with a speed of convergence in total variation which matches the classical Berry-Esséen speed of $n^{-1/2}$.
The remainder of this paper is structured as follows. Section 2 provides elements from the analysis on Wiener space which will be used in the paper. Section 3 presents the details of the class of AR(1) models we will analyze. Section 4 computes the asymptotic variance of the AR(1)’s quadratic variation by looking separately at its 2nd-chaos and 4th-chaos components, whose asymptotics are of the same order. Section 5 establishes our main result, the Berry-Esséen speed of convergence in total-variation for the normal fluctuations of the AR(1)’s quadratic variation. Finally, Section 6 defines a method-of-moments estimator for the mean-reversion rate of this AR(1) process, and establishes its asymptotic properties; a numerical study is included to gauge the distance between this renormalized estimator and the normal law.
2 Preliminaries
In this first section, we recall some elements from stochastic analysis that we will need in the paper. See [32], [39], and [40] for details. Any real, separable Hilbert space gives rise to an isonormal Gaussian process: a centered Gaussian family of random variables on a probability space such that . In this paper, it is enough to use the classical Wiener space, where , though any will also work. In the case , can be identified with the stochastic differential of a Wiener process and one interprets .
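The Wiener chaos $\mathcal{H}_p$ of order $p$ is defined as the closure in $L^2(\Omega)$ of the linear span of the random variables $H_p(G(\varphi))$, where $G$ denotes the isonormal Gaussian process, $\varphi \in \mathcal{H}$ with $\|\varphi\|_{\mathcal{H}} = 1$, and $H_p$ is the Hermite polynomial of degree $p$. The intuitive Riemann-sum-based notion of multiple Wiener stochastic integral $I_p$ with respect to $G$, in the sense of limits in $L^2(\Omega)$, turns out to be an isometry between the Hilbert space $\mathcal{H}^{\odot p}$ (symmetric tensor product) equipped with the scaled norm $\sqrt{p!}\,\|\cdot\|_{\mathcal{H}^{\otimes p}}$ and the Wiener chaos of order $p$ under $L^2(\Omega)$’s norm. In any case, we have the following fundamental decomposition of $L^2(\Omega)$ as a direct sum of all Wiener chaos.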
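The Wiener chaos expansion. For any $F \in L^2(\Omega)$, there exists a unique sequence of functions $f_p \in \mathcal{H}^{\odot p}$ such that
$$F = E[F] + \sum_{p=1}^{\infty} I_p(f_p),$$
where the terms are all mutually orthogonal in $L^2(\Omega)$ and
$$E\left[I_p(f_p)^2\right] = p!\,\|f_p\|^2_{\mathcal{H}^{\otimes p}}. \tag{3}$$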
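Product formula and contractions. Since products of multiple Wiener integrals are again square-integrable, the above expansion applies to them, and is explicit using contractions: for any $p, q \geq 1$, and symmetric integrands $f \in \mathcal{H}^{\odot p}$ and $g \in \mathcal{H}^{\odot q}$,
$$I_p(f)\, I_q(g) = \sum_{r=0}^{p \wedge q} r! \binom{p}{r} \binom{q}{r}\, I_{p+q-2r}(f \otimes_r g); \tag{4}$$
see [39, Proposition 1.1.3] for instance; the contraction $f \otimes_r g$ is the element of $\mathcal{H}^{\otimes (p+q-2r)}$ defined by
$$(f \otimes_r g)(s_1, \ldots, s_{p-r}, t_1, \ldots, t_{q-r}) = \int f(s_1, \ldots, s_{p-r}, u_1, \ldots, u_r)\, g(t_1, \ldots, t_{q-r}, u_1, \ldots, u_r)\, du_1 \cdots du_r,$$
where the integral runs over the $r$ contracted variables.
The special case for $p = q = 1$ is particularly handy, and can be written in its symmetrized form:
$$I_1(f)\, I_1(g) = 2^{-1} I_2(f \otimes g + g \otimes f) + \langle f, g \rangle_{\mathcal{H}}.$$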
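As a small worked instance of the product formula (4) in the case $p = q = 1$, the following lines (a sketch in standard notation, with $\varphi, \psi$ hypothetical orthonormal elements of the underlying Hilbert space $\mathcal{H}$) recover the two building blocks used later for second-chaos noise: a centered chi-squared variable and a product of independent normals.

```latex
% Z = I_1(\varphi) and Z' = I_1(\psi) are independent standard normals
% when \varphi, \psi are orthonormal in \mathcal{H}.
\begin{align*}
I_1(\varphi)^2 &= I_2(\varphi \otimes \varphi) + \|\varphi\|_{\mathcal{H}}^2
  \quad\Longrightarrow\quad Z^2 - 1 = I_2(\varphi \otimes \varphi), \\
E\bigl[(Z^2-1)^2\bigr] &= 2!\,\|\varphi \otimes \varphi\|^2_{\mathcal{H}^{\otimes 2}} = 2, \\
I_1(\varphi)\, I_1(\psi) &= \tfrac12\, I_2(\varphi \otimes \psi + \psi \otimes \varphi)
  + \langle \varphi, \psi \rangle_{\mathcal{H}}
  \quad\Longrightarrow\quad Z Z' = I_2\bigl(\tfrac12(\varphi \otimes \psi + \psi \otimes \varphi)\bigr), \\
E\bigl[(Z Z')^2\bigr] &= 2!\,\bigl\|\tfrac12(\varphi \otimes \psi + \psi \otimes \varphi)\bigr\|^2_{\mathcal{H}^{\otimes 2}}
  = 2 \cdot \tfrac12 = 1 .
\end{align*}
```

Both second moments agree with the elementary values $\mathrm{Var}(Z^2) = 2$ and $E[Z^2]E[Z'^2] = 1$, which is a quick consistency check of the isometry property.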
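Hypercontractivity in Wiener chaos. For $h \in \mathcal{H}^{\otimes q}$, the multiple Wiener integrals $I_q(h)$, which exhaust the set $\mathcal{H}_q$, satisfy a hypercontractivity property (equivalence in $\mathcal{H}_q$ of all $L^p$ norms for all $p \geq 2$), which implies that for any $F \in \oplus_{l=1}^{q} \mathcal{H}_l$ (i.e. in a fixed sum of Wiener chaoses), we have
$$\left(E\left[|F|^p\right]\right)^{1/p} \leq c_{p,q} \left(E\left[|F|^2\right]\right)^{1/2} \quad \text{for any } p \geq 2.$$
It should be noted that the constants $c_{p,q}$ above are known with some precision when $F \in \mathcal{H}_q$: by Corollary 2.8.14 in [32], $c_{p,q} = (p-1)^{q/2}$.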
Malliavin derivative and other operators on Wiener space. The Malliavin derivative operator , and other operators on Wiener space, are needed briefly in this paper, to provide an efficient proof of the first theorem in Section 5, and to interpret an observation of I. Nourdin and G. Peccati, given below in (7), for a bound on the total variation distance of any chaos law to the normal law. We do not provide any background on these operators, referring instead to Chapter 2 in [32], and briefly mentioning here the facts we will use in the proof of Section 5, without spelling out all assumptions. Strictly speaking, all the results in this paper can be obtained without the following facts, but this would be exceedingly tedious and wholly nontransparent.
- The operator maps to and is consistent with the ordinary chain rule. Its domain is denoted by and includes all chaos variables.
- The operator , known as the generator of the Ornstein-Uhlenbeck semigroup on Wiener space, maps to , and denotes its pseudo-inverse: ’s kernel is the constants, all other chaos are its eigenspaces. Combining this with the previous point, we obtain .
- This has an adjoint in , which by definition satisfies the duality relation , where is any stochastic process for which the expressions are defined. The domain of is a non-trivial object of study, but it is known to contain all square-integrable -adapted processes for the case of , the Wiener process, where .
- We have the relation
[TABLE]
Distances between random variables. The following is classical. If $X, Y$ are two real-valued random variables, then the total variation distance between the law of $X$ and the law of $Y$ is given by
$$d_{TV}(X, Y) = \sup_{A \in \mathcal{B}(\mathbb{R})} \left| P[X \in A] - P[Y \in A] \right|,$$
where the supremum is over all Borel sets. The Kolmogorov distance $d_{Kol}(X, Y)$ is the same as $d_{TV}$ except one takes the sup over $A$ of the form $(-\infty, z]$ for all real $z$. The Wasserstein distance uses Lipschitz rather than indicator functions:
$$d_W(X, Y) = \sup_{f \in Lip(1)} \left| E f(X) - E f(Y) \right|,$$
$Lip(1)$ being the set of all Lipschitz functions with Lipschitz constant $\leq 1$.
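To make these distances concrete, here is a minimal numerical sketch that estimates the Kolmogorov distance to the standard normal law from a sample. Since the Kolmogorov distance restricts the supremum to half-lines, it is bounded above by the total variation distance, so it gives an easily computed lower indication only. All function names are hypothetical, and the estimate carries a Monte Carlo error of the order of the inverse square root of the sample size.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_distance_to_normal(sample):
    """Sup-distance between the empirical CDF of `sample` and the N(0,1) CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    cdf = np.array([normal_cdf(v) for v in x])
    # The empirical CDF jumps at each order statistic: check both sides of each jump.
    above = np.arange(1, n + 1) / n - cdf
    below = cdf - np.arange(0, n) / n
    return float(max(above.max(), below.max()))

def standardized_chi2_sum(n_terms, n_samples, rng):
    """Samples of (2 n)^(-1/2) * sum_{i <= n} (Z_i^2 - 1): unit variance, second chaos."""
    z = rng.standard_normal((n_samples, n_terms))
    return (z**2 - 1.0).sum(axis=1) / math.sqrt(2.0 * n_terms)

rng = np.random.default_rng(0)
for n_terms in (1, 10, 100):
    d = kolmogorov_distance_to_normal(standardized_chi2_sum(n_terms, 5_000, rng))
    print(n_terms, round(d, 3))
```

The printed distances are expected to decrease as the number of summands grows, until they reach the Monte Carlo noise floor of the empirical distribution function.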
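The observation of Nourdin and Peccati. Let $N$ denote the standard normal law. The following observation relates an integration-by-parts formula on Wiener space with a classical result of Ch. Stein.
Let $F \in \mathbb{D}^{1,2}$ with $E[F] = 0$ and $Var[F] = 1$. Then (see [31, Proposition 2.4], or [32, Theorem 5.1.3]), for $f \in C_b^1(\mathbb{R})$,
$$E[F f(F)] = E\left[f'(F) \langle DF, -DL^{-1}F \rangle_{\mathcal{H}}\right],$$
and by combining this with properties of solutions of Stein’s equations, one gets
$$d_{TV}(F, N) \leq 2\, E\left| 1 - \langle DF, -DL^{-1}F \rangle_{\mathcal{H}} \right|. \tag{7}$$
One notes in particular that when $F \in \mathcal{H}_q$, since $-L^{-1}F = q^{-1}F$, we obtain conveniently
$$\langle DF, -DL^{-1}F \rangle_{\mathcal{H}} = q^{-1} \|DF\|^2_{\mathcal{H}}.$$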
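To see how a bound of this type produces a Berry-Esséen-type rate in the simplest second-chaos example, one can combine it with the fourth-moment estimate from the Nourdin-Peccati theory, recalled here from memory of [32] and therefore to be checked against that reference. The variable $F_n$ below is a hypothetical illustration, not an object taken from this paper.

```latex
% For F = I_q(f) with E[F^2] = 1, one has E[q^{-1}\|DF\|^2_{\mathcal{H}}] = 1, hence
\begin{align*}
d_{TV}(F, N)
  &\le 2\, E\bigl| 1 - q^{-1}\|DF\|^2_{\mathcal{H}} \bigr|
   \le 2 \sqrt{\mathrm{Var}\bigl(q^{-1}\|DF\|^2_{\mathcal{H}}\bigr)}
   \le 2 \sqrt{\frac{q-1}{3q}\,\bigl(E[F^4] - 3\bigr)} . \\
\intertext{Example with $q = 2$: take $F_n = (2n)^{-1/2}\sum_{i=1}^{n}(Z_i^2 - 1)$,
with $Z_i$ i.i.d.\ standard normal. The fourth cumulant of $Z_i^2 - 1$ is $48$, so}
E[F_n^4] - 3 &= \frac{48\,n}{(2n)^2} = \frac{12}{n},
\qquad
d_{TV}(F_n, N) \le 2\sqrt{\frac16 \cdot \frac{12}{n}} = \frac{2\sqrt{2}}{\sqrt{n}} .
\end{align*}
```

The resulting rate $n^{-1/2}$ is the classical Berry-Esséen order in this elementary case.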
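A convenient lemma. The following result is a direct consequence of the Borel-Cantelli Lemma (the proof is elementary; see e.g. [25]). It is convenient for establishing almost-sure convergences from $L^p$ convergences.
Lemma 1
Let $\gamma > 0$. Let $(Z_n)_{n \in \mathbb{N}}$ be a sequence of random variables. If for every $p \geq 1$ there exists a constant $c_p > 0$ such that for all $n \in \mathbb{N}$,
$$\|Z_n\|_{L^p(\Omega)} \leq c_p\, n^{-\gamma},$$
then for all $\varepsilon > 0$ there exists a random variable $\eta_\varepsilon$ which is almost surely finite such that
$$|Z_n| \leq \eta_\varepsilon\, n^{-\gamma + \varepsilon} \quad \text{almost surely}$$
for all $n \in \mathbb{N}$. Moreover, $E|\eta_\varepsilon|^p < \infty$ for all $p \geq 1$.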
3 The model
3.1 Definition
We consider the following AR(1) model
[TABLE]
where , and are real constants. The sequence of innovations is i.i.d., with distribution in the second Wiener chaos. It turns out that this sequence can be represented as in the second line above in (8), where the family are i.i.d. standard Gaussian random variables defined on , and is a sequence of reals satisfying
[TABLE]
This is explained in [32, Section 2.7.4]. We assume that the mean reversion parameter is such that . Under this condition, (8) also admits a stationary ergodic solution. Both the version above and the stationary version are linear functionals of elements of the form of , which are elements of the second Wiener chaos. Since this chaos is a vector space, both versions of take values in the second Wiener chaos.
By truncating the series in (8), one obtains a process which is a sum of chi-squared variables, converging to in . Special cases where the sum is finite, can be considered. In the figures below, we simulate 500 observations from such cases, to show the variety of behaviors, even with a limited number of terms in the noise series.
- When and , for all , corresponds to a scaled mean-zero chi-squared white noise with one degree of freedom: .
- When and , , an exponential white noise with rate parameter . Indeed .
- When , and , for all , which is a symmetric second chaos white noise, ’s law is equal to a product normal law: if are two i.i.d. standard normals, then .
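The three special cases listed above can be simulated along the following lines. This is a minimal sketch under stated assumptions: the noise is taken of the form $\sum_k \sigma_k (Z_k^2 - 1)$ with finitely many non-zero weights, the recursion is written without intercept and started at zero, and the names `second_chaos_noise`, `simulate_ar1`, `theta`, `sigmas` as well as the numerical values are hypothetical choices, not the paper's.

```python
import numpy as np

def second_chaos_noise(n, sigmas, rng):
    """n i.i.d. draws of sum_k sigma_k * (Z_k^2 - 1), with Z_k i.i.d. N(0, 1)."""
    sigmas = np.asarray(sigmas, dtype=float)
    z = rng.standard_normal((n, sigmas.size))
    return (z**2 - 1.0) @ sigmas

def simulate_ar1(n, theta, sigmas, rng, y0=0.0):
    """Y_i = theta * Y_{i-1} + eps_i, started at y0 (no intercept: an assumption)."""
    eps = second_chaos_noise(n, sigmas, rng)
    y = np.empty(n)
    prev = y0
    for i in range(n):
        prev = theta * prev + eps[i]
        y[i] = prev
    return y

rng = np.random.default_rng(1)
paths = {
    # one non-zero weight: scaled, centered chi-squared noise with one degree of freedom
    "chi-squared": simulate_ar1(500, 0.5, [1.0], rng),
    # two equal weights: sigma * (Z1^2 + Z2^2 - 2), a centered exponential noise
    "exponential": simulate_ar1(500, 0.5, [0.5, 0.5], rng),
    # two opposite weights: (Z1^2 - Z2^2) / 2, distributed as a product of two normals
    "product normal": simulate_ar1(500, 0.5, [0.5, -0.5], rng),
}
for name, y in paths.items():
    print(name, "mean:", y.mean(), "min:", y.min(), "max:", y.max())
```

With opposite weights the noise is symmetric, since $(Z_1^2 - Z_2^2)/2$ equals the product of the independent standard normals $(Z_1 - Z_2)/\sqrt{2}$ and $(Z_1 + Z_2)/\sqrt{2}$; the other two cases are skewed to the right.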
Remark 2
We can see from the figures above the asymmetry in figures (a), (b) and (c) due to the asymmetric nature of the noise; figure (d) shows more symmetry because of the choice . We also notice that when the mean reversion is fairly strong and the noise is large the shape of the observations is balanced (figure (a)), while when the noise is larger compared to the mean-reversion parameter, the observations look like an Ornstein-Uhlenbeck process with a noise larger than the drift (see figure (b)).
3.2 Quadratic variation
This paper’s main goal is to determine the asymptotic distribution of the quadratic variation of the observations using analysis on Wiener space.
This will be facilitated by the fact, mentioned above, that the sequence lives in the second Wiener chaos with respect to the Wiener process , by virtue of being the solution of a linear equation with noise in the second chaos. To be more specific, observe that in (8) can be expressed recursively as follows :
[TABLE]
where
[TABLE]
For the sake of ease of computation in Wiener chaos, it will be convenient throughout this paper to refer to the Wiener integral representation of the noise terms . For this, there exists an orthonormal family for which . Hence, using the fact, which comes from the most elementary application of the product formula (4), that , we have for :
[TABLE]
Therefore, using the linearity property of multiple integrals, we can write for ,
[TABLE]
where
[TABLE]
A straightforward computation shows that under Assumption (9), the kernel for all : indeed
[TABLE]
Our main object of study in the next two sections is the asymptotics of the quadratic variation defined as follows :
[TABLE]
Using product formula (4), we get
[TABLE]
In the next section, we show that the asymptotic variance of exists and we will compute its speed of convergence. Then we establish a CLT for , and compute its Berry-Esséen speed of convergence in total variation.
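As a rough numerical companion to this program, the sketch below computes the empirical mean of squared observations for a simulated path and compares it with the stationary second moment. It assumes that the quadratic variation is, up to centering, this empirical mean of squares, and it uses a single-weight chi-squared noise $\sigma(Z^2 - 1)$, for which the noise variance is $2\sigma^2$ and the stationary variance of the recursion is $2\sigma^2/(1 - \theta^2)$. The names and parameter values are hypothetical.

```python
import numpy as np

def simulate_ar1_chi2(n, theta, sigma, rng):
    """Y_i = theta * Y_{i-1} + sigma * (Z_i^2 - 1), started at 0 (assumed form)."""
    eps = sigma * (rng.standard_normal(n) ** 2 - 1.0)
    y = np.empty(n)
    prev = 0.0
    for i in range(n):
        prev = theta * prev + eps[i]
        y[i] = prev
    return y

def mean_of_squares(y):
    """Empirical mean of squared observations (uncentered quadratic variation)."""
    return float(np.mean(np.asarray(y, dtype=float) ** 2))

theta, sigma = 0.5, 1.0
# Stationary second moment: Var(eps) / (1 - theta^2), with Var(eps) = 2 * sigma^2.
stationary_second_moment = 2.0 * sigma**2 / (1.0 - theta**2)

rng = np.random.default_rng(2)
for n in (1_000, 10_000, 100_000):
    y = simulate_ar1_chi2(n, theta, sigma, rng)
    print(n, mean_of_squares(y), stationary_second_moment)
```

Inverting the relation between the limit and `theta` is the idea behind a moments-based estimator of the mean-reversion parameter when the noise scale is known; this remark is an illustration and does not reproduce the estimator defined later in the paper.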
4 Asymptotic variance of the quadratic variation
Using the orthogonality of multiple integrals living in different chaos, to calculate the limiting variance of , we need only study separately the second moments of the terms and given in (14).
4.1 Scale constant for
Proposition 3
Under Assumption (9), with as in (14), for large ,
[TABLE]
where . In particular
[TABLE]
Proof.
We have , by the isometry property (3) of multiple integrals, we get
[TABLE]
Moreover, under Assumption (9), we have
[TABLE]
Therefore, for such that , we get
[TABLE]
Therefore, by (17), we have
[TABLE]
Moreover,
[TABLE]
On the other hand
[TABLE]
Consequently
[TABLE]
The desired result is therefore obtained. ∎
4.2 Scale constant for
Proposition 4
Under Assumption (9), with as in (14), for large ,
[TABLE]
where
[TABLE]
In particular
[TABLE]
Proof.
By definition of the term , we have
[TABLE]
where denotes the symmetrization of , because the kernel is no longer symmetric. We deal with symmetrization by using a combinatorial formula, obtaining
[TABLE]
Therefore
[TABLE]
Moreover,
[TABLE]
On the other hand, using (9), we have for
[TABLE]
Therefore by (20), we get
[TABLE]
Now let us estimate
[TABLE]
For , , we have
[TABLE]
So, for ,
[TABLE]
Therefore,
[TABLE]
which completes the proof. ∎
To get a sense of how the two terms and compare to each other, we propose the following example, which shows that, despite one’s best efforts, one should not expect either of these two terms to dominate the other.
Remark 5
In the AR(1) model (8) with chi-squared white noise, i.e. when and for all , one can try to compare the two formulas for the asymptotic variances of and . Avoiding the situation where is very close to , assuming for instance , so that , when is large, we have
[TABLE]
Therefore the sequence can be made to have a variance which is significantly smaller than that of in this case, but both of them converge to zero at the same speed .
Using the orthogonality between and , Proposition 3 and Proposition 4, we conclude the following.
Theorem 6
Under Assumption (9), with as in (13), for large ,
[TABLE]
and in particular the asymptotic variance of is
[TABLE]
where , , and , are given respectively in (15), (16), (18) and (19).
Remark 7
- From the previous theorem, we notice that for large, and fixed values of the noise scale parameter family , the variance of has high values when is close to 1, and approaches
[TABLE]
when is small.
- The previous theorem also shows that one can obtain other asymptotics depending on the relation between and the family . For instance, when is close to 1, which is the limit of fast mean reversion, one can avoid an explosion of ’s asymptotic variance by scaling the variance parameters appropriately, leading to a fast-mean reversion and small noise regime. Letting , where is interpreted as a rate of mean reversion, one would only need to ensure that and . In the example where there is a single non-zero value , for instance, we would obtain for large ,
[TABLE]
here the second term dominates, and as , assuming remains bounded, we would get an asymptotic variance of if the limit exists.
5 Berry-Esséen bound for the asymptotic normality of the quadratic variation
In this section, we prove that the quadratic variation defined in (13) is asymptotically normal and we estimate the speed of this convergence in total variation distance, showing it is of the Berry-Esséen-type order . For this aim, we will need the following theorem, which estimates the total variation distance to the normal of the standardized sum of variables in the 2nd and 4th chaos.
Theorem 8
Let where and . Then
[TABLE]
Moreover, letting be the bracketed term on the right-hand side of (22), for any constant , we have
[TABLE]
Proof.
We have . Then
[TABLE]
Thus, using , we can write . Now we use the result of a simple calculation, labeled as (9) in the preprint [33] (see also [37]), to obtain
[TABLE]
where the last equality comes from the duality relation . The prior inequality appears to be used in a more general context here than what is stated in [33, Eq. (9)], but an immediate inspection of its proof therein shows that it applies to any situation where , using only general results such as Stein’s lemma, the chain rule for the Malliavin derivative , and the duality between and .
On the other hand, using the product formula (4),
[TABLE]
Thus
[TABLE]
where
[TABLE]
Therefore, using Minkowski's inequality,
[TABLE]
Furthermore,
[TABLE]
Also,
[TABLE]
and
[TABLE]
As a consequence,
[TABLE]
This, combined with (24), establishes inequality (22).
[TABLE]
Inequality (23) follows using inequalities (25) and (22). ∎
We will now use Theorem 8 to prove that the quadratic variation satisfies the following Berry-Esséen theorem.
Remark 9
It turns out that, when applying Theorem 8 to estimate the speed of convergence in the CLT for , the term cannot merely be bounded above via Schwarz’s inequality. See Lemma 13 below and its proof. This is the key element which allows us to obtain the Berry-Esséen speed in the next theorem.
Theorem 10
With defined in (13), under Assumption (9), we have for all
[TABLE]
where
[TABLE]
where , , are defined in the previous section in (16), (19), and , , , , and are given in the lemmas below, respectively in (15), (18), (29), (30) and (34).
In particular is asymptotically Gaussian, namely
[TABLE]
Proof.
Based on the decomposition of given in (14), we have
[TABLE]
where
[TABLE]
Applying Theorem 8 to , we get
[TABLE]
We study first the contractions of the kernels and given in (27) and we prove that they satisfy the following lemmas.
Lemma 11
If Assumption (9) holds, the kernel defined in (27) satisfies
[TABLE]
where .
Proof.
We have
[TABLE]
Moreover, by the above calculations and (9),
[TABLE]
Hence
[TABLE]
Similarly
[TABLE]
Therefore
[TABLE]
Consequently
[TABLE]
where we used the change of variables . The desired result therefore follows. ∎
Lemma 12
If Assumption (9) holds, for every the kernel defined in (27) satisfies
[TABLE]
where
[TABLE]
Proof.
For , we have
[TABLE]
For , we get
[TABLE]
By (9), for all ,
[TABLE]
Similarly, for all ,
[TABLE]
On the other hand, for all
[TABLE]
Hence, by (9), for all ,
[TABLE]
Therefore, from (31) and above calculations, we have
[TABLE]
where we used the change of variables , , .
For , we have
[TABLE]
Hence, by (32) and (9), we get
[TABLE]
where we used the change of variables , , .
For , we have
[TABLE]
where we used (32) and the change of variables , , , which ends the proof. ∎
Lemma 13
Suppose Assumption (9) holds. Consider the kernels and defined in (27), then we have
[TABLE]
where .
Proof.
We have
[TABLE]
Consequently, using (33), we get
[TABLE]
where we used the change of variables . ∎
The bound (26) is then a direct consequence of inequality (28) and the estimates given respectively in (29), (30), (34) and Theorem 6. ∎
6 Application: estimation of the mean-reversion parameter
In this section, to illustrate the implications of Theorem 10 for parameter estimation in an easily tractable case, we assume that we have observations coming from a specific version of our second-chaos AR(1) model (8), namely the one driven by a chi-squared white noise with one degree of freedom:
[TABLE]
where , and are real constants, and are i.i.d. standard normal. This is model (8) where all ’s are zero except for the first one.
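As a minimal sketch of how such observations could be generated, the following Python code simulates an AR(1) recursion driven by a centered chi-squared noise with one degree of freedom. It assumes the zero-mean form X_k = theta * X_{k-1} + sigma * (Z_k^2 - 1); the exact parameterization of (35) may differ, and the names `simulate_chi2_ar1`, `stationary_variance`, `theta`, `sigma` and `burn_in` are illustrative choices, not notation from the paper.

```python
import random


def simulate_chi2_ar1(n, theta, sigma=1.0, burn_in=500, seed=None):
    """Simulate n values of X_k = theta * X_{k-1} + sigma * (Z_k**2 - 1).

    Assumption: zero-mean AR(1) form with Z_k i.i.d. N(0, 1); this is an
    illustrative parameterization, not necessarily the paper's model (35).
    """
    if not -1.0 < theta < 1.0:
        raise ValueError("theta must lie in (-1, 1) for a stationary path")
    rng = random.Random(seed)
    x = 0.0
    path = []
    for k in range(burn_in + n):
        z = rng.gauss(0.0, 1.0)
        # Z^2 - 1 is a centered chi-squared(1) variable, a second-chaos noise.
        x = theta * x + sigma * (z * z - 1.0)
        if k >= burn_in:  # discard the transient so the path is near-stationary
            path.append(x)
    return path


def stationary_variance(theta, sigma=1.0):
    """Var(X) under the assumed model: Var(sigma*(Z^2-1)) = 2*sigma^2."""
    return 2.0 * sigma ** 2 / (1.0 - theta ** 2)
```

Under this assumed form, the empirical second moment of a long simulated path should be close to `stationary_variance(theta, sigma)`, which is the kind of relation a moment estimator can exploit.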
Proposition 14
The quadratic variation defined in (13) for model (35) satisfies, for all ,
[TABLE]
where .
Proof.
From the definition of in (13), we have by the isometry property of multiple integrals
[TABLE]
∎
Remark 15
Assuming that is known, Proposition 14 shows that the quadratic variation is an asymptotically unbiased estimator for , and thus, after a transformation, for as well:
[TABLE]
Therefore, using the fact that can be estimated via , we suggest the following moment estimator for the mean-reversion rate
[TABLE]
where
[TABLE]
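The idea of a moment estimator obtained by inverting the limit of the quadratic variation can be sketched as follows. This is only an illustration under stated assumptions: it takes the zero-mean model X_k = theta * X_{k-1} + sigma * (Z_k^2 - 1) with known `sigma` and theta in (0, 1), for which the second moment is 2 * sigma^2 / (1 - theta^2). The functions `f`, `f_inverse` and `moment_estimator` are hypothetical stand-ins and need not coincide with (36) and (37).

```python
import math


def quadratic_variation(path):
    """Empirical second moment (1/n) * sum of X_i^2."""
    return sum(v * v for v in path) / len(path)


def f(theta, sigma=1.0):
    """Assumed limit of the quadratic variation: 2*sigma^2 / (1 - theta^2)."""
    return 2.0 * sigma ** 2 / (1.0 - theta ** 2)


def f_inverse(q, sigma=1.0):
    """Invert f on (0, 1); clip to 0 when q falls below its minimum 2*sigma^2."""
    if q <= 0.0:
        raise ValueError("q must be positive")
    ratio = 1.0 - 2.0 * sigma ** 2 / q
    return math.sqrt(max(ratio, 0.0))


def moment_estimator(path, sigma=1.0):
    """Plug the empirical quadratic variation into the inverse map."""
    return f_inverse(quadratic_variation(path), sigma)
```

The clipping in `f_inverse` is a design choice of this sketch: for short samples the quadratic variation can fall outside the range of `f`, and some convention is needed to keep the estimate real-valued.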
6.1 Properties of the estimator
Proposition 16
The estimator of the mean reversion parameter defined in (36) is strongly consistent, namely almost surely
[TABLE]
Proof.
We write , with . According to Theorem 6, we have , . Hence, there exists a constant , such that
[TABLE]
Hence, by Lemma 1 we have almost surely , as . On the other hand, by Proposition 14, , as . Thus almost surely as , as announced. ∎
Proposition 17
Under Assumption (9), the estimator defined in (36) satisfies
[TABLE]
where and are given in Propositions 3 and 4 respectively and .
In particular is asymptotically Gaussian; more precisely we have as
[TABLE]
where
[TABLE]
Proof.
For , the function defined in (37) has an inverse . Let denote , for all . On the other hand, we can write
[TABLE]
Therefore, from the properties of the Wasserstein metric and denoting , we get
[TABLE]
where we used the bound (26) and Proposition 14 for the above bounds. On the other hand, since the function defined in (37) is a diffeomorphism and since , then by the mean-value theorem, there exists a random variable such that
[TABLE]
We have
[TABLE]
According to (38), the last term is bounded by the speed . Moreover, applying the mean-value theorem again since is twice continuously differentiable, there exists a random variable , such that
[TABLE]
where we used Hölder’s inequality with , are two reals greater than 1 such that . Moreover, by the hypercontractivity property for multiple integrals (5), there exists a constant such that
[TABLE]
where we used the inequality , for all , and the bounds of Proposition 14 and Theorem 6 respectively. On the other hand, for all ,
[TABLE]
Therefore, using (39) and (40), to obtain a bound for the term , it remains to show that is finite for some . However, using the fact that and the monotonicity of , it is actually sufficient to show that for some , we have
[TABLE]
The function has two singularities in [math] and in and thus is not bounded. But, we can write
[TABLE]
For the term and since , we can pick such that . Then , therefore is bounded away from [math] and the term has no singularity for any . For the term , we put , the constant , because , we can assume , because there is no AR(1) process with . Therefore, we can pick such that . In this case , hence the term has no singularities at for any . In conclusion, to avoid the singularities at both [math] and , it is sufficient to pick such that
.
For the other term, by the asymptotic normality of , we get as and for
[TABLE]
which gives the desired result. ∎
6.2 Numerical Results
The table below reports the mean and standard deviation of the proposed estimator defined in (36) of the true value of the mean-reversion parameter .
We simulate the values of the estimator from the quadratic variation for different sample sizes and for fixed chosen to be equal to 1. For each sample size , the mean and the standard deviation are obtained from 500 replications. The table confirms that the estimator is strongly consistent even for small values of and has small standard deviations for different true values of . Moreover, the estimator is more efficient for values of greater than . This could be explained by the fact that the asymptotic variance of the limiting law of is , which is high for small values of and small for values of close to 1. Therefore, is presumably more accurate as an estimator when is closer to 1, e.g. greater than 0.5, as can be seen in the figure below.
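A replication experiment of this kind can be sketched in a few lines of Python. The code below is a hedged illustration, not the authors' simulation code: it assumes the zero-mean model X_k = theta * X_{k-1} + sigma * (Z_k^2 - 1) and the corresponding moment estimator sqrt(1 - 2 * sigma^2 / Q_n), and all function names and default values (`burn_in`, `seed`) are hypothetical.

```python
import math
import random
import statistics


def simulate_path(n, theta, sigma, rng, burn_in=500):
    """One path of the assumed model X_k = theta*X_{k-1} + sigma*(Z_k^2 - 1)."""
    x = 0.0
    path = []
    for k in range(burn_in + n):
        z = rng.gauss(0.0, 1.0)
        x = theta * x + sigma * (z * z - 1.0)
        if k >= burn_in:
            path.append(x)
    return path


def estimate_theta(path, sigma):
    """Moment estimator from the empirical second moment (assumed form)."""
    q = sum(v * v for v in path) / len(path)
    return math.sqrt(max(1.0 - 2.0 * sigma ** 2 / q, 0.0))


def monte_carlo_summary(n, theta, sigma=1.0, replications=500, seed=0):
    """Mean and standard deviation of the estimator over independent paths."""
    rng = random.Random(seed)
    estimates = [
        estimate_theta(simulate_path(n, theta, sigma, rng), sigma)
        for _ in range(replications)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)
```

Calling `monte_carlo_summary(n, theta)` for several sample sizes and true parameter values yields a table of means and standard deviations with the same layout as the one reported here, although the numbers need not match since the model form is an assumption.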
To investigate the asymptotic distribution of empirically, we need to compare the distribution of the following statistic
[TABLE]
with the standard normal distribution . For this aim, for parameter choices , , , and based on 3000 replications, we obtained the following histogram:
Figure 3 shows that the normal approximation of the distribution of the statistic is reasonable even if the sample size is not very large. The table below compares statistics of and N(0,1) based on 3000 replications, with , and . The empirical mean, median and standard deviation of match those of N(0,1) very closely, corroborating our theoretical results.
[TABLE]
We can check more precisely how fast the statistic converges in law to N(0,1). We chose to compute the Kolmogorov distance between and . For this aim, we approximate the cumulative distribution function using the empirical cumulative distribution function based on 500 replications of the computation of for . The next figure shows the empirical and standard normal cumulative distribution functions.
The Kolmogorov distance between the two laws, which equals the sup norm of the difference of these cumulative distribution functions, computes to approx. . On the other hand, since (see, for example, Theorem 3.3 of [9] for a proof)
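The computation of a Kolmogorov distance between an empirical distribution and the standard normal law can be sketched as follows. This is a generic illustration using only the Python standard library; the function names are hypothetical, and the input `sample` stands for the replicated values of the standardized statistic.

```python
import math


def standard_normal_cdf(x):
    """Phi(x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kolmogorov_distance_to_normal(sample):
    """Sup-norm distance between the empirical CDF of `sample` and Phi."""
    xs = sorted(sample)
    n = len(xs)
    distance = 0.0
    for i, x in enumerate(xs):
        phi = standard_normal_cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so the supremum
        # is attained just before or at one of the sample points.
        distance = max(distance, abs((i + 1) / n - phi), abs(phi - i / n))
    return distance
```

Since the empirical CDF is a step function and the normal CDF is continuous and increasing, checking both sides of each jump is enough to recover the exact supremum for the given sample.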
[TABLE]
the distance on the left-hand side should be bounded above by approx. times any constant coming from the upper bound in Proposition 17. This is five times larger than our estimate of the actual Kolmogorov distance , a reassuring practical confirmation of Proposition 17, and of our underlying results on normal asymptotics of 2nd-chaos AR(1) quadratic variations. If that proposition’s upper bound with its rate applied directly to the Kolmogorov distance, as is known to be the case for the Berry-Esséen theorem in the classical CLT, the value should be compared to approx. , which is arguably of the same order of magnitude. This is a motivation to investigate whether the so-called delta method, which we used here to prove Proposition 17 under the Wasserstein distance, could also apply to the total variation distance, since it is known to be an upper bound on the Kolmogorov distance without the need for the square root as in the comparison (42).
References
- [1] Azmoodeh, E. and Morlanes, G. I. (2013). Drift parameter estimation for fractional Ornstein-Uhlenbeck process of the second kind. Statistics. DOI: 10.1080/02331888.2013.863888.
- [2] Azmoodeh, E. and Viitasaari, L. (2015). Parameter estimation based on discrete observations of fractional Ornstein-Uhlenbeck process of the second kind. Statist. Infer. Stoch. Proc. 18, no. 3, 205-227.
- [3] Balakrishna, N. and Shiji, K. (2014). Extreme Value Autoregressive Model and Its Applications. Journal of Statistical Theory and Practice 8 (3), 460–481.
- [4] Barboza, L.A. and Viens, F. (2017). Parameter estimation of Gaussian stationary processes using the generalized method of moments. Electron. J. Statist. 11 (1), 401-439.
- [5] Belfadli, R., Es-Sebaiy, K. and Ouknine, Y. (2011). Parameter Estimation for Fractional Ornstein-Uhlenbeck Processes: Non-Ergodic Case. Frontiers in Science and Engineering (An International Journal Edited by Hassan II Academy of Science and Technology) 1, no. 1, 1-16.
- [6] Biermé, H., Bonami, A., Nourdin, I. and Peccati, G. (2012). Optimal Berry-Esséen rates on the Wiener space: the barrier of third and fourth cumulants. ALEA 9, no. 2, 473-500.
- [7] Brouste, A. and Iacus, S. M. (2012). Parameter estimation for the discretely observed fractional Ornstein-Uhlenbeck process and the Yuima R package. Comput. Stat. 28, no. 4, 1529-1547.
- [8] Cheridito, P., Kawaguchi, H. and Maejima, M. (2003). Fractional Ornstein-Uhlenbeck processes. Electr. J. Prob. 8, 1-14.
