AR(1) processes driven by second-chaos white noise: Berry-Esséen bounds for quadratic variation and parameter estimation

Soukaina Douissi; Khalifa Es-Sebaiy; Fatimah Alshahrani; Frederi G. Viens

arXiv:1907.06782·math.PR·July 17, 2019

TL;DR

This paper investigates the asymptotic properties of quadratic variation in AR(1) processes driven by second-chaos white noise, providing Berry-Esséen bounds and insights into parameter estimation.

Contribution

It introduces new bounds on convergence rates for AR(1) processes driven by second-chaos noise and applies these to improve understanding of mean-reversion estimation.

Findings

01. Established Berry-Esséen bounds for quadratic variation

02. Demonstrated convergence rates to normal law

03. Provided simulation validation of theoretical results

Abstract

In this paper, we study the asymptotic behavior of the quadratic variation for the class of AR(1) processes driven by white noise in the second Wiener chaos. Using tools from the analysis on Wiener space, we give an upper bound for the total-variation speed of convergence to the normal law, which we apply to study the estimation of the model's mean-reversion. Simulations are performed to illustrate the theoretical results.

Tables

	$\| a_{1} \| = 0.10$		$\| a_{1} \| = 0.30$		$\| a_{1} \| = 0.50$		$\| a_{1} \| = 0.70$
	Mean	Std dev	Mean	Std dev	Mean	Std dev	Mean	Std dev
$n = 3000$	0.2178	0.0901	0.2887	0.0969	0.4946	0.0520	0.6962	0.0234
$n = 5000$	0.1878	0.0866	0.2905	0.0616	0.4966	0.0413	0.6978	0.0270
$n = 10000$	0.1630	0.0692	0.2928	0.0852	0.4974	0.0315	0.6987	0.0215

Equations

$$Y_{n}=a_{0}+a_{1}Y_{n-1}+\varepsilon_{n}$$

$$dX_{t}=\alpha\left(m-X_{t}\right)dt+\sigma\,dW(t)$$

$$F=\mathbf{E}[F]+\sum_{n=1}^{\infty}I_{n}(f_{n}),$$

$$\mathbf{E}\left[I_{n}(f_{n})^{2}\right]=n!\,\|f_{n}\|_{\mathcal{H}^{\otimes n}}^{2}.$$

$$I_{p}(f)I_{q}(g)=\sum_{r=0}^{p\wedge q}r!\,C_{p}^{r}C_{q}^{r}\,I_{p+q-2r}\left(f\otimes_{r}g\right);$$

$$\left(f\otimes_{r}g\right)(s_{1},\dots,s_{p-r},t_{1},\dots,t_{q-r}):=\int_{[0,1]^{r}}f(s_{1},\dots,s_{p-r},u_{1},\dots,u_{r})\,g(t_{1},\dots,t_{q-r},u_{1},\dots,u_{r})\,du_{1}\dots du_{r}.$$

$$I_{1}(f)I_{1}(g)=2^{-1}I_{2}\left(f\otimes g+g\otimes f\right)+\langle f,g\rangle_{\mathcal{H}}.$$

$$\left(\mathbf{E}\left[|F|^{p}\right]\right)^{1/p}\leqslant c_{p,q}\left(\mathbf{E}\left[|F|^{2}\right]\right)^{1/2}\quad\text{for any }p\geqslant 2.$$

$$F=\delta\left(-DL^{-1}\right)F.$$

$$d_{TV}\left(X,Y\right):=\sup_{A\in\mathcal{B}(\mathbf{R})}\left|\mathbf{P}\left[X\in A\right]-\mathbf{P}\left[Y\in A\right]\right|$$

$$d_{W}\left(X,Y\right):=\sup_{f\in Lip(1)}\left|\mathbf{E}f(X)-\mathbf{E}f(Y)\right|,$$

$$\mathbf{E}\left[Xf(X)\right]=\mathbf{E}\left[f^{\prime}(X)\left\langle DX,-DL^{-1}X\right\rangle_{\mathcal{H}}\right]$$

$$d_{TV}\left(X,N\right)\leqslant 2\,\mathbf{E}\left|1-\left\langle DX,-DL^{-1}X\right\rangle_{\mathcal{H}}\right|.$$

$$d_{TV}\left(X,N\right)\leqslant 2\,\mathbf{E}\left|1-q^{-1}\left\|DX\right\|_{\mathcal{H}}^{2}\right|.$$

$$\|Z_{n}\|_{L^{p}(\Omega)}\leqslant c_{p}\cdot n^{-\gamma},$$

$$|Z_{n}|\leqslant\eta_{\varepsilon}\cdot n^{-\gamma+\varepsilon}\quad\text{almost surely}$$

$$\left\{\begin{array}{ll}Y_{n}=a_{0}+a_{1}Y_{n-1}+\varepsilon_{n},&n\geqslant 1\\ \varepsilon_{n}=\sum\limits_{\delta=1}^{\infty}\sigma_{\delta}\left(Z_{n,\delta}^{2}-1\right)&\\ Y_{0}=y_{0}\in\mathbb{R}.&\end{array}\right.$$

$$\sum_{\delta=1}^{\infty}\sigma_{\delta}^{2}<\infty.$$

$$Y_{i}=d_{i}+\sum_{k=1}^{i}a_{1}^{i-k}\sum_{\delta=1}^{\infty}\sigma_{\delta}\left(Z_{k,\delta}^{2}-1\right),\quad i\geqslant 1$$

$$d_{i}=a_{1}^{i}y_{0}+a_{0}\sum_{k=1}^{i}a_{1}^{i-k}.$$

$$\begin{aligned}Y_{i}&=d_{i}+\sum_{k=1}^{i}a_{1}^{i-k}\sum_{\delta=1}^{\infty}\sigma_{\delta}\left(W^{2}(h_{k,\delta})-1\right)\\ &=d_{i}+\sum_{k=1}^{i}a_{1}^{i-k}\sum_{\delta=1}^{\infty}\sigma_{\delta}\,I_{2}^{W}\left(h_{k,\delta}^{\otimes 2}\right)\end{aligned}$$

$$Y_{i}=d_{i}+\tilde{Y}_{i},$$

$$\tilde{Y}_{i}:=I_{2}^{W}(f_{i})\quad\text{and}\quad f_{i}:=\sum_{k=1}^{i}a_{1}^{i-k}\sum_{\delta=1}^{\infty}\sigma_{\delta}\,h_{k,\delta}^{\otimes 2}.$$

$$\|f_{i}\|_{L^{2}([0,1]^{2})}^{2}=\sum_{\delta=1}^{\infty}\sigma_{\delta}^{2}\times\frac{1-a_{1}^{2i}}{1-a_{1}^{2}}\leqslant\frac{1}{1-a_{1}^{2}}\sum_{\delta=1}^{\infty}\sigma_{\delta}^{2}<\infty.$$

$$Q_{n}:=\frac{1}{n}\sum_{i=1}^{n}\left(Y_{i}-d_{i}\right)^{2}=\frac{1}{n}\sum_{i=1}^{n}\tilde{Y}_{i}^{2}.$$

$$\begin{aligned}Q_{n}-\mathbf{E}[Q_{n}]&=\frac{1}{n}\sum_{i=1}^{n}I_{4}^{W}\left(f_{i}\otimes f_{i}\right)+\frac{4}{n}\sum_{i=1}^{n}I_{2}^{W}\left(f_{i}\otimes_{1}f_{i}\right)\\ &=I_{4}^{W}\left(\frac{1}{n}\sum_{i=1}^{n}f_{i}\otimes f_{i}\right)+I_{2}^{W}\left(\frac{4}{n}\sum_{i=1}^{n}f_{i}\otimes_{1}f_{i}\right)\\ &=:T_{4,n}+T_{2,n}.\end{aligned}$$

$$\left|\mathbf{E}\left[\left(\sqrt{n}\,T_{2,n}\right)^{2}\right]-\frac{32\sum_{\delta=1}^{\infty}\sigma_{\delta}^{4}}{\left(1-a_{1}^{2}\right)^{2}}\right|\leqslant\frac{C_{1}}{n},$$

Full text

AR(1) processes driven by second-chaos white noise: Berry-Esséen bounds for quadratic variation and parameter estimation

Soukaina Douissi$^{1}$, Khalifa Es-Sebaiy$^{2}$, Fatimah Alshahrani$^{3}$, Frederi G. Viens$^{4}$.

1 Laboratory LIBMA, Faculty Semlalia, Cadi Ayyad University, 40000 Marrakech, Morocco.

Email: [email protected]

2 Department of Mathematics, Faculty of Science, Kuwait University, Kuwait.

Email: [email protected]

3 Department of Mathematical Science, Princess Nourah bint Abdulrahman University, Riyadh.

3,4 Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824.

Email: [email protected], [email protected]

Abstract: In this paper, we study the asymptotic behavior of the quadratic variation for the class of AR(1) processes driven by white noise in the second Wiener chaos. Using tools from the analysis on Wiener space, we give an upper bound for the total-variation speed of convergence to the normal law, which we apply to study the estimation of the model’s mean-reversion. Simulations are performed to illustrate the theoretical results.

Key words: Central limit theorem; Berry-Esséen; Malliavin calculus; parameter estimation; time series; Wiener chaos

2010 Mathematics Subject Classification: 60F05; 60H07; 62F12; 62M10

The first author is supported by the Fulbright joint supervision program for PhD students for the academic year 2018-2019 between Cadi Ayyad University and Michigan State University. The fourth author is partially supported by NSF awards DMS 1734183 and 1811779, and ONR award N00014-18-1-2192.

1 Introduction

The topic of statistical inference for stochastic processes has a long history, addressing a number of issues, though many difficult questions remain. At the same time, a number of application fields are anxious to see some practical progress in a selection of directions. Methodologies are sought which are not just statistically sound, but stand a good chance of being computationally implementable, if not nimble, to help practitioners make data-based decisions in stochastic problems with complex time evolutions. In this paper, which is motivated by parameter estimation within the above context, we propose a quantitatively sharp analysis in this direction, and we honor the scientific legacy of Prof. Larry Shepp.

Prof. Shepp is widely known for his seminal work on stochastic control, optimal stopping, and applications in areas such as investment finance. Often labeled an applied probabilist by those working in that area, he had the merit, among many other qualities, of showing by example that research activity in this area could benefit from an appreciation for the mathematical aesthetics of constructing stochastic objects for their own sake. His papers also showed that one’s work is only as applied as one’s ability to calibrate a stochastic model to a realistic scenario. As obvious as this view may seem, it is nonetheless in short supply in some current circles, where model sophistication seems to replace all other imperatives. Instead, we believe some of the principles guiding applied probability research should include (i) statistical parsimony and robustness, (ii) feature discovery, and above all, (iii) real-world impact where mathematicians propose a real solution to a real problem. We think that Prof. Shepp would not have been shy about agreeing that his seminal and highly original works on optimal stopping and stochastic control [45], [SControl], including the invention [46] of the widely used Russian financial option, illustrate items (ii) and (iii) in this philosophy perfectly. This leaves the question of how to estimate model parameters needed to implement applied solutions. Prof. Shepp proved on many occasions that this concern was also high on his list of objectives in applied work; he proposed methods aligning with our stated principle (i) above. The best example is the work for which Shepp is most widely known outside of our stochastic circles: the mathematical foundation of the Computed Tomography (CT) scanner, and in particular, the basis [47] for its data analysis. Prof. Shepp is less well known for his direct interest in statistically motivated stochastic modeling; the posthumous paper [17] is an instance of this, on asymptotics of auto-regressive processes with normal noise (innovations).

Our paper honors this legacy by providing a detailed and mathematically rigorous stochastic analysis of some building blocks needed in the data analysis of a simple class of stochastic processes. Our paper’s originality is in working out detailed quantitative properties for auto-regressive processes with innovations in the second Wiener chaos. Our framework is parsimonious in the sense of being determined by a small number of parameters, while covering features of stationarity, mean-reversion, and heavier-than-normal tail weight. We focus on establishing rates of convergence in the central limit theorem for quadratic variations of these processes, which we are then able to transfer to similar rates for the model’s moments-based parameter estimation. This precision would allow practitioners to determine the validity and uncertainty quantification of our estimates in the realistic setting of moderate sample size. Careless use of a method of moments would ignore the potential for abusive conclusions in this heavy-tailed time-series setting.

The remainder of this introduction begins with an overview of the landscape of parameter estimation for stochastic processes related to ours. The few included references call for the reader to find additional references therein, for the sake of conciseness. We then introduce the specific model class used in this paper. It represents a continuation of the current literature’s motivation to calibrate stochastic models with features such as stochastic memory and path roughness. It constitutes a departure from the same literature’s focus on the framework of Gaussian noise.

1.1 Parameter estimation for stochastic processes: historical and recent context

Some of the early impetus in parameter estimation for stochastic processes was inspired by classical ideas from frequentist statistics, such as the theoretical and practical superiority of maximum likelihood estimation (MLE) over other, less constrained methodologies, in many contexts. We will not delve into the description of many such instances, citing only the seminal account [26], first published in Russian in 1974 (see references therein in Chapter 17, such as [34]). This was picked up two decades later in the context of processes driven by fractional Brownian motion, where it was shown that the martingale property used in earlier treatments was not a necessary ingredient to establish the properties of such MLEs: see in particular the treatment of processes with fractional noise in [24] and in [51]. It was also noticed that least-squares ideas, which led to MLEs in cases of white-noise driven processes, did not share this property in the case of processes driven by fractional noise: this was pointed out in the continuous-time based paper [22]. See also a more detailed account of this direction of work in [16] and references therein, including a discussion of the distinction between estimators based on continuous paths, and those using discrete sampling under in-fill and increasing-horizon asymptotics. These were applied particularly to various versions of the Ornstein-Uhlenbeck process, as examples of processes with stationary increments and an ability to choose other features such as path regularity and short or long memory.

The impracticality of computing MLEs for parameters of stochastic processes in these feature-rich contexts led the community to consider other methodologies, looking more closely at least squares and beyond. A popular approach is to work with incarnations of the method of moments. A full study in the case of general stationary Gaussian sequences, with application to in-fill asymptotics for the fractional Ornstein-Uhlenbeck process, is in [4]. This paper relates the relatively long history of those works where estimation of a memory or Hölder-regularity parameter uses moments-based objects, particularly quadratic variations. It also shows that the generalized method of moments can, in principle, provide a number of options to access vectors of parameters for discretely observed Gaussian processes in a practical way. This was also illustrated recently in [18], where the Malliavin calculus and its connection to Stein’s method was used to establish speeds of convergence in the central-limit theorems for quadratic-variations-based estimators for discretely observed processes. The Stein-Malliavin technical methodology employed in [18] is that which was introduced by Nourdin and Peccati in 2009, as described in their 2012 research monograph [32].

Other estimation methods are also proposed for general stationary time series, which we mention here, though they fall out of the scope of our paper, and they do not lead to the same precision as those based on the Stein-Malliavin method: see e.g. [53] and [54] for the Yule-Walker method and extensions. While the paper [52] establishes that essentially every continuous-time stationary process can be represented as the solution of a Langevin (Ornstein-Uhlenbeck-type) equation with an appropriate noise distribution, the two aforementioned follow-up papers [53, 54], which present an analog in discrete time, do not, however, connect the discrete and continuous frameworks via any asymptotic theory.

Following an initial push in [51], most of the recent papers mentioned above, and recent references therein, state an explicit effort to work with discretely observed processes. At least in the increasing-horizon case, the papers [18] and [13] had the merit of pointing out that many of the discretization techniques used to pass from continuous-path to discrete-observation based estimators, were inefficient, and it is preferable to work directly from the statistics of the discretely observed process. Our paper picks up this thread, and introduces a new direction of research which, to our knowledge, has not been approached by any authors: can the asymptotic normality of quadratic variations and related estimators, including very precise results on speeds of convergence, be obtained when the driving noise is not Gaussian?

The main underlying theoretical result we draw on is the optimal estimation of total-variation distances between chaos variables and the normal law, established in [31]. It was used for quadratic variations of stationary sequences in the Gaussian case in [29]. But when the Gaussian setting is abandoned, the result in [31] cannot be used directly. Instead, our paper makes a theoretical advance in the analysis on Wiener space, by drawing on a simple idea in the recent preprints [37] and [33]; our main result provides an example of a sum of chaos variables whose distance to the normal appears to be estimated optimally, whereas a standard use of the Schwarz inequality would result in a much weaker result. The precise location of the technique leading to this improvement is pointed out in the main body of our paper: see Theorem 8, particularly inequality (24) in its proof and the following brief discussion there, and Remark 9. This allows us to prove our Berry-Esséen-type speed of $n^{-1/2}$, rather than what would have resulted in a speed of $n^{-1/4}$.

1.2 A stationary process with second-chaos noise, and related literature

Given our intent to address the new issue of noise distribution, and knowing that Berry-Esséen-type questions for models with mere Gaussian noise already present technical challenges, we choose to minimize the number of technical issues to address in this paper by focusing on the simplest possible stationary model class which does not restrict the marginal noise distribution within a family which is tractable using Wiener chaos analysis and tools from the Malliavin calculus. This is the auto-regressive model of order 1 (a.k.a. AR(1)) with independent noise terms, where the noise distribution is in the second Wiener chaos, i.e.

$$Y_{n}=a_{0}+a_{1}Y_{n-1}+\varepsilon_{n},\tag{1}$$

where $\left\{\varepsilon_{n};n\in\mathbf{Z}\right\}$ is an i.i.d. sequence in the second Wiener chaos, and $a_{0}$ and $a_{1}$ are constants. The complete description and construction of this process and of the noise sequence is given in Section 3, see (8).

As explained in Section 2, the second Wiener chaos is a linear space, and since the model (1) is linear, its solution, if any, lies in the same chaos. This points to a simple theoretical motivation and justification for studying the increasing horizon problem as opposed to the in-fill problem. We also include a practical motivation for doing so, further below in this section, coming from an environmental statistics question.

For the former motivation, note that the AR(1) specification (1), with essentially any square-integrable i.i.d. noise distribution, is known to converge weakly, after appropriate aggregation and scaling, to the so-called Ornstein-Uhlenbeck process (also known occasionally as the Vasicek process), which solves the stochastic differential equation

$$dX_{t}=\alpha\left(m-X_{t}\right)dt+\sigma\,dW(t),\tag{2}$$

where $W$ is a standard Wiener process (Gaussian Brownian motion), and the parameters $\alpha,m$ , and $\sigma$ are explicitly related to $a_{0},a_{1}$ and $Var\left[\varepsilon_{n}\right]$ . See for instance [48, Chapter 2], which covers the case of all square-integrable innovations; this paper assumes a piecewise linear interpolation in the normalization, which could be eliminated by switching to convergence in the Skorohod $J_{1}$ topology. A reference avoiding linear interpolation, with convergence in the Skorohod $J_{1}$ topology, is [10], where innovations are assumed to have four moments. In any case, this central limit theorem constrains the modeling of stationary/ergodic processes via diffusive differential formulation: under in-fill asymptotics with weakly dependent noise, the AR(1) specification cannot preserve any non-normal noise distribution in the limit. It is of course possible to interpolate the above process $Y$ in a number of ways, to result in a continuous-time process whose discrete-time marginals are those specified via (1).
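As a worked illustration of why the diffusive formulation forces normal noise, consider the converse direction to the weak convergence above: sampling the solution of $dX_{t}=\alpha\left(m-X_{t}\right)dt+\sigma\,dW(t)$ at a fixed time step. The step size $\Delta>0$ is our notation, not the paper's. Solving the linear equation between times $t$ and $t+\Delta$ gives

$$X_{t+\Delta}=m\left(1-e^{-\alpha\Delta}\right)+e^{-\alpha\Delta}X_{t}+\sigma\int_{t}^{t+\Delta}e^{-\alpha\left(t+\Delta-s\right)}dW(s),$$

which is an AR(1) recursion with

$$a_{0}=m\left(1-e^{-\alpha\Delta}\right),\qquad a_{1}=e^{-\alpha\Delta},\qquad Var\left[\varepsilon_{n}\right]=\frac{\sigma^{2}\left(1-e^{-2\alpha\Delta}\right)}{2\alpha}.$$

The innovation here is a Wiener integral of a deterministic function, hence Gaussian, so an exact discretization of this equation can never produce second-chaos innovations.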

However we believe it is difficult or impossible to give a linear diffusion-type stochastic differential equation, akin to (2), whose fixed-time-step marginals are as in (1) for an arbitrary noise distribution, while simultaneously describing what second-chaos process differential would need to replace $W$ in (2). The so-called Rosenblatt process (see [50]), the only known second-chaos continuous-time process with a stochastic calculus similar to $W$ ’s, gives an example of a viable alternative to (2) living in the second chaos. But this process is known to have only a long-memory version. Thus it cannot be a proxy for any continuous-time analogue of (1), since the noise there has no memory. Similar issues would presumably exist for other non-Gaussian AR(1) and related auto-regressive processes. A few have been studied recently in the literature. We mention [21, 55], which cover various noise structures similar to second-chaos noises, and here again, no asymptotic or interpolation theory is provided to relate to continuous time. There does exist a general treatment in [17] of asymptotics for all AR( $p$ ) processes: the limit processes are the so-called Continuous-AR( $p$ ) processes, which are Gaussian, and have $p-1$ -differentiable paths (a form of very long memory for $p>1$ ); that paper assumes normal innovations to keep technicalities to a minimum.

Another indication that finding such a proxy may fail comes in the specific case of the so-called Gumbel distribution for $\varepsilon_{n}$. This law is a popular distribution for extreme-value modeling. The fact that this law is in the second chaos is a classical result (as a weighted sum of exponentials, see [44]), though it does not appear to be widely known in the extreme-value community. The standard (mean-zero) Gumbel law can be represented as $\sum_{j=1}^{\infty}j^{-1}\left(E_{j}-1\right)$ where $E_{j}=\left(N_{j}^{2}+\bar{N}_{j}^{2}\right)/2$ is a standard exponential variable (half of a chi-squared variable with two degrees of freedom, where $N_{j}$ and $\bar{N}_{j}$ are i.i.d. standard normals). The Gumbel law is known to give rise to a second-chaos version of an isonormal Gaussian process, known as the Gumbel noise measure (or Gumbel process); that stochastic measure obeys the same laws as the white-noise measure (including independence of increments which fails for the Rosenblatt noise), if one replaces the standard algebra of the reals by the max-plus algebra. This is explained in detail in the preprint [27]; also see references therein. By virtue of this change of algebra, stochastic differential specifications as in (2) cannot be defined using the Gumbel noise.

However, the discrete version of the Gumbel noise, an i.i.d. sequence $\left(\varepsilon_{n}\right)_{n}$ with Gumbel marginals, is a good example of a noise type which can be used in the AR(1) process (1). This specific model, known as the AR(1) process with Gumbel noise (or innovations), presents a main motivation for our work. Recent references on this process, and on the closely related process where the marginals of $Y$ are Gumbel-distributed, include [28] for a Bayesian study, [49] for applications to maxima in greenhouse gas concentration data, and [3] for AR processes in the broader extreme-value context. A survey on AR(1) models with different types of innovations and marginals, while not including the Gumbel, is in the unpublished manuscript [20]. The use of the Gumbel distribution for describing environmental time series, mainly when looking at extremes, is fairly widespread, but we do not cite this literature because it does not appear willing to acknowledge that time-series models driven by Gumbel innovations should be used, rather than using tools for i.i.d. Gumbel data. This literature, which is easy to find, is also entirely unaware that the Gumbel distribution is in the second Wiener chaos.
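The series representation of the centered Gumbel law as a weighted sum of centered exponentials, each built from two squared normals, can be turned into a small simulation of the AR(1) process with Gumbel innovations. The following is only a sketch: the truncation level `n_terms`, the function names, and the parameter values are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def centered_gumbel_noise(n, n_terms, rng):
    """Truncated series sum_{j<=n_terms} (E_j - 1)/j with E_j = (N_j^2 + Nbar_j^2)/2."""
    normals = rng.standard_normal((n, n_terms, 2))
    exponentials = 0.5 * (normals ** 2).sum(axis=2)  # standard exponential variables
    weights = 1.0 / np.arange(1, n_terms + 1)
    return (exponentials - 1.0) @ weights

def simulate_ar1(a0, a1, y0, noise):
    """Iterate Y_n = a0 + a1 * Y_{n-1} + eps_n and return Y_1, ..., Y_n."""
    path = np.empty(len(noise))
    prev = y0
    for i, eps in enumerate(noise):
        prev = a0 + a1 * prev + eps
        path[i] = prev
    return path

rng = np.random.default_rng(0)
n_terms = 200  # assumed truncation level of the infinite series
eps = centered_gumbel_noise(n=10_000, n_terms=n_terms, rng=rng)
y = simulate_ar1(a0=0.0, a1=0.5, y0=0.0, noise=eps)

# Variance of the truncated series; it tends to pi^2 / 6 as n_terms grows.
truncated_variance = np.sum(1.0 / np.arange(1, n_terms + 1) ** 2)
print(eps.mean(), eps.var(), truncated_variance)
```

Each term $(E_{j}-1)/j$ has variance $j^{-2}$, so the sample variance of `eps` should be close to `truncated_variance`, and the truncation error in variance is the tail sum of $j^{-2}$ beyond `n_terms`.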

All these reasons give us ample cause to investigate the basic method-of-moments building blocks for determining parameters in stationary time series with second-chaos innovations. For the sake of concentrating on the core mathematical analysis towards this end, we focus on the asymptotics of quadratic variations for models in the class (1). The methodology developed in [18] can then be adapted to handle any method-of-moments-based estimators, at the cost of some additional effort. We provide examples of this in the latter sections of this paper. Our main result is that, for any second-chaos innovations in (1), the quadratic variation of $Y$ has explicit normal asymptotics, with a speed of convergence in total variation which matches the classical Berry-Esséen speed of $n^{-1/2}$.

The remainder of this paper is structured as follows. Section 2 provides elements from the analysis on Wiener space which will be used in the paper. Section 3 presents the details of the class of AR(1) models we will analyze. Section 4 computes the asymptotic variance of the AR(1)’s quadratic variation by looking separately at its 2nd-chaos and 4th-chaos components, whose asymptotics are of the same order. Section 5 establishes our main result, the Berry-Esséen speed of convergence in total-variation for the normal fluctuations of the AR(1)’s quadratic variation. Finally, Section 6 defines a method-of-moments estimator for the mean-reversion rate of this AR(1) process, and establishes its asymptotic properties; a numerical study is included to gauge the distance between this renormalized estimator and the normal law.
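To give a concrete feel for how a quadratic variation can feed a moments-based estimator of the mean-reversion, here is a minimal sketch. It is not necessarily the estimator defined in Section 6. It assumes $a_{0}=0$, $y_{0}=0$, a known noise variance, and the simplest second-chaos innovation $\varepsilon_{n}=\sigma\left(Z_{n}^{2}-1\right)$ with $Z_{n}$ standard normal; all function names are hypothetical. It uses the stationary identity $Var\left[Y\right]=Var\left[\varepsilon_{n}\right]/\left(1-a_{1}^{2}\right)$, which only identifies $|a_{1}|$.

```python
import numpy as np

def quadratic_variation(y):
    """Q_n = (1/n) * sum_i y_i^2 for a centered path (a0 = 0 and y0 = 0 here)."""
    return np.mean(y ** 2)

def estimate_abs_a1(y, noise_variance):
    """Moment estimator of |a1| from Var(Y) = Var(eps) / (1 - a1^2)."""
    ratio = 1.0 - noise_variance / quadratic_variation(y)
    return np.sqrt(max(ratio, 0.0))  # clip: small samples can give a negative ratio

rng = np.random.default_rng(1)
n, a1, sigma = 20_000, 0.5, 1.0
eps = sigma * (rng.standard_normal(n) ** 2 - 1.0)  # second-chaos noise, variance 2 * sigma^2

# Simulate Y_i = a1 * Y_{i-1} + eps_i started at 0.
y = np.empty(n)
prev = 0.0
for i in range(n):
    prev = a1 * prev + eps[i]
    y[i] = prev

a1_hat = estimate_abs_a1(y, noise_variance=2.0 * sigma ** 2)
print(a1_hat)
```

Because the innovations are heavy-tailed, the estimate fluctuates noticeably at moderate sample sizes, which is the kind of uncertainty the convergence rates in this paper are meant to quantify.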

2 Preliminaries

In this first section, we recall some elements from stochastic analysis that we will need in the paper. See [32], [39], and [40] for details. Any real, separable Hilbert space ${\mathcal{H}}$ gives rise to an isonormal Gaussian process: a centered Gaussian family $(G(\varphi),\varphi\in{\mathcal{H}})$ of random variables on a probability space $(\Omega,\mathcal{F},\mathbf{P})$ such that $\mathbf{E}(G(\varphi)G(\psi))=\langle\varphi,\psi\rangle_{{\mathcal{H}}}$ . In this paper, it is enough to use the classical Wiener space, where $\mathcal{H}=L^{2}([0,1])$ , though any $\mathcal{H}$ will also work. In the case ${\mathcal{H}}=L^{2}([0,1])$ , $G$ can be identified with the stochastic differential of a Wiener process $W$ and one interprets $G(\varphi):=\int_{0}^{1}\varphi\left(s\right)dW\left(s\right)$ .

The Wiener chaos of order $n$ is defined as the closure in $L^{2}\left(\Omega\right)$ of the linear span of the random variables $H_{n}(G(\varphi))$ , where $\varphi\in{\mathcal{H}},\|\varphi\|_{{\mathcal{H}}}=1$ and $H_{n}$ is the Hermite polynomial of degree $n$ . The intuitive Riemann-sum-based notion of multiple Wiener stochastic integral $I_{n}$ with respect to $G\equiv W$ , in the sense of limits in $L^{2}\left(\Omega\right)$ , turns out to be an isometry between the Hilbert space ${\mathcal{H}}^{\odot n}$ (symmetric tensor product) equipped with the scaled norm $\frac{1}{\sqrt{n!}}\|\cdot\|_{{\mathcal{H}}^{\otimes n}}$ and the Wiener chaos of order $n$ under $L^{2}\left(\Omega\right)$ ’s norm. In any case, we have the following fundamental decomposition of $L^{2}\left(\Omega\right)$ as a direct sum of all Wiener chaos.
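The factor $n!$ in the scaled norm can be checked numerically in the one-dimensional case, where it reduces to the orthogonality relation $\mathbf{E}\left[H_{n}(Z)H_{m}(Z)\right]=n!\,\mathbf{1}_{n=m}$ for a standard normal $Z$. The sketch below assumes the $H_{n}$ are the probabilists' Hermite polynomials (so that $H_{2}(x)=x^{2}-1$) and uses NumPy's Gauss quadrature for the weight $e^{-x^{2}/2}$; the helper name `hermite` is ours.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Gauss quadrature for the weight exp(-x^2 / 2); dividing by sqrt(2*pi)
# turns weighted sums into expectations E[g(Z)] with Z ~ N(0, 1).
nodes, weights = hermegauss(20)
weights = weights / math.sqrt(2.0 * math.pi)

def hermite(n, x):
    """Probabilists' Hermite polynomial H_n evaluated at x (H_2(x) = x^2 - 1)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return hermeval(x, coeffs)

max_degree = 4
gram = np.array([[np.sum(weights * hermite(n, nodes) * hermite(m, nodes))
                  for m in range(max_degree + 1)]
                 for n in range(max_degree + 1)])
print(np.round(gram, 8))  # diagonal entries n!, zero elsewhere
```

With 20 nodes the quadrature is exact for polynomials of degree up to 39, so the matrix of expectations agrees with $\mathrm{diag}(0!,1!,2!,3!,4!)$ up to rounding.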

$\bullet$ The Wiener chaos expansion. For any $F\in L^{2}\left(\Omega\right)$ , there exists a unique sequence of functions $f_{n}\in{\mathcal{H}}^{\odot n}$ such that

$$F=\mathbf{E}[F]+\sum_{n=1}^{\infty}I_{n}(f_{n}),$$

where the terms are all mutually orthogonal in $L^{2}\left(\Omega\right)$ and

$$\mathbf{E}\left[I_{n}(f_{n})^{2}\right]=n!\,\|f_{n}\|_{\mathcal{H}^{\otimes n}}^{2}.$$

$\bullet$ Product formula and contractions. Since the product of two multiple Wiener integrals is again square-integrable, a special case of the above expansion exists for calculating products of Wiener integrals, and is explicit using contractions: for any $p$, $q$, and symmetric integrands $f\in\mathcal{H}^{\odot p}$ and $g\in\mathcal{H}^{\odot q}$,

$$I_{p}(f)I_{q}(g)=\sum_{r=0}^{p\wedge q}r!\,C_{p}^{r}C_{q}^{r}\,I_{p+q-2r}\left(f\otimes_{r}g\right);$$

see [39, Proposition 1.1.3] for instance; the contraction $f\otimes_{r}g$ is the element of ${\mathcal{H}}^{\otimes(p+q-2r)}$ defined by

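$$\left(f\otimes_{r}g\right)(s_{1},\dots,s_{p-r},t_{1},\dots,t_{q-r}):=\int_{[0,1]^{r}}f(s_{1},\dots,s_{p-r},u_{1},\dots,u_{r})\,g(t_{1},\dots,t_{q-r},u_{1},\dots,u_{r})\,du_{1}\dots du_{r}.$$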

The special case for $p=q=1$ is particularly handy, and can be written in its symmetrized form:

$$I_{1}(f)I_{1}(g)=2^{-1}I_{2}\left(f\otimes g+g\otimes f\right)+\langle f,g\rangle_{\mathcal{H}}.$$
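As a worked example of the case $p=q=1$, take $f=g=h$ with $\|h\|_{\mathcal{H}}=1$ in the identity $I_{1}(f)I_{1}(g)=2^{-1}I_{2}\left(f\otimes g+g\otimes f\right)+\langle f,g\rangle_{\mathcal{H}}$. Writing $W(h)=I_{1}(h)$, which is a standard normal variable, one gets

$$W(h)^{2}=I_{2}\left(h^{\otimes 2}\right)+1,\qquad\text{that is,}\qquad I_{2}\left(h^{\otimes 2}\right)=W(h)^{2}-1=H_{2}\left(W(h)\right).$$

In other words, a centered squared standard normal variable is an element of the second Wiener chaos, which is the basic building block of second-chaos noise.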

$\bullet$ Hypercontractivity in Wiener chaos. For $h\in{\mathcal{H}}^{\otimes q}$ , the multiple Wiener integrals $I_{q}(h)$ , which exhaust the set ${\mathcal{H}}_{q}$ , satisfy a hypercontractivity property (equivalence in ${\mathcal{H}}_{q}$ of all $L^{p}$ norms for all $p\geqslant 2$ ), which implies that for any $F\in\oplus_{l=1}^{q}{\mathcal{H}}_{l}$ (i.e. in a fixed sum of Wiener chaoses), we have

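$$\left(\mathbf{E}\left[|F|^{p}\right]\right)^{1/p}\leqslant c_{p,q}\left(\mathbf{E}\left[|F|^{2}\right]\right)^{1/2}\quad\text{for any }p\geqslant 2.$$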

It should be noted that the constants $c_{p,q}$ above are known with some precision when $F\in{\mathcal{H}}_{q}$ : by Corollary 2.8.14 in [32], $c_{p,q}=\left(p-1\right)^{q/2}$ .
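A quick sanity check of this constant, in a case chosen by us for illustration: let $F=Z^{2}-1$ with $Z$ standard normal, which lies in the second chaos ($q=2$) because $H_{2}(x)=x^{2}-1$, and take $p=4$. Using the normal moments $\mathbf{E}[Z^{2}]=1$, $\mathbf{E}[Z^{4}]=3$, $\mathbf{E}[Z^{6}]=15$ and $\mathbf{E}[Z^{8}]=105$,

$$\mathbf{E}\left[F^{2}\right]=3-2+1=2,\qquad\mathbf{E}\left[F^{4}\right]=105-4\cdot 15+6\cdot 3-4\cdot 1+1=60,$$

so that

$$\left(\mathbf{E}\left[F^{4}\right]\right)^{1/4}=60^{1/4}\approx 2.78\leqslant\left(4-1\right)^{2/2}\left(\mathbf{E}\left[F^{2}\right]\right)^{1/2}=3\sqrt{2}\approx 4.24.$$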

$\bullet$ Malliavin derivative and other operators on Wiener space. The Malliavin derivative operator $D$ , and other operators on Wiener space, are needed briefly in this paper, to provide an efficient proof of the first theorem in Section 5, and to interpret an observation of I. Nourdin and G. Peccati, given below in (7), for a bound on the total variation distance of any chaos law to the normal law. We do not provide any background on these operators, referring instead to Chapter 2 in [32], and briefly mentioning here the facts we will use in the proof of Section 5, without spelling out all assumptions. Strictly speaking, all the results in this paper can be obtained without the following facts, but this would be exceedingly tedious and wholly nontransparent.

• The operator $D$ maps $I_{q}\left(f\right)$ to $t\mapsto qI_{q-1}\left(f\left(.,t\right)\right)$ and is consistent with the ordinary chain rule. Its domain is denoted by $\mathbf{D}^{1,2}$, and includes all chaos variables.

• The operator $L$, known as the generator of the Ornstein-Uhlenbeck semigroup on Wiener space, maps $I_{q}\left(f\right)$ to $-qI_{q}\left(f\right)$, and $L^{-1}$ denotes its pseudo-inverse: $L$’s kernel is the constants, all other chaos are its eigenspaces. Combining this with the previous point, we obtain $-D_{t}L^{-1}I_{q}\left(f\right)=I_{q-1}\left(f\left(.,t\right)\right)$.

• This $D$ has an adjoint $\delta$ in $L^{2}\left(\Omega\right)$, which by definition satisfies the duality relation $\mathbf{E}\left\langle DF,u\right\rangle_{\mathcal{H}}=\mathbf{E}\left[F\delta\left(u\right)\right]$, where $u$ is any stochastic process for which the expressions are defined. The domain of $\delta$ is a non-trivial object of study, but it is known to contain all square-integrable $W$-adapted processes for the case of $G=W$, the Wiener process, where $\mathcal{H}=L^{2}\left([0,1]\right)$.

• We have the relation

$$F=\delta\left(-DL^{-1}\right)F.$$

$\bullet$ Distances between random variables. The following is classical. If $X,Y$ are two real-valued random variables, then the total variation distance between the law of $X$ and the law of $Y$ is given by

[TABLE]

where the supremum is over all Borel sets. The Kolmogorov distance $d_{Kol}\left(X,Y\right)$ is the same as $d_{TV}$ except one take the sup over $A$ of the form $(-\infty,z]$ for all real $z$ . The Wasserstein distance uses Lipschitz rather than indicator functions:

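$d_{W}\left(X,Y\right)=\sup_{f\in Lip(1)}\left|\mathbf{E}f\left(X\right)-\mathbf{E}f\left(Y\right)\right|,$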

$Lip(1)$ being the set of all Lipschitz functions with Lipschitz constant $\leqslant 1$ .

$\bullet$ The observation of Nourdin and Peccati. Let $N$ denote the standard normal law. The following observation relates an integration-by-parts formula on Wiener space with a classical result of Ch. Stein.

Let $X\in\mathbf{D}^{1,2}$ with $\mathbf{E}\left[X\right]=0$ and $Var\left[X\right]=1$ . Then (see [31, Proposition 2.4], or [32, Theorem 5.1.3]), for $f\in C_{b}^{1}\left(\mathbf{R}\right)$ ,

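$\mathbf{E}\left[Xf\left(X\right)\right]=\mathbf{E}\left[f^{\prime}\left(X\right)\left\langle DX,-DL^{-1}X\right\rangle_{\mathcal{H}}\right],$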

and by combining this with properties of solutions of Stein’s equations, one gets

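$d_{TV}\left(X,N\right)\leqslant 2\mathbf{E}\left|1-\left\langle DX,-DL^{-1}X\right\rangle_{\mathcal{H}}\right|.$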

One notes in particular that when $X\in{\mathcal{H}}_{q}$ , since $-L^{-1}X=q^{-1}X$ , we obtain conveniently

$d_{TV}\left(X,N\right)\leqslant 2\mathbf{E}\left|1-q^{-1}\left\|DX\right\|_{\mathcal{H}}^{2}\right|.$

$\bullet$ A convenient lemma. The following result is a direct consequence of the Borel-Cantelli Lemma (the proof is elementary; see e.g. [25]). It is convenient for establishing almost-sure convergences from $L^{p}$ convergences.

Lemma 1

Let $\gamma>0$ . Let $(Z_{n})_{n\in\mathbb{N}}$ be a sequence of random variables. If for every $p\geqslant 1$ there exists a constant $c_{p}>0$ such that for all $n\in\mathbb{N}$ ,

$\left(\mathbf{E}\left|Z_{n}\right|^{p}\right)^{1/p}\leqslant c_{p}\cdot n^{-\gamma},$

then for all $\varepsilon>0$ there exists a random variable $\eta_{\varepsilon}$ which is almost surely finite such that

$\left|Z_{n}\right|\leqslant\eta_{\varepsilon}\cdot n^{-\gamma+\varepsilon}\quad\text{almost surely}$

for all $n\in\mathbb{N}$ . Moreover, $E|\eta_{\varepsilon}|^{p}<\infty$ for all $p\geqslant 1$ .

3 The model

3.1 Definition

We consider the following AR(1) model

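$\begin{cases}Y_{n}=a_{0}+a_{1}Y_{n-1}+\varepsilon_{n},&n\geqslant 1,\\ \varepsilon_{n}=\sum\limits_{\delta=1}^{\infty}\sigma_{\delta}\left(Z_{n,\delta}^{2}-1\right),\end{cases}$ (8)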

where $a_{0}$ , $a_{1}$ and $\left\{\sigma_{\delta},\delta\geqslant 1\right\}$ are real constants. The sequence of innovations $\left\{\varepsilon_{n},n\geqslant 1\right\}$ is i.i.d., with distribution in the second Wiener chaos. It turns out that this sequence can be represented as in the second line above in (8), where the family $\left\{Z_{n,\delta},n\geqslant 1,\delta\geqslant 1\right\}$ are i.i.d. standard Gaussian random variables defined on $(\Omega,\mathcal{F},\mathbf{P})$ , and $\left\{\sigma_{\delta};\delta\geqslant 1\right\}$ is a sequence of reals satisfying

$\sum\limits_{\delta=1}^{\infty}\sigma_{\delta}^{2}<\infty.$ (9)

This is explained in [32, Section 2.7.4]. We assume that the mean reversion parameter $a_{1}$ is such that $\left|a_{1}\right|<1$ . Under this condition, (8) also admits a stationary ergodic solution. Both the version above and the stationary version are linear functionals of elements of the form of $\varepsilon_{n}$ , which are elements of the second Wiener chaos. Since this chaos is a vector space, both versions of $Y$ take values in the second Wiener chaos.

By truncating the series in (8), one obtains a process which is a sum of chi-squared variables, converging to $Y$ in $L^{2}(\Omega)$. Special cases where the sum is finite can also be considered. In the figures below, we simulate 500 observations from such cases, to show the variety of behaviors, even with a limited number of terms in the noise series.

• When $\sigma_{1}=\sigma$ and $\sigma_{\delta}=0$ for all $\delta\geqslant 2$, the noise is a scaled, centered chi-squared white noise with one degree of freedom: $\varepsilon_{n}=\sigma(Z_{n,1}^{2}-1)$ with $Z_{n,1}^{2}\sim\chi^{2}(1)$.

• When $\sigma_{1}=\sigma_{2}=\sigma$ and $\sigma_{\delta}=0$ for all $\delta\geqslant 3$, the noise is a centered exponential white noise with rate parameter $1/(2\sigma)$. Indeed $Z_{n,1}^{2}+Z_{n,2}^{2}\sim\mathcal{E}(1/2)$, so that $(Z_{n,1}^{2}-1)+(Z_{n,2}^{2}-1)$ is a centered $\mathcal{E}(1/2)$ variable.

• When $\sigma_{1}=-\sigma_{2}$ and $\sigma_{\delta}=0$ for all $\delta\geqslant 3$, the noise is a symmetric second-chaos white noise, and $\varepsilon$’s law is a scaled product-normal law: if $N,N^{\prime}$ are two i.i.d. standard normals, then $2NN^{\prime}\sim(Z_{n,1}^{2}-1)-(Z_{n,2}^{2}-1)=Z_{n,1}^{2}-Z_{n,2}^{2}$.
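The three special cases above can be simulated with a few lines of standard-library Python. This is only a minimal sketch, not the authors' simulation code: the function names, the initial value $Y_{0}=0$, the seed, and the parameter values $a_{0}=0$, $a_{1}=0.5$, $\sigma=1$ are all illustrative assumptions.

```python
import random


def second_chaos_noise(sigmas, rng):
    """One innovation eps = sum_d sigma_d * (Z_d**2 - 1), with Z_d i.i.d. N(0, 1)."""
    return sum(s * (rng.gauss(0.0, 1.0) ** 2 - 1.0) for s in sigmas)


def simulate_ar1(n, a0, a1, sigmas, y0=0.0, seed=0):
    """Return [Y_1, ..., Y_n] for Y_k = a0 + a1 * Y_{k-1} + eps_k.

    y0 = 0 is an assumption of this sketch; `sigmas` is the (finite,
    truncated) list of noise scale parameters sigma_1, sigma_2, ...
    """
    if abs(a1) >= 1:
        raise ValueError("the mean-reversion parameter must satisfy |a1| < 1")
    rng = random.Random(seed)
    path, y = [], y0
    for _ in range(n):
        y = a0 + a1 * y + second_chaos_noise(sigmas, rng)
        path.append(y)
    return path


# The three cases of the list, each with 500 observations (illustrative values).
chi2_path = simulate_ar1(500, 0.0, 0.5, [1.0])        # centered chi-squared noise
expo_path = simulate_ar1(500, 0.0, 0.5, [1.0, 1.0])   # centered exponential noise
symm_path = simulate_ar1(500, 0.0, 0.5, [1.0, -1.0])  # symmetric product-normal noise
```

In the first case each innovation is bounded below by $-\sigma$, which is one way to see the asymmetry of the paths: with the values above, the simulated path can never fall below $-\sigma/(1-a_{1})=-2$.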

Remark 2

We can see from the figures above the asymmetry in figures (a), (b) and (c) due to the asymmetric nature of the noise; figure (d) shows more symmetry because of the choice $\sigma_{1}=-\sigma_{2}$ . We also notice that when the mean reversion is fairly strong and the noise is large the shape of the observations is balanced (figure (a)), while when the noise is larger compared to the mean-reversion parameter, the observations look like an Ornstein-Uhlenbeck process with a noise larger than the drift (see figure (b)).

3.2 Quadratic variation

This paper’s main goal is to determine the asymptotic distribution of the quadratic variation of the observations $\left\{Y_{n},n\geqslant 1\right\}$ using analysis on Wiener space.

This will be facilitated by the fact, mentioned above, that the sequence $(Y_{n})_{n\geqslant 1}$ lives in the second Wiener chaos with respect to the Wiener process $W$, by virtue of being the solution of a linear equation with noise in the second chaos. To be more specific, observe that $\left\{Y_{n},n\geqslant 1\right\}$ in (8) can be expressed recursively as follows:

[TABLE]

where

[TABLE]

For the sake of ease of computation in Wiener chaos, it will be convenient throughout this paper to refer to the Wiener integral representation of the noise terms $Z_{k,\delta}^{2}$. For this, there exists an orthonormal family $\left\{h_{k,\delta},k\geqslant 1,\delta\geqslant 1\right\}$ in $L^{2}([0,1])$ for which $Z_{k,\delta}=W(h_{k,\delta})=I_{1}^{W}\left(h_{k,\delta}\right)=\int_{0}^{1}h_{k,\delta}\left(r\right)dW\left(r\right)$. Hence, using the fact, which comes from the most elementary application of the product formula (4), that $W^{2}\left(\varphi\right)-1=I_{2}^{W}\left(\varphi^{\otimes 2}\right)$, we have for $i\geqslant 1$:

[TABLE]

Therefore, using the linearity property of multiple integrals, we can write for $i\geqslant 1$ ,

[TABLE]

where

[TABLE]

A straightforward computation shows that under Assumption (9), the kernel $f_{i}\in L^{2}([0,1]^{2})$ for all $i\geqslant 1$ : indeed

[TABLE]

Our main object of study in the next two sections is the asymptotics of the quadratic variation defined as follows:

[TABLE]

Using product formula (4), we get

[TABLE]

In the next section, we show that the asymptotic variance of $\sqrt{n}\left(Q_{n}-\mathbf{E}[Q_{n}]\right)$ exists, and we compute its speed of convergence. Then we establish a CLT for $Q_{n}$, and compute its Berry-Esséen speed of convergence in total variation.
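For orientation, the way the product formula splits the quadratic variation into a second-chaos and a fourth-chaos term can be sketched as follows. This is a hedged reconstruction, not a quotation of (13) and (14): it assumes that $Q_{n}$ is the empirical second moment of the centered chaos components $I_{2}^{W}(f_{i})$, which is consistent with the expression of $T_{2,n}$ used in the proof of Proposition 3.

```latex
% Product formula for a second-chaos variable with symmetric kernel f:
I_2(f)^2 = I_4(f \otimes f) + 4\, I_2(f \otimes_1 f) + 2\, \|f\|^2_{L^2([0,1]^2)} .

% Averaging over i = 1, ..., n (assumed form Q_n = n^{-1} \sum_i I_2(f_i)^2):
Q_n - \mathbf{E}[Q_n]
  = \underbrace{I_2\Big(\tfrac{4}{n}\sum_{i=1}^{n} f_i \otimes_1 f_i\Big)}_{T_{2,n}}
  + \underbrace{I_4\Big(\tfrac{1}{n}\sum_{i=1}^{n} f_i \otimes f_i\Big)}_{T_{4,n}},
\qquad
\mathbf{E}[Q_n] = \tfrac{2}{n}\sum_{i=1}^{n} \|f_i\|^2_{L^2([0,1]^2)} .
```

The coefficients $1$, $4$ and $2$ are the values of $r!\binom{2}{r}^{2}$ for $r=0,1,2$ in the general product formula.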

4 Asymptotic variance of the quadratic variation

Using the orthogonality of multiple integrals living in different chaos, to calculate the limiting variance of $\sqrt{n}(Q_{n}-\mathbf{E}[Q_{n}])$ , we need only study separately the second moments of the terms $T_{2,n}$ and $T_{4,n}$ given in (14).

4.1 Scale constant for $T_{2,n}$

Proposition 3

Under Assumption (9), with $T_{2,n}$ as in (14), for large $n$ ,

[TABLE]

where $C_{1}:=32\left(\sum\limits_{\delta=1}^{\infty}\sigma_{\delta}^{4}\right)\frac{\left[1+a_{1}^{2}(5+6a_{1}^{2})\right]}{(1-a_{1}^{4})^{2}(1-a_{1}^{2})}$ . In particular

[TABLE]

Proof.

Since $T_{2,n}=I^{W}_{2}(\frac{4}{n}\sum\limits_{i=1}^{n}f_{i}\otimes_{1}f_{i})$, the isometry property (3) of multiple integrals gives

[TABLE]

Moreover, under Assumption (9), we have

[TABLE]

Therefore, for $i,j\geqslant 1$ such that $j\geqslant i$ , we get

[TABLE]

Therefore, by (17), we have

[TABLE]

Moreover,

[TABLE]

On the other hand

[TABLE]

Consequently

[TABLE]

The desired result is therefore obtained. ∎

4.2 Scale constant for $T_{4,n}$

Proposition 4

Under Assumption (9), with $T_{4,n}$ as in (14), for large $n$ ,

[TABLE]

where

[TABLE]

In particular

[TABLE]

Proof.

By definition of the term $T_{4,n}$ , we have

[TABLE]

where $f_{i}\tilde{\otimes}f_{i}$ denotes the symmetrization of $f_{i}\otimes f_{i}$ , because the kernel $\sum_{i=1}^{n}f_{i}{\otimes}f_{i}\in L^{2}([0,1]^{4})$ is no longer symmetric. We deal with symmetrization by using a combinatorial formula, obtaining

[TABLE]

Therefore

[TABLE]

Moreover,

[TABLE]

On the other hand, using (9), we have for $j\geqslant i$

[TABLE]

Therefore by (20), we get

[TABLE]

Now let us estimate

[TABLE]

For $j\geqslant i$ , $x,y\in L^{2}([0,1])$ , we have

[TABLE]

So, for $j\geqslant i$ ,

[TABLE]

Therefore,

[TABLE]

which completes the proof. ∎

To get a sense of how the two terms $T_{2,n}$ and $T_{4,n}$ compare to each other, we propose the following example, which shows that, despite one’s best efforts, one should not expect either of these two terms to dominate the other.

Remark 5

In the AR(1) model (8) with chi-squared white noise, i.e. when $\sigma_{1}=\sigma$ and $\sigma_{\delta}=0$ for all $\delta\geqslant 2$ , one can try to compare the two formulas for the asymptotic variances of $T_{2,n}$ and $T_{4,n}$ . Avoiding the situation where $\left|a_{1}\right|$ is very close to $1$ , assuming for instance $|a_{1}|<2^{-1/2}$ , so that $1-a_{1}^{2}>1/2$ , when $n$ is large, we have

[TABLE]

Therefore the sequence $T_{4,n}$ can be made to have a variance which is significantly smaller than that of $T_{2,n}$ in this case, but both of them converge to zero at the same speed $n^{-1}$.

Using the orthogonality between $T_{2,n}$ and $T_{4,n}$ , Proposition 3 and Proposition 4, we conclude the following.

Theorem 6

Under Assumption (9), with $Q_{n}$ as in (13), for large $n$ ,

[TABLE]

and in particular the asymptotic variance of $Q_{n}$ is

[TABLE]

where $C_{1}$ , $l_{1}$ , and $C_{2}$ , $l_{2}$ are given respectively in (15), (16), (18) and (19).

Remark 7

• From the previous theorem, we notice that for $n$ large, and fixed values of the noise scale parameter family $\left\{\sigma_{\delta},\delta\geqslant 1\right\}$, the variance of $\sqrt{n}Q_{n}$ has high values when $|a_{1}|$ is close to 1, and approaches

[TABLE]

when $\left|a_{1}\right|$ is small.

• The previous theorem also shows that one can obtain other asymptotics depending on the relation between $a_{1}$ and the family $\left\{\sigma_{\delta},\delta\geqslant 1\right\}$. For instance, when $|a_{1}|$ is close to 1, which is the limit of fast mean reversion, one can avoid an explosion of $Q_{n}$’s asymptotic variance by scaling the variance parameters appropriately, leading to a regime of fast mean reversion and small noise. Letting $1/\alpha:=1-a_{1}^{2}$, where $\alpha$ is interpreted as a rate of mean reversion, one would only need to ensure that $\sum\sigma_{\delta}^{4}=O\left(\alpha^{-2}\right)$ and $\sum\sigma_{\delta}^{2}=O\left(\alpha^{-3/2}\right)$. In the example where there is a single non-zero value $\sigma$, for instance, we would obtain for large $\alpha$,

[TABLE]

here the second term dominates, and as $\alpha\rightarrow\infty$ , assuming $\alpha^{3}\sigma^{4}$ remains bounded, we would get an asymptotic variance of $8\lim_{\alpha\rightarrow\infty}\alpha^{3}\sigma^{4}$ if the limit exists.

5 Berry-Esséen bound for the asymptotic normality of the quadratic variation

In this section, we prove that the quadratic variation defined in (13) is asymptotically normal and we estimate the speed of this convergence in total variation distance, showing it is of the Berry-Esséen-type order $n^{-1/2}$ . For this aim, we will need the following theorem, which estimates the total variation distance to the normal of the standardized sum of variables in the 2nd and 4th chaos.

Theorem 8

Let $F=I_{2}(f)+I_{4}(g)$ where $f\in L_{s}^{2}([0,1]^{2})$ and $g\in L_{s}^{2}([0,1]^{4})$ . Then

[TABLE]

Moreover, letting $R_{F}$ be the bracketed term on the right-hand side of (22), for any constant $\sigma>0$ , we have

[TABLE]

Proof.

We have $F=I_{2}(f)+I_{4}(g)$ . Then

[TABLE]

Thus, using $F=\delta(-DL^{-1}F)$, we can write $F=\delta(u)$. Now we use the result of a simple calculation, labeled as (9) in the preprint [33] (see also [37]), to obtain

[TABLE]

where the last equality comes from the duality relation $E\langle DF,u\rangle_{L^{2}([0,1])}=E(F\delta(u))=EF^{2}$ . The prior inequality appears to be used in a more general context here than what is stated in [33, Eq. (9)], but an immediate inspection of its proof therein shows that it applies to any situation where $F=\delta(u)$ , using only general results such as Stein’s lemma, the chain rule for the Malliavin derivative $D$ , and the duality between $D$ and $\delta$ .

On the other hand, using the product formula (4),

[TABLE]

Thus

[TABLE]

where

[TABLE]

Therefore, using Minkowski’s inequality,

[TABLE]

Furthermore,

[TABLE]

Also,

[TABLE]

and

[TABLE]

As a consequence,

[TABLE]

This, combined with (24), establishes inequality (22).

For (23), we have by (6)

[TABLE]

Inequality (23) follows using inequalities (25) and (22). ∎

We will now use Theorem 8 to prove that the quadratic variation $Q_{n}$ satisfies the following Berry-Esséen theorem.

Remark 9

It turns out that, when applying Theorem 8 to estimate the speed of convergence in the CLT for $Q_{n}$ , the term $\sqrt{\left\langle f\otimes f,g\otimes_{2}g\right\rangle_{L^{2}([0,1]^{4})}}$ cannot merely be bounded above via Schwarz’s inequality. See Lemma 13 below and its proof. This is the key element which allows us to obtain the Berry-Esséen speed $n^{-1/2}$ in the next theorem.

Theorem 10

With $Q_{n}$ defined in (13), under Assumption (9), we have for all $n\geqslant 1$

[TABLE]

where

[TABLE]

where $l_{1}$ , $l_{2}$ , are defined in the previous section in (16), (19), and $C_{1}$ , $C_{2}$ , $C_{3}$ , $C_{4,r}$ , $r=1,2,3$ and $C_{5}$ are given in the lemmas below, respectively in (15), (18), (29), (30) and (34).

In particular $\sqrt{n}(Q_{n}-\mathbf{E}[Q_{n}])$ is asymptotically Gaussian, namely

[TABLE]

Proof.

Based on the decomposition of $\left(Q_{n}-\mathbf{E}(Q_{n})\right)$ given in (14), we have

[TABLE]

where

[TABLE]

Applying Theorem 8 to $\sqrt{n}\left(Q_{n}-\mathbf{E}[Q_{n}]\right)$ , we get

[TABLE]

We study first the contractions of the kernels $g_{2,n}$ and $g_{4,n}$ given in (27) and we prove that they satisfy the following lemmas.

Lemma 11

If Assumption (9) holds, the kernel $g_{2,n}$ defined in (27) satisfies

[TABLE]

where $C_{3}:=\sqrt{4!\frac{4^{4}}{n}\left(\sum_{\delta=1}^{\infty}\sigma_{\delta}^{8}\right)\frac{1}{1-a_{1}^{8}}\left(\frac{1}{1-a_{1}^{2}}\right)^{3}}$ .

Proof.

We have

[TABLE]

Moreover, by the above calculations and (9),

[TABLE]

Hence

[TABLE]

Similarly

[TABLE]

Therefore

[TABLE]

Consequently

[TABLE]

where we used the change of variables $k_{1}=j-i,\,k_{2}=k-i,\,k_{3}=l-i$ . The desired result therefore follows. ∎

Lemma 12

If Assumption (9) holds, for every $r=1,2,3,$ the kernel $g_{4,n}$ defined in (27) satisfies

[TABLE]

where

[TABLE]

Proof.

For $r=1,2,3$ , we have

[TABLE]

For $r=1$ , we get

[TABLE]

By (9), for all $1\leqslant i,k\leqslant n$ ,

[TABLE]

Similarly, for all $1\leqslant j,l\leqslant n$ ,

[TABLE]

On the other hand, for all $1\leqslant i,j\leqslant n$

[TABLE]

Hence, by (9), for all $1\leqslant i,j,k,l\leqslant n$ ,

[TABLE]

Therefore, from (31) and the above calculations, we have

[TABLE]

where we used the change of variables $k_{1}=j-i$ , $k_{2}=k-j$ , $k_{3}=l-k$ .

For $r=2$ , we have

[TABLE]

Hence, by (32) and (9), we get

[TABLE]

where we used the change of variables $k_{1}=j-i$ , $k_{2}=k-j$ , $k_{3}=l-k$ .

For $r=3$ , we have

[TABLE]

where we used (32) and the change of variables $k_{1}=j-i$, $k_{2}=k-j$, $k_{3}=l-k$, which ends the proof. ∎

Lemma 13

Suppose Assumption (9) holds. Then the kernels $g_{2,n}$ and $g_{4,n}$ defined in (27) satisfy

[TABLE]

where $C_{5}:=\sqrt{4!\frac{4}{n}\left(\sum_{\delta=1}^{\infty}\sigma_{\delta}^{8}\right)\frac{1}{1-a_{1}^{8}}\left(\frac{1}{1-a_{1}^{2}}\right)^{3}}$ .

Proof.

We have

[TABLE]

Consequently, using (33), we get

[TABLE]

where we used the change of variables $k_{1}=j-i,\,k_{2}=k-i,\,k_{3}=l-i$ . ∎

The bound (26) is then a direct consequence of inequality (28) and the estimates given respectively in (29), (30), (34) and Theorem 6. ∎

6 Application: estimation of the mean-reversion parameter

In this section, to illustrate the implications of Theorem 10 for parameter estimation in an easily tractable case, we assume that the observations $Y_{n}$ come from a specific version of our second-chaos AR(1) model (8), namely the one driven by a chi-squared white noise with one degree of freedom:

[TABLE]

where $a_{0}$ , $a_{1}$ and $\sigma$ are real constants, and $\{Z_{n},n\geqslant 1\}$ are i.i.d. standard normal. This is model (8) where all $\sigma_{\delta}$ ’s are zero except for the first one.

Proposition 14

The quadratic variation $Q_{n}$ defined in (13) for model (35) satisfies, for all $n\geqslant 1$ ,

[TABLE]

where $C_{6}:=\frac{2\sigma^{2}}{(1-a_{1}^{2})^{2}}$ .

Proof.

From the definition of $Q_{n}$ in (13), we have by the isometry property of multiple integrals

[TABLE]

∎

Remark 15

Assuming that $\sigma$ is known, Proposition 14 shows that the quadratic variation $Q_{n}$ is an asymptotically unbiased estimator for $2\sigma^{2}/(1-a_{1}^{2})$, and thus, after a transformation, for $\left|a_{1}\right|$ as well:

[TABLE]

Therefore, using the fact that $\mathbf{E}[Q_{n}]$ can be estimated via $Q_{n}$ , we suggest the following moment estimator for the mean-reversion rate $|a_{1}|$

[TABLE]

where

[TABLE]
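The moment estimator of this remark can be sketched in a few lines. Two ingredients are assumptions of this sketch rather than statements of the paper: the quadratic variation is taken to be $Q_{n}=\frac{1}{n}\sum_{i=1}^{n}Y_{i}^{2}$ (appropriate when $a_{0}=0$ and $Y_{0}=0$), and $f$ is taken to be $f(x)=\sqrt{1-2\sigma^{2}/x}$, the inverse of $y\mapsto 2\sigma^{2}/(1-y^{2})$. The function names and the clipping to $0$ when $Q_{n}\leqslant 2\sigma^{2}$ are likewise illustrative choices.

```python
import math


def quadratic_variation(path):
    """Q_n = (1/n) * sum of Y_i**2 (assumed form; suited to a_0 = 0, Y_0 = 0)."""
    return sum(y * y for y in path) / len(path)


def f(x, sigma):
    """f(x) = sqrt(1 - 2*sigma**2 / x), defined for x > 2*sigma**2."""
    return math.sqrt(1.0 - 2.0 * sigma ** 2 / x)


def estimate_abs_a1(path, sigma):
    """Moment estimator of |a_1|: apply f to the quadratic variation.

    When Q_n <= 2*sigma**2 the square root is not defined; returning 0.0
    in that case is a convention of this sketch only.
    """
    q_n = quadratic_variation(path)
    if q_n <= 2.0 * sigma ** 2:
        return 0.0
    return f(q_n, sigma)


# Toy data with Q_n = 4 and sigma = 1, so the estimate is sqrt(1 - 2/4).
print(estimate_abs_a1([2.0, -2.0, 2.0, -2.0], sigma=1.0))
```

The estimator depends on the data only through $Q_{n}$, so it recovers $|a_{1}|$ but carries no information on the sign of $a_{1}$.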

6.1 Properties of the estimator $\hat{a}_{n}$

Proposition 16

The estimator $\hat{a}_{n}$ of the mean reversion parameter $|a_{1}|$ defined in (36) is strongly consistent, namely almost surely

[TABLE]

Proof.

We write $Q_{n}=\frac{V_{n}}{\sqrt{n}}+\mathbf{E}[Q_{n}]$, with $V_{n}=\sqrt{n}(Q_{n}-\mathbf{E}[Q_{n}])$. According to Theorem 6, we have $\mathbf{E}[V_{n}^{2}]\rightarrow l_{1}+l_{2}$ as $n\rightarrow\infty$. Hence, there exists a constant $C>0$ such that, for all $n\geqslant 1$,

[TABLE]

Hence, by Lemma 1 we have almost surely $\frac{V_{n}}{\sqrt{n}}\rightarrow 0$ , as $n\rightarrow\infty$ . On the other hand, by Proposition 14, $\mathbf{E}[Q_{n}]\rightarrow\frac{2\sigma^{2}}{1-a_{1}^{2}}$ , as $n\rightarrow\infty$ . Thus $Q_{n}\rightarrow\frac{2\sigma^{2}}{1-a_{1}^{2}}$ almost surely as $n\rightarrow\infty$ , as announced. ∎

Proposition 17

Under Assumption (9), the estimator $\hat{a}_{n}$ defined in (36) satisfies

[TABLE]

where $l_{1}$ and $l_{2}$ are given in Propositions 3 and 4 respectively and $\mu=\frac{2\sigma^{2}}{(1-a_{1}^{2})}$ .

In particular $\hat{a}_{n}$ is asymptotically Gaussian; more precisely we have as $n\rightarrow+\infty$

[TABLE]

where

[TABLE]

Proof.

For $x>2\sigma^{2}$, the function $f$ defined in (37) has an inverse $f^{-1}(y)=\frac{2\sigma^{2}}{1-y^{2}}$. Let us denote $\mu=f^{-1}(|a_{1}|)=\frac{2\sigma^{2}}{(1-a_{1}^{2})}$, which satisfies $\mu>2\sigma^{2}$ for all $a_{1}\in(-1,1)\setminus\{0\}$. On the other hand, we can write

[TABLE]

Therefore, from the properties of the Wasserstein metric and denoting $N\sim\mathcal{N}(0,1)$ , we get

[TABLE]

where we used the bound (26) and Proposition 14 for the above bounds. On the other hand, since the function $f$ defined in (37) is a diffeomorphism and since $\hat{a}_{n}=f(Q_{n})$ , then by the mean-value theorem, there exists a random variable $\xi_{n}$ $\in$ $[|Q_{n},\mu|]$ such that

[TABLE]

We have

[TABLE]

According to (38), the last term is bounded by the speed $n^{-1/2}$ . Moreover, applying the mean-value theorem again since $f$ is twice continuously differentiable, there exists a random variable $\delta_{n}$ $\in$ $[|\xi_{n},\mu|]$ $\subset$ $[|Q_{n},\mu|]$ , such that

[TABLE]

where we used Hölder’s inequality with $p$ , $p^{\prime}$ are two reals greater than 1 such that $\frac{1}{p}+\frac{1}{p^{\prime}}=1$ . Moreover, by the hypercontractivity property for multiple integrals (5), there exists a constant $C(p)$ such that

[TABLE]

where we used the inequality $(a+b)^{2}\leqslant 2a^{2}+2b^{2}$ , for all $a$ , $b$ $\in\mathbb{R}$ and the bounds of Proposition 14 and Theorem 6 respectively. On the other hand, for all $x>2\sigma^{2}$ ,

[TABLE]

Therefore using (39) and (40), to obtain a bound for the term $d_{W}\left(\frac{\sqrt{n}}{f^{\prime}(\mu)\sqrt{l_{1}+l_{2}}}\left(\hat{a}_{n}-|a_{1}|\right),\frac{\sqrt{n}}{\sqrt{l_{1}+l_{2}}}\left(Q_{n}-\mu\right)\right)$ , it remains to show that $\mathbf{E}[|f^{\prime\prime}(\delta_{n})|^{p^{\prime}}]$ is finite for some $p^{\prime}>1$ but using the fact that $\delta_{n}\in[|Q_{n},\mu|]$ and the monotonicity of $f^{\prime\prime}$ , it is actually sufficient to show that for some $p^{\prime}>1$ , we have

[TABLE]

The function $|f^{\prime\prime}|$ has two singularities, at $0$ and at $2\sigma^{2}$, and thus is not bounded. However, we can write

[TABLE]

For the term $\mathbf{E}\left[|f^{\prime\prime}(Q_{n})|\mathbf{1}_{\{|Q_{n}-\mu|<\frac{1}{\sqrt{n}}\}}\right]$, since $\mu>2\sigma^{2}$, we can pick $n$ such that $\frac{1}{\sqrt{n}}<\sigma^{2}$. Then, on the event $\{|Q_{n}-\mu|<\frac{1}{\sqrt{n}}\}$, we have $Q_{n}>2\sigma^{2}-\sigma^{2}>0$; therefore $Q_{n}$ is bounded away from $0$ and the term $|Q_{n}|^{-5p^{\prime}/2}$ has no singularity for any $p^{\prime}>1$. For the term $|Q_{n}-2\sigma^{2}|^{-3p^{\prime}/2}$, we put $C:=\frac{\mu-2\sigma^{2}}{2}$. The constant $C$ is non-zero because $\mu\neq 2\sigma^{2}\Leftrightarrow a_{1}\neq 0$, and we can assume $a_{1}\neq 0$ since the model is not autoregressive when $a_{1}=0$. Therefore, we can pick $n$ such that $\frac{1}{\sqrt{n}}<C$. In this case $Q_{n}-2\sigma^{2}=Q_{n}-\mu+2C>-\frac{1}{\sqrt{n}}+2C>2C-C>0$, hence the term $|Q_{n}-2\sigma^{2}|^{-3p^{\prime}/2}$ has no singularity at $2\sigma^{2}$ for any $p^{\prime}>1$. In conclusion, to avoid the singularities at both $0$ and $2\sigma^{2}$, it is sufficient to pick $n$ such that

$\frac{1}{\sqrt{n}}<\sigma^{2}\wedge C=\begin{cases}\sigma^{2}&\text{ if }|a_{1}|\geqslant\frac{1}{\sqrt{2}}\\ C&\text{ if }|a_{1}|\leqslant\frac{1}{\sqrt{2}}.\end{cases}$

For the other term, by the asymptotic normality of $\sqrt{n}(Q_{n}-\mu)$, we get as $n\rightarrow+\infty$ and for $p^{\prime}>1$

[TABLE]

which gives the desired result. ∎
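The exponents $5/2$ and $3/2$ appearing in the proof can be checked by differentiating $f$ directly. The computation below is a sketch that assumes $f(x)=\sqrt{1-2\sigma^{2}/x}$ for $x>2\sigma^{2}$, which is the function whose inverse is the $f^{-1}(y)=\frac{2\sigma^{2}}{1-y^{2}}$ stated at the start of the proof.

```latex
f(x)   = \Big(1-\frac{2\sigma^{2}}{x}\Big)^{1/2} = x^{-1/2}\,(x-2\sigma^{2})^{1/2},
\qquad x > 2\sigma^{2},

f'(x)  = \sigma^{2}\, x^{-3/2}\,(x-2\sigma^{2})^{-1/2},

f''(x) = -\,\sigma^{2}\,(2x-3\sigma^{2})\; x^{-5/2}\,(x-2\sigma^{2})^{-3/2}.

% Evaluating f' at \mu = 2\sigma^{2}/(1-a_1^{2}), where \mu - 2\sigma^{2} = 2\sigma^{2}a_1^{2}/(1-a_1^{2}):
f'(\mu) = \frac{(1-a_{1}^{2})^{2}}{4\,\sigma^{2}\,|a_{1}|}.
```

In particular $|f^{\prime\prime}|$ blows up like $|x|^{-5/2}$ near $0$ and like $|x-2\sigma^{2}|^{-3/2}$ near $2\sigma^{2}$, and $f^{\prime}(\mu)$ blows up as $a_{1}\rightarrow 0$, which is why small values of $|a_{1}|$ are harder to estimate.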

6.2 Numerical Results

The table below reports the mean and standard deviation of the proposed estimator $\hat{a}_{n}$ defined in (36) of the true value of the mean-reversion parameter $|a_{1}|$ .

We simulate the values of the estimator $\hat{a}_{n}$ from the quadratic variation $Q_{n}$ for different sample sizes $n$ and for fixed $\sigma^{2}$ chosen to be equal to 1. For each sample size $n$, the mean and the standard deviation are obtained from 500 replications. The table is consistent with the strong consistency of the estimator $\hat{a}_{n}$, even for moderate values of $n$, and shows small standard deviations for different true values of $|a_{1}|$. Moreover, the estimator $\hat{a}_{n}$ is more efficient for values of $|a_{1}|$ greater than $0.5$; this could be explained by the fact that the asymptotic variance of the limiting law of $\hat{a}_{n}$ is $\frac{(1-a_{1}^{2})(5-4a_{1}^{2})}{2a_{1}^{2}}$, which is high for small values of $|a_{1}|$ and small for values of $|a_{1}|$ close to 1. Therefore, $\hat{a}_{n}$ is presumably more accurate as an estimator when $|a_{1}|$ is closer to 1, e.g. greater than 0.5, as can be seen in the figure below.
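A replication experiment of this kind can be sketched as follows. This is not the authors' code: it assumes $a_{0}=0$, $Y_{0}=0$, $Q_{n}=\frac{1}{n}\sum_{i}Y_{i}^{2}$ and $\hat{a}_{n}=\sqrt{1-2\sigma^{2}/Q_{n}}$ (set to $0$ when the argument of the square root is negative), and the function names, the seed and the default number of replications are illustrative.

```python
import math
import random
import statistics


def one_estimate(n, a1, sigma, rng):
    """One draw of the estimator from a fresh chi-squared AR(1) path of length n."""
    y, sum_sq = 0.0, 0.0
    for _ in range(n):
        y = a1 * y + sigma * (rng.gauss(0.0, 1.0) ** 2 - 1.0)
        sum_sq += y * y
    q_n = sum_sq / n  # assumed form of the quadratic variation
    # Clipping at 0 is a convention of this sketch for the case Q_n < 2*sigma**2.
    return math.sqrt(max(1.0 - 2.0 * sigma ** 2 / q_n, 0.0))


def monte_carlo_summary(n, a1, sigma=1.0, replications=500, seed=0):
    """Empirical mean and standard deviation of the estimator over replications."""
    rng = random.Random(seed)
    draws = [one_estimate(n, a1, sigma, rng) for _ in range(replications)]
    return statistics.mean(draws), statistics.stdev(draws)


# One cell of the experiment: |a_1| = 0.7, n = 3000, sigma = 1.
mean, std = monte_carlo_summary(3000, 0.7, replications=100)
print(f"mean = {mean:.4f}, std dev = {std:.4f}")
```

Because the seed and the random number generator differ from those of the paper, the printed values are not expected to reproduce the reported table exactly.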

To investigate the asymptotic distribution of $\hat{a}_{n}$ empirically, we need to compare the distribution of the following statistic

[TABLE]

with the standard normal distribution $\mathcal{N}(0,1)$ . For this aim, for parameter choices $\left|a_{1}\right|=0.5$ , $n=3000$ , $\sigma=1$ , and based on 3000 replications, we obtained the following histogram:

Figure 3 shows that the normal approximation of the distribution of the statistic $\phi(n,a_{1})$ is reasonable even when the sample size $n$ is not very large. The table below compares statistics of $\phi(n,a_{1})$ and $\mathcal{N}(0,1)$ based on 3000 replications, with $n=3000$ and $\sigma=1$. The empirical mean, median and standard deviation of $\phi(n,a_{1})$ match those of $\mathcal{N}(0,1)$ very closely, corroborating our theoretical results.

[TABLE]

We can check more precisely how fast the statistic $\phi(n,a_{1})$ converges in law to $\mathcal{N}(0,1)$. We chose to compute the Kolmogorov distance between $\phi(n,a_{1})$ and $\mathcal{N}(0,1)$. For this aim, we approximate the cumulative distribution function by the empirical cumulative distribution function based on 500 replications of the computation of $\phi(n,a_{1})$ for $n=3000$. The next figure shows the empirical and standard normal cumulative distribution functions.
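The empirical Kolmogorov distance described above can be computed exactly from a sample, since the supremum of the difference between a step function and a continuous increasing function is attained at a jump. The sketch below uses only the Python standard library; the function name and the placeholder sample are illustrative assumptions, and the sample stands in for the replicated values of $\phi(n,a_{1})$.

```python
import random
from statistics import NormalDist


def kolmogorov_distance_to_normal(sample):
    """Sup norm of (empirical CDF - standard normal CDF) for the given sample."""
    xs = sorted(sample)
    m = len(xs)
    phi = NormalDist().cdf
    distance = 0.0
    for i, x in enumerate(xs):
        c = phi(x)
        # The empirical CDF jumps from i/m to (i+1)/m at x: check both sides.
        distance = max(distance, (i + 1) / m - c, c - i / m)
    return distance


# Placeholder sample: 500 exact standard normal draws instead of phi(n, a_1).
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
print(kolmogorov_distance_to_normal(sample))
```

Even for an exactly Gaussian sample this distance is not zero: with 500 points the sampling error of the empirical distribution function is itself of order $500^{-1/2}$, which is worth keeping in mind when interpreting a value computed from 500 replications.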

The Kolmogorov distance between the two laws, which equals the sup norm of the difference of these cumulative distribution functions, is approximately $0.052$. On the other hand, since (see, for example, Theorem 3.3 of [9] for a proof)

$d_{Kol}\left(\phi(n,a_{1}),N\right)\leqslant 2\sqrt{d_{W}\left(\phi(n,a_{1}),N\right)},$ (42)

the distance on the left-hand side should be bounded above by $2\times 3000^{-1/4}\approx 0.27$ times a constant coming from the upper bound in Proposition 17. This is about five times larger than our estimate of the actual Kolmogorov distance, $0.052$, a reassuring practical confirmation of Proposition 17, and of our underlying results on normal asymptotics of 2nd-chaos AR(1) quadratic variations. If that proposition’s upper bound with its rate $n^{-1/2}$ applied directly to the Kolmogorov distance, as is known to be the case for the Berry-Esséen theorem in the classical CLT, the value $0.052$ should be compared to $3000^{-1/2}\approx 0.018$, which is arguably of the same order of magnitude. This is a motivation to investigate whether the so-called delta method, which we used here to prove Proposition 17 under the Wasserstein distance, could also apply to the total variation distance, since the latter is known to be an upper bound on the Kolmogorov distance without the need for the square root as in the comparison (42).

Bibliography

[1] Azmoodeh, E. and Morlanes, G. I. (2013). Drift parameter estimation for fractional Ornstein-Uhlenbeck process of the second kind. Statistics. DOI: 10.1080/02331888.2013.863888.
[2] Azmoodeh, E. and Viitasaari, L. (2015). Parameter estimation based on discrete observations of fractional Ornstein-Uhlenbeck process of the second kind. Statist. Infer. Stoch. Proc. 18, no. 3, 205-227.
[3] Balakrishna, N. and Shiji, K. (2014). Extreme Value Autoregressive Model and Its Applications. Journal of Statistical Theory and Practice 8 (3), 460-481.
[4] Barboza, L.A. and Viens, F. (2017). Parameter estimation of Gaussian stationary processes using the generalized method of moments. Electron. J. Statist. 11 (1), 401-439.
[5] Belfadli, R., Es-Sebaiy, K. and Ouknine, Y. (2011). Parameter Estimation for Fractional Ornstein-Uhlenbeck Processes: Non-Ergodic Case. Frontiers in Science and Engineering (An International Journal Edited by Hassan II Academy of Science and Technology) 1, no. 1, 1-16.
[6] Biermé, H., Bonami, A., Nourdin, I. and Peccati, G. (2012). Optimal Berry-Esséen rates on the Wiener space: the barrier of third and fourth cumulants. ALEA 9, no. 2, 473-500.
[7] Brouste, A. and Iacus, S. M. (2012). Parameter estimation for the discretely observed fractional Ornstein-Uhlenbeck process and the Yuima R package. Comput. Stat. 28, no. 4, 1529-1547.
[8] Cheridito, P., Kawaguchi, H. and Maejima, M. (2003). Fractional Ornstein-Uhlenbeck processes. Electr. J. Prob. 8, 1-14.
