Moderate deviations in a class of stable but nearly unstable processes

Frédéric Proïa

arXiv:1905.02618·math.ST·October 17, 2019

TL;DR

This paper develops moderate deviation principles for nearly unstable autoregressive processes, providing insights into the behavior of empirical covariance and OLS estimators as the process approaches instability.

Contribution

It introduces a novel moderate deviation framework for nearly unstable AR processes, including cases with singular asymptotic variance, using truncation and deviation techniques.

Findings

01. Moderate deviation principle for the empirical covariance, depending on the spectral radius

02. Moderate deviation principle for the OLS estimator when the asymptotic variance is invertible

03. Moderate deviation principle for a penalized estimator when the asymptotic variance is singular

Abstract

We consider a stable but nearly unstable autoregressive process of any order. The bridge between stability and instability is expressed by a time-varying companion matrix $A_{n}$ with spectral radius $\rho(A_{n})<1$ satisfying $\rho(A_{n})\rightarrow 1$. In that framework, we establish a moderate deviation principle for the empirical covariance only relying on the elements of $A_{n}$ through $1-\rho(A_{n})$ and, as a by-product, we establish a moderate deviation principle for the OLS estimator when $\Gamma$, the renormalized asymptotic variance of the process, is invertible. Finally, when $\Gamma$ is singular, we also provide a compromise in the form of a moderate deviation principle for a penalized version of the estimator. Our proofs essentially rely on truncations and deviations of $m_{n}$-dependent sequences, with an unbounded rate $(m_{n})$.

Equations

X_{n,\,k}=\sum_{i=1}^{p}\theta_{n,\,i}\,X_{n,\,k-i}+\varepsilon_{k}

\Phi_{n,\,k}=A_{n}\,\Phi_{n,\,k-1}+E_{k}

A_{n}=\begin{pmatrix}\theta_{n,\,1}&\theta_{n,\,2}&\dots&\theta_{n,\,p}\\ &I_{p-1}&&0\end{pmatrix}

\Phi_{n,\,k}=\sum_{\ell=0}^{+\infty}A_{n}^{\ell}\,E_{k-\ell}

\Gamma_{n}=\sigma^{2}\sum_{\ell=0}^{+\infty}A_{n}^{\ell}\,K_{p}\,(A_{n}^{\,T})^{\ell}

K_{p}=\begin{pmatrix}1&0\\ 0&0_{p-1}\end{pmatrix}\quad\text{and}\quad U_{p}=\begin{pmatrix}1\\ 0\end{pmatrix}

\Gamma_{n}=A_{n}\,\Gamma_{n}\,A_{n}^{\,T}+\sigma^{2}\,K_{p}.

B_{n}=I_{p^{2}}-A_{n}\otimes A_{n}.

\mathbb{E}\big[\mathrm{e}^{\alpha\,\varepsilon_{1}^{\,2}}\big]<+\infty

\lim_{n\rightarrow+\infty}A_{n}=A

\lim_{n\rightarrow+\infty}\rho(A_{n})=\rho(A)=1.

\lim_{n\rightarrow+\infty}\frac{B_{n}^{-1}}{|||B_{n}^{-1}|||_{*}}=H\quad\text{and}\quad\lim_{n\rightarrow+\infty}(1-\rho(A_{n}))\,|||B_{n}^{-1}|||_{*}=h

\lim_{n\rightarrow+\infty}b_{n}=+\infty\quad\text{and}\quad\lim_{n\rightarrow+\infty}\frac{\sqrt{n}\,(1-\rho(A_{n}))^{\frac{3}{2}+\eta}}{b_{n}}=+\infty

\rho(A_{n}^{\ell})=\rho^{\ell}(A_{n})\leqslant|||A_{n}^{\ell}|||

\frac{1}{1-\rho(A_{n})}\leqslant\sum_{\ell=0}^{+\infty}|||A_{n}^{\ell}|||=L_{n}

\frac{1}{(1-\rho(A_{n}))^{2}}\leqslant\sum_{\ell=0}^{+\infty}(\ell+1)\,|||A_{n}^{\ell}|||=M_{n}.

\mu_{n}=\rho(A_{n})+\frac{1-\rho(A_{n})}{2}=\frac{\rho(A_{n})+1}{2}.

L_{n}\leqslant\frac{c_{n}}{1-\mu_{n}}<+\infty\quad\text{and}\quad M_{n}\leqslant\frac{c_{n}}{(1-\mu_{n})^{2}}<+\infty.

\lim_{n\rightarrow+\infty}|||B_{n}^{-1}|||=\lim_{n\rightarrow+\infty}L_{n}=\lim_{n\rightarrow+\infty}M_{n}=+\infty.

\lim_{n\rightarrow+\infty}\frac{\Gamma_{n}}{|||B_{n}^{-1}|||_{*}}=\Gamma

\limsup_{n\rightarrow+\infty}\frac{1}{a_{n}}\ln\mathbb{P}(U_{n}\in F)\leqslant-\inf_{x\in F}I(x),

-\inf_{x\in G}I(x)\leqslant\liminf_{n\rightarrow+\infty}\frac{1}{a_{n}}\ln\mathbb{P}(U_{n}\in G).

\lim_{n\rightarrow+\infty}\frac{1}{a_{n}}\ln\mathbb{P}(U_{n}\in H)=-\inf_{x\in H}I(x).

\widehat{\theta}_{n}=S_{n-1}^{-1}\sum_{k=1}^{n}\Phi_{n,\,k-1}\,X_{n,\,k}\quad\text{where}\quad S_{n-1}=\sum_{k=1}^{n}\Phi_{n,\,k-1}\,\Phi_{n,\,k-1}^{\,T}.

\left(\frac{\sqrt{n}\,(1-\rho(A_{n}))^{\frac{3}{2}}}{b_{n}}\,\textnormal{vec}\bigg(\frac{1}{n}\sum_{k=1}^{n}(\Phi_{n,\,k}\,\Phi_{n,\,k}^{\,T}-\Gamma_{n})\bigg)\right)_{\!n\,\geqslant\,1}

I_{\Gamma}(x)=\left\{\begin{array}{ll}\frac{1}{2\,h^{3}}\,\langle x,\Upsilon^{\,\dagger}\,x\rangle&\mbox{for }x\in\textnormal{Im}(\Upsilon)\\ +\infty&\mbox{otherwise}\end{array}\right.

\widehat{\theta}_{n}^{\,\pi}=(S_{n-1}^{\,\pi})^{-1}\sum_{k=1}^{n}\Phi_{n,\,k-1}\,X_{n,\,k}\quad\text{where}\quad S_{n-1}^{\,\pi}=S_{n-1}+\pi\,n\,|||B_{n}^{-1}|||\,I_{p}

\Gamma_{\pi}=\Gamma+\pi\,I_{p}\quad\text{and}\quad\theta_{n}^{\,\pi}=(S_{n-1}^{\,\pi})^{-1}\,S_{n-1}\,\theta_{n}.

\left(\frac{\sqrt{n}}{b_{n}\,(1-\rho(A_{n}))^{\frac{1}{2}}}\,\big(\widehat{\theta}_{n}^{\,\pi}-\theta_{n}^{\,\pi}\big)\right)_{\!n\,\geqslant\,1}

I_{\theta}^{\,\pi}(x)=\left\{\begin{array}{ll}\frac{h}{2\,\sigma^{2}}\,\langle x,\Gamma_{\pi}\,\Gamma^{\,\dagger}\,\Gamma_{\pi}\,x\rangle&\mbox{for }x\in\textnormal{Im}(\Gamma_{\pi}^{-1}\,\Gamma)\\ +\infty&\mbox{otherwise}\end{array}\right.

Taxonomy

Topics: Financial Risk and Volatility Modeling · Statistical Methods and Inference · Random Matrices and Applications

Full text

Moderate deviations in a class of stable but nearly unstable processes

Frédéric Proïa

Laboratoire angevin de recherche en mathématiques, LAREMA, UMR 6093, CNRS, UNIV Angers, SFR MathSTIC, 2 Bd Lavoisier, 49045 Angers Cedex 01, France.

Abstract.

We consider a stable but nearly unstable autoregressive process of any order. The bridge between stability and instability is expressed by a time-varying companion matrix $A_{n}$ with spectral radius $\rho(A_{n})<1$ satisfying $\rho(A_{n})\rightarrow 1$ . In that framework, we establish a moderate deviation principle for the empirical covariance only relying on the elements of $A_{n}$ through $1-\rho(A_{n})$ and, as a by-product, we establish a moderate deviation principle for the OLS estimator when $\Gamma$ , the renormalized asymptotic variance of the process, is invertible. Finally, when $\Gamma$ is singular, we also provide a compromise in the form of a moderate deviation principle for a penalized version of the estimator. Our proofs essentially rely on truncations and deviations of $m_{n}$ –dependent sequences, with an unbounded rate $(m_{n})$ .

Key words and phrases:

Nearly unstable autoregressive process, Moderate deviation principle, OLS estimation, Asymptotic behavior, Unit root.

1. Introduction and Assumptions

Unit root issues have long been crucial in time series econometrics and have therefore been the focus of a great deal of research. This sudden demarcation between stability and instability is responsible for many inference problems in linear time series (see Brockwell and Davis [4] for a detailed overview of linear stochastic processes). The remarkable works of Chan and Wei [7] encompass, in a much more general context, the now well-known fact that the least squares estimator is $\sqrt{n}$–consistent with Gaussian behavior when the underlying autoregressive process is stable, whereas it is $n$–consistent with an asymmetrical distribution when the process is unstable. This rather abrupt change in the rate of convergence and in the asymptotic distribution certainly motivated the wide range of unit root testing procedures, but it also paved the way for studies based on time-varying coefficients. In a nearly unstable autoregressive process, we do not focus on a parameter $\theta$ satisfying $|\theta|<1$ or $|\theta|=1$ but, instead, the parameter is considered as a sequence $(\theta_{n})$ such that $|\theta_{n}|<1$ and $|\theta_{n}|\rightarrow 1$ as $n\rightarrow+\infty$. This sample size dependent structure allows a continuity between stability and instability. For example, Phillips and Magdalinos [20] treat the case where the coefficient is in a $O(\kappa_{n}^{-1})$ neighborhood of the unit root with $\kappa_{n}=n^{\alpha}=o(n)$. Amongst other results, they prove a central limit theorem for the estimator at the rate $\sqrt{n\,\kappa_{n}}$, thereby making a bridge between the stable rate $\sqrt{n}$ and the unstable rate $n$. In the same vein, let us also mention the work of Chan and Wei [6], natural generalizations like the study of Phillips and Lee [19] related to vector autoregressions, or the recent unified theory of Buchmann and Chan [5], focused on nearly unstable autoregressive processes. Our paper is based on the latter topic, in a sense that will be made precise in due course.

Given a parametric generating process, the precision of the estimation is usually assessed by its rate of convergence, and the deviations can be seen as a natural continuation after a central limit theorem or even a law of the iterated logarithm. Roughly speaking, they may be used to estimate the exponential decay of the probability of tail events related to the distance between the estimator and the parameter of interest. We refer to Dembo and Zeitouni [8] regarding the mathematical formalization. Since the 1980s, numerous authors have worked on large and/or moderate deviations in a time series context under many and varied hypotheses. Without claiming to be exhaustive, one can mention the studies of Donsker and Varadhan [10] and Bercu et al. [2] on stationary Gaussian processes and quadratic forms, the paper of Worms [21] on Markov chains and regression models and the one of Bercu [1] on first-order Gaussian stable, unstable and explosive processes. One can also mention the works of Mas and Menneteau [15] on Hilbertian processes, Djellout et al. [9] on non-linear functionals of moving average processes, Wu and Zhao [22] on stationary non-linear processes, Miao and Shen [16] on general autoregressive processes or, more recently, Bitseki Penda et al. [3] on first-order processes with correlated errors. The references therein may complete this concise list.

In this paper, we investigate the moderate deviations of the estimate in stable but nearly unstable autoregressions. This can be seen as a full generalization of the recent work of Miao, Wang and Yang [17], focused on the univariate case. Our proofs essentially rely on truncations and deviations of $m_{n}$–dependent sequences where the rate $(m_{n})$ is unbounded. The main technical contributions are twofold. On the one hand, expressing the near instability directly through the sequence of spectral radii of the companion matrix seems, to the best of our knowledge, to be a new approach with many advantages. For example, the authors of the recent paper [5] introduce a perturbation in the Jordan canonical form of the model (see Thm. 2.1), which is a powerful idea to deal with the subject of their study, but somehow unnecessarily complex for ours. On the other hand, from a purely technical point of view, unbounded truncations have already been used to get moderate deviations (see e.g. [18] and [17]), but we will see that the vector case treated here and the specific features of the model cannot be handled as easily with the existing tools. As a consequence, we need to redevelop a full Gärtner-Ellis reasoning to establish the deviations of our unbounded vector truncations. This quite general strategy might inspire future similar studies.

For a fixed $n\geqslant 1$ , let the process be given for some $p\geqslant 1$ and $k\in\{1,\ldots,n\}$ by

$$X_{n,\,k}=\sum_{i=1}^{p}\theta_{n,\,i}\,X_{n,\,k-i}+\varepsilon_{k}$$

where $(\varepsilon_{k})_{k}$ is a sequence of zero-mean i.i.d. random variables. In an equivalent way, we can consider the vector expression

$$\Phi_{n,\,k}=A_{n}\,\Phi_{n,\,k-1}+E_{k}$$

where $E_{k}=(\varepsilon_{k},0,\ldots,0)^{\,T}$ is a $p$ –vectorial noise, $\Phi_{n,\,k}=(X_{n,\,k},\ldots,X_{n,\,k-p+1})^{\,T}$ and

$$A_{n}=\begin{pmatrix}\theta_{n,\,1}&\theta_{n,\,2}&\dots&\theta_{n,\,p}\\ &I_{p-1}&&0\end{pmatrix}$$

is the $p\times p$ companion matrix of the autoregressive process. If $(E_{k})_{k}$ has a finite variance, it is well-known that $(\Phi_{n,\,k})_{k}$ is a second-order stationary process having the causal form

$$\Phi_{n,\,k}=\sum_{\ell=0}^{+\infty}A_{n}^{\ell}\,E_{k-\ell} \tag{1.3}$$

when $\rho(A_{n})<1$ , that is, when the largest modulus of its eigenvalues is less than 1 (see e.g. Thm. 11.3.1 of [4] and the fact that each eigenvalue of $A_{n}$ is the inverse of a zero of the autoregressive polynomial of the process). Since $(\varepsilon_{k})_{k}$ is an i.i.d. sequence, the process is strictly stationary with mean zero and variance given by

$$\Gamma_{n}=\sigma^{2}\sum_{\ell=0}^{+\infty}A_{n}^{\ell}\,K_{p}\,(A_{n}^{\,T})^{\ell} \tag{1.4}$$

where, for convenience, we will denote in the whole study

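$$K_{p}=\begin{pmatrix}1&0\\ 0&0_{p-1}\end{pmatrix}\quad\text{and}\quad U_{p}=\begin{pmatrix}1\\ 0\end{pmatrix}$$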

the $p\times p$ matrix with 1 at the top left and 0 elsewhere, and its first column standing for the first vector of the canonical basis of $\mathbb{R}^{p}$ . As a consequence of the causal expression above, the initial vector $\Phi_{n,\,0}$ is not arbitrary and has to share the distribution of the process. This also implies the relation

$$\Gamma_{n}=A_{n}\,\Gamma_{n}\,A_{n}^{\,T}+\sigma^{2}\,K_{p}.$$

As will be largely developed throughout the study, $\Gamma_{n}$ is finite for all $n\geqslant 1$ but, as $n$ increases, $|||\Gamma_{n}|||\rightarrow+\infty$. The keystone matrix $\Gamma$ obtained after a correct standardization of $\Gamma_{n}$ is the renormalized asymptotic variance of the process. Before we start, we define a matrix that will also prove to be crucial to our results,

$$B_{n}=I_{p^{2}}-A_{n}\otimes A_{n}. \tag{1.7}$$
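The objects introduced so far can be explored numerically. The following is a minimal sketch, not taken from the paper: it builds a companion matrix, computes its spectral radius, and solves the fixed-point relation $\Gamma_{n}=A_{n}\,\Gamma_{n}\,A_{n}^{\,T}+\sigma^{2}K_{p}$ through the standard vectorization identity $\textnormal{vec}(\Gamma_{n})=\sigma^{2}\,B_{n}^{-1}\,\textnormal{vec}(K_{p})$ with $B_{n}=I_{p^{2}}-A_{n}\otimes A_{n}$. The function names and the AR(2) coefficients are hypothetical choices made for illustration.

```python
import numpy as np

def companion(theta):
    """Companion matrix: theta on the first row, I_{p-1} below it, zeros elsewhere."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    A = np.zeros((p, p))
    A[0, :] = theta
    A[1:, :-1] = np.eye(p - 1)
    return A

def spectral_radius(A):
    """Largest modulus of the eigenvalues of A."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def stationary_variance(A, sigma2=1.0):
    """Solve Gamma = A Gamma A^T + sigma2 * K_p via vec(Gamma) = sigma2 * B^{-1} vec(K_p)."""
    p = A.shape[0]
    K = np.zeros((p, p))
    K[0, 0] = 1.0
    B = np.eye(p * p) - np.kron(A, A)
    # Column-major vec, so that vec(A X A^T) = (A kron A) vec(X).
    vec_gamma = sigma2 * np.linalg.solve(B, K.reshape(-1, order="F"))
    return vec_gamma.reshape((p, p), order="F")

# Hypothetical AR(2) whose companion matrix has eigenvalues 0.95 and 0.5.
A = companion([0.95 + 0.5, -0.95 * 0.5])
Gamma = stationary_variance(A)
print(spectral_radius(A))
print(Gamma)
```

Moving the largest eigenvalue closer to 1 in this sketch makes the entries of `Gamma` grow, which is the blow-up of $|||\Gamma_{n}|||$ mentioned above.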

We are now going to introduce and comment on the hypotheses that will be needed, though not always simultaneously, throughout the paper. Section 2 is devoted to our main results: two statements related to the moderate deviations of the empirical covariance and the OLS estimator, a set of explicit examples and some additional comments and conclusions. Finally, in Section 3, divided into numerous subsections, we will prove all our results, step by step.

Remark.

We denote by $\|\cdot\|$ the Euclidean vector norm and by $|||\cdot|||$ the spectral matrix norm. Other norms may be used, in which case an appropriate subscript is added. Moreover, we will always denote by $\langle\cdot,\cdot\rangle$ the usual inner product of the Euclidean space $\mathbb{R}^{d}$ for any $d\geqslant 1$. We write $M^{\,\dagger}$ for the Moore-Penrose pseudo-inverse of any matrix $M$, whose definition and properties may be found in Sec. 0 of [12].

1.1. Hypotheses

First of all, we present the hypotheses that we retain.

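(H1)

Gaussian integrability condition. There exists $\alpha>0$ such that

$$\mathbb{E}\big[\mathrm{e}^{\alpha\,\varepsilon_{1}^{\,2}}\big]<+\infty$$

where $\varepsilon_{1}$ represents the zero-mean i.i.d. sequence $(\varepsilon_{k})_{k}$ of variance $\sigma^{2}>0$ and fourth-order moment $\tau^{4}>0$.

(H2)

Convergence of the companion matrix. There exists a $p\times p$ matrix $A$ such that

$$\lim_{n\rightarrow+\infty}A_{n}=A$$

with distinct eigenvalues $0<|\lambda_{p}|\leqslant\ldots\leqslant|\lambda_{1}|=\rho(A)$, and the top right element of $A$ is non-zero.

(H3)

Spectral radius of the companion matrix. For all $n\geqslant 1$, $\rho(A_{n})<1$. In addition,

$$\lim_{n\rightarrow+\infty}\rho(A_{n})=\rho(A)=1.$$

(H4)

Renormalization. We have the convergences

$$\lim_{n\rightarrow+\infty}\frac{B_{n}^{-1}}{|||B_{n}^{-1}|||_{*}}=H\quad\text{and}\quad\lim_{n\rightarrow+\infty}(1-\rho(A_{n}))\,|||B_{n}^{-1}|||_{*}=h$$

for some matrix norm $|||\cdot|||_{*}$, where $H$ is a $p^{2}\times p^{2}$ non-zero matrix and $h>0$.

(H5)

Moderate deviations. The moderate deviations scale $(b_{n})$ satisfies

$$\lim_{n\rightarrow+\infty}b_{n}=+\infty\quad\text{and}\quad\lim_{n\rightarrow+\infty}\frac{\sqrt{n}\,(1-\rho(A_{n}))^{\frac{3}{2}+\eta}}{b_{n}}=+\infty$$

for a small $\eta>0$.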

1.2. Comments on the hypotheses

First, conceding in (H2) that the limiting matrix has distinct eigenvalues is a matter of simplification of the reasoning. Indeed, $A_{n}$ turns out to be diagonalizable for a sufficiently large $n$, and, as a companion matrix, it is well-known that the change of basis is done via a Vandermonde matrix having numerous nice properties (more details are given in Section 3.1, and a discussion on the case of multiple eigenvalues is provided in Section 2.3). The top right element of $A_{n}$ is $\theta_{n,\,p}$. So, assuming in (H2) that $\theta_{n,\,p}\nrightarrow 0$ ensures that the limit process is still of order $p$ and that 0 cannot be an eigenvalue of $A$, since $\det(A)=(-1)^{p+1}\,\theta_{p}$. Moreover, note that, in (H4), the invertibility of $B_{n}$ for all $n$ is guaranteed by (H3). Indeed, $\rho(A_{n}\otimes A_{n})=\rho^{2}(A_{n})<1$ (see e.g. Lem. 5.6.10 and Cor. 5.6.16 of [13]). In addition, we obviously have, for all $\ell\geqslant 0$,

$$\rho(A_{n}^{\ell})=\rho^{\ell}(A_{n})\leqslant|||A_{n}^{\ell}|||$$

so that we get

$$\frac{1}{1-\rho(A_{n})}\leqslant\sum_{\ell=0}^{+\infty}|||A_{n}^{\ell}|||=L_{n} \tag{1.8}$$

giving a lower bound for $L_{n}$ . Similarly,

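$$\frac{1}{(1-\rho(A_{n}))^{2}}\leqslant\sum_{\ell=0}^{+\infty}(\ell+1)\,|||A_{n}^{\ell}|||=M_{n}. \tag{1.9}$$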

However, an exact upper bound for these sums may be difficult to reach and may require stringent conditions on the elements of $A_{n}$ . We refer the reader to Lemma 3.1 where, under (H2) and (H3), some asymptotic upper bounds are established. We also refer to Section 2.2 where the explicit calculations in terms of some examples shall help to understand the rates involved in the hypotheses. Now for a fixed $n\geqslant 1$ , let

$$\mu_{n}=\rho(A_{n})+\frac{1-\rho(A_{n})}{2}=\frac{\rho(A_{n})+1}{2}.$$

Clearly, $\rho(A_{n})<\mu_{n}<1$. Hence, according to Prop. 2.3.15 of [11], for all $n\geqslant 1$, there exists a constant $c_{n}>0$ such that, for all $\ell\geqslant 0$, $|||A_{n}^{\ell}|||\,\leqslant\,c_{n}\,\mu_{n}^{\ell}$ so that

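$$L_{n}\leqslant\frac{c_{n}}{1-\mu_{n}}<+\infty\quad\text{and}\quad M_{n}\leqslant\frac{c_{n}}{(1-\mu_{n})^{2}}<+\infty.$$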

Letting $n$ tend to infinity, it follows from (H3) and (H4) that

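$$\lim_{n\rightarrow+\infty}|||B_{n}^{-1}|||=\lim_{n\rightarrow+\infty}L_{n}=\lim_{n\rightarrow+\infty}M_{n}=+\infty.$$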

Finally, it will be established in good time that there is a limiting matrix $\Gamma$ such that

$$\lim_{n\rightarrow+\infty}\frac{\Gamma_{n}}{|||B_{n}^{-1}|||_{*}}=\Gamma \tag{1.11}$$

where $|||\cdot|||_{*}$ is the matrix norm of (H4).
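To get a feeling for the quantities $L_{n}$ and $M_{n}$ discussed in this subsection, here is a small numerical sketch (an illustration under stated assumptions, not part of the paper). It truncates the two series in spectral norm for a hypothetical bivariate companion matrix with eigenvalues `rho` and 0.5, and reports $L_{n}\,(1-\rho(A_{n}))$ and $M_{n}\,(1-\rho(A_{n}))^{2}$, which the lower bounds (1.8) and (1.9) force to be at least 1. The truncation rule and the chosen values of `rho` are arbitrary.

```python
import numpy as np

def power_norm_sums(A, tol=1e-13, max_terms=200_000):
    """Truncated L = sum |||A^l||| and M = sum (l+1) |||A^l||| in spectral norm."""
    power = np.eye(A.shape[0])
    L = M = 0.0
    for ell in range(max_terms):
        norm = np.linalg.norm(power, 2)
        L += norm
        M += (ell + 1) * norm
        if (ell + 1) * norm < tol:  # stop once the current term is negligible
            break
        power = power @ A
    return L, M

rows = []
for rho in (0.9, 0.95, 0.99):
    # Hypothetical companion matrix with eigenvalues rho and 0.5.
    A = np.array([[rho + 0.5, -0.5 * rho], [1.0, 0.0]])
    L, M = power_norm_sums(A)
    rows.append((rho, L * (1 - rho), M * (1 - rho) ** 2))
    print(rows[-1])
```

In this example the rescaled sums stay of order one as `rho` approaches 1, in line with the idea that, with distinct eigenvalues, the sums grow like the lower bounds.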

Remark.

To facilitate the reading, we consider from now on that the matrix norm $|||\cdot|||_{*}$ is identified in (H4), and we will simply write $|||\cdot|||$ in what follows.

2. Main results

This section contains two statements that constitute the main results of the paper. The first of them is quite long to establish and will need numerous technical lemmas, but the second one will essentially be deduced as a corollary of the first one. Subsequently, we provide some explicit examples for a better understanding and an easier interpretation of the hypotheses together with some graphics showing the evolution of the processes and the estimation of the autoregressive parameter. At the end of the section, we discuss the case of multiple eigenvalues. But, first, let us recall the definition of the large and moderate deviation principles (see Sec. 1.2 of [8] for more details). In what follows, a speed is considered as a positive sequence increasing to infinity.

Definition.

A sequence of random variables $(U_{n})_{n}$ on a topological space $(\mathcal{X},\mathcal{B})$ satisfies a large deviation principle (LDP) with speed $(a_{n})$ and rate $I$ if there is a lower semicontinuous mapping $I:\mathcal{X}\rightarrow\bar{\mathbb{R}}^{+}$ such that:

• for any closed set $F\in\mathcal{B}$,

$$\limsup_{n\rightarrow+\infty}\frac{1}{a_{n}}\ln\mathbb{P}(U_{n}\in F)\leqslant-\inf_{x\in F}I(x),$$

• for any open set $G\in\mathcal{B}$,

$$-\inf_{x\in G}I(x)\leqslant\liminf_{n\rightarrow+\infty}\frac{1}{a_{n}}\ln\mathbb{P}(U_{n}\in G).$$

In particular, if the infimum of $I$ coincides on the interior $H^{\circ}$ and the closure $\bar{H}$ of some $H\in\mathcal{B}$ , then

$$\lim_{n\rightarrow+\infty}\frac{1}{a_{n}}\ln\mathbb{P}(U_{n}\in H)=-\inf_{x\in H}I(x).$$

Definition.

A sequence of random variables $(V_{n})_{n}$ on a topological space $(\mathcal{X},\mathcal{B})$ satisfies a moderate deviation principle (MDP) with speed $(b_{n}^{\,2})$ and rate $I$ if there is a speed $(v_{n})$ with $\frac{v_{n}}{b_{n}}\rightarrow+\infty$ such that $(\frac{v_{n}}{b_{n}}\,V_{n})_{n}$ satisfies a large deviation principle of speed $(b_{n}^{\,2})$ and rate $I$ .

2.1. Moderate deviations

We now consider an observable trajectory $X_{n,\,-p+1},\ldots,X_{n,\,n}$ for some fixed $n\geqslant 1$ , and use it to provide an estimation of the parameter. It is well-known that the ordinary least squares (OLS) estimator of $\theta_{n}=(\theta_{n,\,1},\ldots,\theta_{n,\,p})^{\,T}$ is given by

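$$\widehat{\theta}_{n}=S_{n-1}^{-1}\sum_{k=1}^{n}\Phi_{n,\,k-1}\,X_{n,\,k}\quad\text{where}\quad S_{n-1}=\sum_{k=1}^{n}\Phi_{n,\,k-1}\,\Phi_{n,\,k-1}^{\,T}.$$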
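As an illustration of the OLS estimator $S_{n-1}^{-1}\sum_{k}\Phi_{n,\,k-1}X_{n,\,k}$, the sketch below simulates a nearly unstable AR(2) path and computes the estimate. It is a hedged example, not the author's code: the Gaussian noise (which satisfies (H1)), the burn-in used to approximate the stationary initial vector, the seed, and the choice of two nearly unit roots $1-n^{-\alpha}$ and $-1+2\,n^{-\alpha}$ with $\alpha=0.25$ are all assumptions made for the demonstration.

```python
import numpy as np

def simulate_ar(theta, n, rng, burn_in=5000, sigma=1.0):
    """Simulate X_{-p+1}, ..., X_n of an AR(p); the burn-in approximates stationarity."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    total = burn_in + n + p
    eps = rng.normal(0.0, sigma, size=total)
    x = np.zeros(total)
    for k in range(p, total):
        # x[k - p:k][::-1] is (x[k-1], ..., x[k-p])
        x[k] = theta @ x[k - p:k][::-1] + eps[k]
    return x[-(n + p):]

def ols(x, p):
    """OLS estimate from a path of length n + p; returns the estimate and S_{n-1}."""
    n = x.size - p
    # Row k holds Phi_{k-1} = (X_{k-1}, ..., X_{k-p}) for k = 1, ..., n.
    Phi = np.column_stack([x[p - i:p - i + n] for i in range(1, p + 1)])
    y = x[p:]
    S = Phi.T @ Phi
    return np.linalg.solve(S, Phi.T @ y), S

rng = np.random.default_rng(0)
n, alpha = 20_000, 0.25
lam1, lam2 = 1 - n ** (-alpha), -1 + 2 * n ** (-alpha)  # two nearly unit roots
theta = np.array([lam1 + lam2, -lam1 * lam2])
x = simulate_ar(theta, n, rng)
theta_hat, S = ols(x, p=2)
print(theta, theta_hat)
```

The coefficients follow from the characteristic polynomial $z^{2}-\theta_{1}z-\theta_{2}=(z-\lambda_{1})(z-\lambda_{2})$, so that $\theta_{1}=\lambda_{1}+\lambda_{2}$ and $\theta_{2}=-\lambda_{1}\lambda_{2}$.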

The first result is dedicated to the empirical variance $\frac{S_{n}}{n}$ .

Theorem 2.1.

Under hypotheses (H1)–(H5), the sequence

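$$\left(\frac{\sqrt{n}\,(1-\rho(A_{n}))^{\frac{3}{2}}}{b_{n}}\,\textnormal{vec}\bigg(\frac{1}{n}\sum_{k=1}^{n}(\Phi_{n,\,k}\,\Phi_{n,\,k}^{\,T}-\Gamma_{n})\bigg)\right)_{\!n\,\geqslant\,1}$$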

satisfies an LDP with speed $(b_{n}^{\,2})$ and a rate function $I_{\Gamma}:\mathbb{R}^{p^{2}}\rightarrow\bar{\mathbb{R}}^{+}$ defined as

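$$I_{\Gamma}(x)=\left\{\begin{array}{ll}\frac{1}{2\,h^{3}}\,\langle x,\Upsilon^{\,\dagger}\,x\rangle&\mbox{for }x\in\textnormal{Im}(\Upsilon)\\ +\infty&\mbox{otherwise}\end{array}\right.$$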

where $\Upsilon$ is explicitly given in (3.18) and $h$ comes from (H4).

Proof.

See Section 3.2.5. ∎

Remark.

Through vectorization, this MDP is established on $\mathbb{R}^{p^{2}}$ in order to avoid any confusion in the notations, but we might work in $\mathbb{R}^{p\times p}$ as well. The associated rate function would only require a slight modification of the proof.

Remark.

To be punctilious, we may add a small $\epsilon>0$ to the diagonal of $S_{n-1}$ to ensure that it is non-singular for all $n\geqslant 1$ without disturbing the asymptotic behavior.

When the variance $\Gamma$ given in (1.11) is invertible, we establish the MDP for the OLS in the theorem that follows. However, when it is not the case, there are some technical complications and, to reach an intermediate result, we need to introduce a penalized version of the OLS. For a small $\pi\geqslant 0$ , define

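$$\widehat{\theta}_{n}^{\,\pi}=(S_{n-1}^{\,\pi})^{-1}\sum_{k=1}^{n}\Phi_{n,\,k-1}\,X_{n,\,k}\quad\text{where}\quad S_{n-1}^{\,\pi}=S_{n-1}+\pi\,n\,|||B_{n}^{-1}|||\,I_{p}$$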

with possibly $\pi=0$ if $\Gamma$ is invertible, in which case it is clearly the standard OLS given above, but necessarily $\pi>0$ otherwise. Consider also the penalized version of the variance and the corrected parameter

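$$\Gamma_{\pi}=\Gamma+\pi\,I_{p}\quad\text{and}\quad\theta_{n}^{\,\pi}=(S_{n-1}^{\,\pi})^{-1}\,S_{n-1}\,\theta_{n}. \tag{2.3}$$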

By construction, $\Gamma$ is, at worst, non-negative definite and for $\pi>0$ , $\Gamma_{\pi}$ turns out to be invertible. The same goes for $S_{n-1}^{\,\pi}$ .
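The penalized estimator has the shape of a ridge-type correction of the OLS, with penalty $\pi\,n\,|||B_{n}^{-1}|||$ on the diagonal of $S_{n-1}$. The sketch below is an illustration under assumptions that are not in the paper: a hypothetical AR(2) with eigenvalues 0.95 and 0.5, a path started at zero rather than from the stationary law, the 1-norm for $|||B_{n}^{-1}|||$, and $\pi=0.05$. It checks numerically that the penalized estimate equals $(S_{n-1}^{\,\pi})^{-1}S_{n-1}$ applied to the OLS, which is why it is compared with the corrected parameter $\theta_{n}^{\,\pi}=(S_{n-1}^{\,\pi})^{-1}S_{n-1}\,\theta_{n}$ rather than with $\theta_{n}$ itself.

```python
import numpy as np

def penalized_ols(Phi, y, pi, b_inv_norm):
    """Penalized OLS: (S + pi * n * |||B_n^{-1}||| * I_p)^{-1} Phi^T y with S = Phi^T Phi."""
    n, p = Phi.shape
    S = Phi.T @ Phi
    S_pi = S + pi * n * b_inv_norm * np.eye(p)
    return np.linalg.solve(S_pi, Phi.T @ y), S, S_pi

# Hypothetical AR(2) with one nearly unit root (eigenvalues rho and 0.5).
rng = np.random.default_rng(1)
n, rho = 5_000, 0.95
theta = np.array([rho + 0.5, -0.5 * rho])
x = np.zeros(n + 2)  # started at zero: an approximation of the stationary path
for k in range(2, n + 2):
    x[k] = theta[0] * x[k - 1] + theta[1] * x[k - 2] + rng.normal()
Phi = np.column_stack([x[1:-1], x[:-2]])  # rows (X_{k-1}, X_{k-2})
y = x[2:]

A = np.array([[theta[0], theta[1]], [1.0, 0.0]])
b_inv_norm = np.linalg.norm(np.linalg.inv(np.eye(4) - np.kron(A, A)), 1)

theta_hat, S, _ = penalized_ols(Phi, y, 0.0, b_inv_norm)  # pi = 0: standard OLS
theta_hat_pi, _, S_pi = penalized_ols(Phi, y, 0.05, b_inv_norm)
theta_pi = np.linalg.solve(S_pi, S @ theta)  # corrected parameter

print(theta_hat, theta_hat_pi, theta_pi)
print(np.allclose(theta_hat_pi, np.linalg.solve(S_pi, S @ theta_hat)))
```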

Corollary 2.2.

Under hypotheses (H1)–(H5), for all $\pi>0$ , the sequence

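$$\left(\frac{\sqrt{n}}{b_{n}\,(1-\rho(A_{n}))^{\frac{1}{2}}}\,\big(\widehat{\theta}_{n}^{\,\pi}-\theta_{n}^{\,\pi}\big)\right)_{\!n\,\geqslant\,1}$$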

satisfies an LDP with speed $(b_{n}^{\,2})$ and a rate function $I_{\theta}^{\,\pi}:\mathbb{R}^{p}\rightarrow\bar{\mathbb{R}}^{+}$ defined as

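$$I_{\theta}^{\,\pi}(x)=\left\{\begin{array}{ll}\frac{h}{2\,\sigma^{2}}\,\langle x,\Gamma_{\pi}\,\Gamma^{\,\dagger}\,\Gamma_{\pi}\,x\rangle&\mbox{for }x\in\textnormal{Im}(\Gamma_{\pi}^{-1}\,\Gamma)\\ +\infty&\mbox{otherwise}\end{array}\right.$$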

where the variance $\Gamma$ is given in (1.11), $\Gamma_{\pi}$ is the penalized variance given in (2.3) and $h$ comes from (H4), respectively. If in addition $\Gamma$ is invertible, then the sequence

[TABLE]

satisfies an LDP with speed $(b_{n}^{\,2})$ and a rate function $I_{\theta}:\mathbb{R}^{p}\rightarrow\mathbb{R}^{+}$ defined as

[TABLE]

Proof.

See Section 3.2.6. ∎

To sum up, this result shows that, when $\Gamma$ is invertible, the OLS satisfies an MDP, and even when $\Gamma$ is singular, one may reach a compromise by getting an MDP for a penalized estimation. In the same vein, notice also that, in the invertible case,

[TABLE]

Remark.

In the stable case where $\rho(A_{n})=\rho(A)<1$ , we simply have $(1-\rho(A_{n}))\,|||B^{-1}_{n}|||=h$ and $\Gamma_{n}\,|||B_{n}^{-1}|||^{-1}=\Gamma$ for all $n\geqslant 1$ . By contraction, the MDP of Corollary 2.2 coincides with the one of Thm. 3 of [21] when $\Gamma$ is invertible.

2.2. Some explicit examples

Before giving some examples, we can already note that (H5) implies $\sqrt{n}\,(1-\rho(A_{n}))\rightarrow+\infty$. Thus, necessarily, the convergence $1-\rho(A_{n})\rightarrow 0$ cannot occur with an exponential rate, which is why we focus on polynomial rates of the form $1-\rho(A_{n})=c\,n^{-\alpha}$ for some $c>0$ in this section. Accordingly, in all the examples below, (H5) is only possible when $0<\alpha<\frac{1}{3+2\eta}<\frac{1}{3}$. Thus, one cannot expect a sequence of coefficients moving too fast toward instability. The domain of validity of the speed of the MDP will be

[TABLE]
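The constraint on $\alpha$ can be recovered by a one-line computation. The following is a sketch, assuming that (H5) requires $\sqrt{n}\,(1-\rho(A_{n}))^{\frac{3}{2}+\eta}/b_{n}\rightarrow+\infty$, which is the form consistent with the implication $\sqrt{n}\,(1-\rho(A_{n}))\rightarrow+\infty$ noted above. Substituting $1-\rho(A_{n})=c\,n^{-\alpha}$ gives

```latex
\frac{\sqrt{n}\,(1-\rho(A_{n}))^{\frac{3}{2}+\eta}}{b_{n}}
  = \frac{c^{\frac{3}{2}+\eta}\; n^{\frac{1}{2}-\alpha\left(\frac{3}{2}+\eta\right)}}{b_{n}},
\qquad
\frac{1}{2}-\alpha\Big(\frac{3}{2}+\eta\Big) > 0
  \iff \alpha < \frac{1}{3+2\,\eta}.
```

Since $b_{n}\rightarrow+\infty$, the ratio can only diverge when the exponent of $n$ is positive, and $(b_{n})$ must then grow more slowly than $n^{\frac{1}{2}-\alpha(\frac{3}{2}+\eta)}$.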

2.2.1. Univariate case with one nearly unit root

Suppose that $p=1$. Then, (H2) and (H3) imply that $|\theta_{n}|<1$ and $\theta_{n}\rightarrow\pm 1$. We also have $B_{n}=1-\theta_{n}^{\,2}$ and (H4) can be expressed as

[TABLE]

A straightforward calculation shows that

[TABLE]

so that we can choose $\pi=0$. The standard cases, illustrated in Figure 1, are $\theta_{n}=1-c_{1}\,n^{-\alpha}$ for the positive unit root and $\theta_{n}=-1+c_{2}\,n^{-\alpha}$ for the negative unit root, with $c_{1},c_{2}>0$ and $\alpha>0$. The rate function associated with Corollary 2.2 is $I_{\theta}(x)=\frac{x^{2}}{4}$, which corresponds to Prop. 2.1 of [17]. Indeed, their rate $x\mapsto\frac{x^{2}}{2}$ is associated with an LDP with the renormalization $(1-\theta_{n}^{\,2})^{\frac{1}{2}}$ whereas our normalization is $(1-|\theta_{n}|)^{\frac{1}{2}}$. By contraction, the asymptotic factor $\sqrt{2}$ explains the difference.
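The value $\frac{x^{2}}{4}$ can be checked by hand. The lines below are a sketch under two assumptions: that $\Gamma_{n}=\sigma^{2}/(1-\theta_{n}^{\,2})$, which follows from the series (1.4) with $p=1$, and that the rate function of the invertible case is the $\pi=0$ specialization of $I_{\theta}^{\,\pi}$, that is $\frac{h}{2\,\sigma^{2}}\,\langle x,\Gamma\,x\rangle$.

```latex
(1-|\theta_{n}|)\,|B_{n}^{-1}|
  = \frac{1-|\theta_{n}|}{1-\theta_{n}^{\,2}}
  = \frac{1}{1+|\theta_{n}|} \longrightarrow \frac{1}{2} = h,
\qquad
\frac{\Gamma_{n}}{|B_{n}^{-1}|}
  = \frac{\sigma^{2}}{1-\theta_{n}^{\,2}}\,(1-\theta_{n}^{\,2})
  = \sigma^{2} = \Gamma,
\qquad
\frac{h}{2\,\sigma^{2}}\,\Gamma\,x^{2} = \frac{x^{2}}{4}.
```

Under these assumptions $h=\frac{1}{2}$ and $\Gamma=\sigma^{2}>0$, so the variance is invertible and the computed rate agrees with the one stated above.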

2.2.2. Bivariate case with one nearly unit root

Suppose now that $p=2$ and $\textnormal{sp}(A)=\{\pm 1,\lambda\}$ with $|\lambda|<1$ . This situation occurs, for example, when

[TABLE]

whose eigenvalues are $1-c\,n^{-\alpha}$ and $\lambda$. This is illustrated in Figure 2. For $c>0$ and $\alpha>0$, (H2) and (H3) are satisfied. The direct calculation gives

[TABLE]

whence we obtain

[TABLE]

so (H4) is satisfied with the 1–norm. The choice $\pi=0$ is impossible, and we finally find

[TABLE]

2.2.3. Bivariate case with two nearly unit roots

Following the same lines, suppose that $p=2$ and $\textnormal{sp}(A)=\{-1,1\}$ . This situation occurs, for example, when

[TABLE]

whose eigenvalues are $1-c_{1}\,n^{-\alpha}$ and $-1+c_{2}\,n^{-\alpha}$. This is illustrated in Figure 3. For $c_{1},c_{2}>0$ and $\alpha>0$, (H2) and (H3) are satisfied. The direct calculation gives

[TABLE]

whence we obtain

[TABLE]

Moreover,

[TABLE]

so (H4) is satisfied with the 1–norm. The choice $\pi=0$ is possible and we finally find

[TABLE]

2.3. Discussion on multiple eigenvalues and conclusion

As we will see in the proof of Lemma 3.1, the distinct eigenvalues assumption (H2) is sufficient to reach our results. However, a less stringent formulation of (H2) could be:

(H2′)

Convergence of the companion matrix. There exists a $p\times p$ matrix $A$ such that

[TABLE]

and the top right element of $A$ is non-zero. In addition, there exists a rank $n_{0}$ such that, for all $n>n_{0}$ , $A_{n}$ is diagonalizable and the change of basis matrix $P_{n}$ satisfies $|||P_{n}|||\leqslant C_{st}$ and $|||P_{n}^{\,-1}|||\leqslant C_{st}$ .

In general, multiple eigenvalues do not necessarily invalidate our reasoning, except when the multiplicity concerns the eigenvalues whose modulus tends to 1. Indeed, the coefficients of $A_{n}^{\ell}$, and thus $|||A_{n}^{\ell}|||$, may grow faster in that case. Consider the simple bivariate example where

[TABLE]

Then, it is not hard to solve this linear difference equation whose characteristic roots are the eigenvalues of $A_{n}$ . In case of multiplicity, the top left term takes the form of

[TABLE]

and even if $|c_{n}|\leqslant C_{st}$ and $|d_{n}|\leqslant C_{st}$ for $n$ large enough, it follows that

[TABLE]

That invalidates all our reasoning and, in that case, new approaches are needed to potentially reach the moderate deviations. From our viewpoint, this is the main weakness of the set of hypotheses. As already observed in [7], multiple unit roots located at 1 influence the rate of convergence of the OLS. We conjecture that the same phenomenon occurs here and that a larger power should come with $1-\rho(A_{n})$ in the renormalization.
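The effect of a multiple eigenvalue near the unit circle can be seen on a toy case. The sketch below is not the paper's example: it takes a hypothetical bivariate companion matrix with a double eigenvalue `lam`, for which the top left entry of the $\ell$-th power equals $(\ell+1)\,\lambda^{\ell}$ by the Cayley-Hamilton recurrence. Summing these entries gives roughly $(1-\lambda)^{-2}$, so the sum of the norms of the powers can no longer be of order $(1-\rho)^{-1}$.

```python
import numpy as np

def top_left_powers(A, n_terms):
    """Top left entries of A^0, A^1, ..., A^(n_terms - 1)."""
    power = np.eye(A.shape[0])
    out = []
    for _ in range(n_terms):
        out.append(power[0, 0])
        power = power @ A
    return np.array(out)

lam = 0.99
# Companion matrix of z^2 - 2*lam*z + lam^2 = (z - lam)^2: double eigenvalue lam.
A_double = np.array([[2 * lam, -lam ** 2], [1.0, 0.0]])

ell = np.arange(2000)
entries = top_left_powers(A_double, ell.size)
closed_form = (ell + 1) * lam ** ell

print(np.max(np.abs(entries - closed_form)))
# Rescaled by (1 - lam)^2 the sum is close to 1, i.e. it grows like (1 - lam)^(-2).
print(entries.sum() * (1 - lam) ** 2)
```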

To sum up, this study is a wide generalization of [17] and, although not complete by virtue of the latter remark, it covers most of the MDP issues for the estimation in the stable but nearly unstable case. Large deviations would undoubtedly be a very useful and challenging study to carry out, naturally extending this one. However, to the best of our knowledge, it is not even entirely treated in the stable time-invariant case $\rho(A_{n})=\rho(A)<1$, clearly revealing the complexity of the problem. A complicated but stimulating trail for future studies could rely on the exponential, and not only polynomial, neighborhood of the unit root. Along the same lines and even if it is of less practical interest, we might as well focus on the explosive side of the unit roots, where new theoretical developments are necessary.

3. Technical proofs

In all the proofs, $C_{st}$ denotes a generic positive constant that is not necessarily identical from one line to another. We will frequently use the fact that $\|\textnormal{vec}(\cdot)\|=|||\cdot|||_{F}\leqslant C_{st}\,|||\cdot|||$ . For asymptotic equivalences, $f_{n}\asymp g_{n}$ means that both $f_{n}=O(g_{n})$ and $g_{n}=O(f_{n})$ whereas $f_{n}\sim g_{n}$ stands for $\frac{f_{n}}{g_{n}}\rightarrow 1$ .

3.1. Some linear algebra tools

Thereafter, we denote by $\lambda_{1},\ldots,\lambda_{p}$ the (distinct) eigenvalues of $A$ and $\lambda_{n,\,1},\ldots,\lambda_{n,\,p}$ those of $A_{n}$ , in descending order of modulus. We start by establishing two lemmas that will prove to be very useful in what follows.

Lemma 3.1.

Under hypotheses (H2) and (H3), as $n$ tends to infinity,

[TABLE]

Proof.

The lower bounds are established in Section 1.2, in (1.8) and (1.9) precisely. For the upper bounds, fix

[TABLE]

According to Thm. 2.4.9.2 of [13], (H2) implies the existence of a rank $n_{0}=n_{0}(\delta,\epsilon_{1},\epsilon_{2})$ such that, for all $n>n_{0}$ , the eigenvalues of $A_{n}$ satisfy

[TABLE]

and

[TABLE]

Let $P_{n}$ be a change of basis matrix in the diagonalization of $A_{n}$ . Then, since $A_{n}$ is a companion matrix, a standard choice would be

[TABLE]

This Vandermonde matrix is invertible if and only if $\lambda_{n,\,i}\neq\lambda_{n,\,j}$ for all $i\neq j$ (see e.g. Sec. 0.9.11 of [13]). In that case, $P_{n}^{\,-1}$ is closely related to the Lagrange interpolating polynomials given, for $i\in\{1,\ldots,p\}$ , by

[TABLE]

Precisely, the $i$ –th row of $P_{n}^{\,-1}$ contains the coefficients of $L_{i}(X)$ in the basis $(1,X,\ldots,X^{p-1})$ of $\mathbb{R}_{p-1}[X]$ , i.e.

[TABLE]

where the relation $\prod_{j\,\neq\,i}(X-\frac{1}{\lambda_{n,\,j}})=p_{n,\,i,\,1}+p_{n,\,i,\,2}\,X+\ldots+p_{n,\,i,\,p}\,X^{p-1}$ makes it possible to identify each $p_{n,\,i,\,j}$. Combining (3.1) and (3.2), it follows that, for all $n>n_{0}$,

[TABLE]

We also have $|||P_{n}^{\,-1}|||_{1}\leqslant C_{st}$ since $\epsilon_{1}^{\,p-1}<\prod_{j\,\neq\,i}|\frac{1}{\lambda_{n,\,i}}-\frac{1}{\lambda_{n,\,j}}|<\epsilon_{2}^{\,p-1}$ and since $p_{n,\,i,\,j}$ is a finite combination of sums and products of $\frac{1}{\lambda_{n,\,1}},\ldots,\frac{1}{\lambda_{n,\,p}}$ . To sum up, for all $\ell\geqslant 0$ and $n>n_{0}$ ,

[TABLE]

Consequently,

[TABLE]

It only remains to sum over $\ell$ and to let $n$ tend to infinity to reach the first result. Similarly,

[TABLE]

so we get the second result by following the same lines. ∎
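The link between the companion matrix, the Vandermonde change of basis and the Lagrange polynomials used in the proof above can be checked numerically. The following is only a minimal sketch: the eigenvalues `lam` are hypothetical values chosen for illustration, and the eigenvector normalization $(1,\lambda^{-1},\ldots,\lambda^{-(p-1)})^{\,T}$ is an assumption consistent with the nodes $\frac{1}{\lambda_{n,\,j}}$ appearing in the interpolating polynomials.

```python
import numpy as np

def companion(theta):
    """Companion matrix with theta on the first row and I_{p-1} below it."""
    p = len(theta)
    A = np.zeros((p, p))
    A[0, :] = theta
    A[1:, :-1] = np.eye(p - 1)
    return A

# Hypothetical distinct eigenvalues, chosen for illustration only.
lam = np.array([0.9, 0.5, -0.3])
# Characteristic polynomial: x^p - theta_1 x^{p-1} - ... - theta_p.
theta = -np.poly(lam)[1:]
A = companion(theta)

# Column i of P is (1, 1/lam_i, ..., 1/lam_i^{p-1})^T, an eigenvector of A.
nodes = 1.0 / lam
P = np.vander(nodes, increasing=True).T
P_inv = np.linalg.inv(P)

# Diagonalization check: A P = P diag(lam).
diag_ok = np.allclose(A @ P, P @ np.diag(lam))

# Row i of P^{-1} holds the coefficients of L_i in the basis (1, X, ..., X^{p-1}).
# Evaluating L_i at the nodes 1/lam_j should therefore give the identity matrix.
lagrange_at_nodes = np.array(
    [[np.polyval(row[::-1], x) for x in nodes] for row in P_inv]
)
lagrange_ok = np.allclose(lagrange_at_nodes, np.eye(len(lam)))
print(diag_ok, lagrange_ok)
```

When two eigenvalues get close to each other, `P` becomes ill-conditioned, which is the numerical counterpart of the separation condition on the eigenvalues required in the proof.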

Lemma 3.2.

Under hypotheses (H2) and (H3), we have the convergence

[TABLE]

for any rate $(w_{n})$ satisfying $w_{n}\,(1-\rho(A_{n}))\rightarrow+\infty$ .

Proof.

Consider the rank $n_{0}$ introduced in the proof of Lemma 3.1. Then, according to the inequality (3.5),

[TABLE]

where the invertible and uniformly bounded matrices $P_{n}$ and $P_{n}^{\,-1}$ are given in (3.3) and (3.4), respectively. We also have

[TABLE]

from the hypothesis on $(w_{n})$ . It remains to let $n$ tend to infinity in the above inequality. ∎
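The mechanism behind Lemma 3.2 can be illustrated on a small example. This is a sketch under stated assumptions, not the setting of the lemma in full generality: the spectral radius $1-n^{-a}$, the rate $w_{n}=\lfloor n^{b}\rfloor$ with $b>a$, the second eigenvalue $0.5$ and the use of the spectral norm are all hypothetical choices made for illustration.

```python
import numpy as np

def companion_from_eigs(lam):
    """Companion matrix (coefficients on the first row) with prescribed eigenvalues."""
    theta = -np.poly(lam)[1:]
    p = len(theta)
    A = np.zeros((p, p))
    A[0, :] = theta
    A[1:, :-1] = np.eye(p - 1)
    return A

def power_norm(n, a=0.5, b=0.75):
    """Return (|||A_n^{w_n}|||, w_n (1 - rho_n)) for the illustrative rates."""
    rho = 1.0 - n ** (-a)          # spectral radius tending to 1
    w = int(n ** b)                # rate with w_n (1 - rho_n) -> +infinity
    A = companion_from_eigs([rho, 0.5])
    norm = np.linalg.norm(np.linalg.matrix_power(A, w), 2)
    return norm, w * (1.0 - rho)

results = [power_norm(n) for n in (10**2, 10**4, 10**6)]
for norm, scaled_rate in results:
    print(f"w_n (1 - rho_n) = {scaled_rate:8.2f}   |||A_n^(w_n)||| = {norm:.3e}")
```

The norm decays roughly like $\exp(-w_{n}\,(1-\rho(A_{n})))$, up to the bounded factor coming from $P_{n}$ and $P_{n}^{\,-1}$.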

3.2. Proofs of the main results

First of all, it is convenient to express the empirical variance of the process as

[TABLE]

where the variance $\Gamma_{n}$ is given in (1.4),

[TABLE]

and the residual term is

[TABLE]

Then, solving this generalized Sylvester equation (Lem. 2.1 of [14]) and considering the invertibility of $B_{n}$ in (1.7) which is proved at the beginning of Section 1.2, we reach the decomposition

[TABLE]

Let us now reason step by step, via some intermediate results.

3.2.1. Exponential moments of the squared initial value

We recall that, from the causal form (1.3) of the process,

[TABLE]

The following result gives an exponential moment for the correctly renormalized squared initial value.

Lemma 3.3.

Under hypothesis (H1),

[TABLE]

where $L_{n}$ is given in (1.8).

Proof.

By the Cauchy-Schwarz inequality,

[TABLE]

Moreover, from Jensen’s inequality, for all $\lambda>0$ ,

[TABLE]

using $\frac{|||A_{n}^{0}|||}{L_{n}}+\frac{|||A_{n}^{1}|||}{L_{n}}+\ldots=1$ . Taking the expectation and choosing $\lambda=\alpha$ given in (H1), we deduce that

[TABLE]

∎
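The convex weights $\frac{|||A_{n}^{\ell}|||}{L_{n}}$ used in Jensen's inequality, and the order of magnitude of $L_{n}$ relative to $(1-\rho(A_{n}))^{-1}$, can be explored numerically. The sketch below makes several assumptions that are not in the text: the series is truncated once the terms fall below a tolerance, the norm is the spectral norm, and the eigenvalues are hypothetical.

```python
import numpy as np

def companion_from_eigs(lam):
    """Companion matrix (coefficients on the first row) with prescribed eigenvalues."""
    theta = -np.poly(lam)[1:]
    p = len(theta)
    A = np.zeros((p, p))
    A[0, :] = theta
    A[1:, :-1] = np.eye(p - 1)
    return A

def norms_of_powers(A, tol=1e-12, max_terms=10**6):
    """Spectral norms of A^0, A^1, ... until they drop below tol."""
    norms = []
    power = np.eye(A.shape[0])
    while len(norms) < max_terms:
        value = np.linalg.norm(power, 2)
        if value < tol:
            break
        norms.append(value)
        power = A @ power
    return np.array(norms)

scaled = {}
weight_sums = {}
for rho in (0.9, 0.99, 0.999):
    norms = norms_of_powers(companion_from_eigs([rho, 0.5]))
    L = norms.sum()                      # truncated version of L_n
    weight_sums[rho] = (norms / L).sum() # Jensen weights sum to one
    scaled[rho] = L * (1.0 - rho)        # stays bounded as rho -> 1
    print(f"rho = {rho}:  L (1 - rho) = {scaled[rho]:.3f}")
```

In this example the product $L_{n}\,(1-\rho(A_{n}))$ remains of order one as the spectral radius approaches $1$, in line with the bound $L_{n}^{2}=O((1-\rho(A_{n}))^{-2})$ used later.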

3.2.2. Exponential convergence of the residual term

The residual term in the decomposition (3.9) is given by

[TABLE]

Our next objective is to prove the exponential negligibility of this residual.

Lemma 3.4.

Under hypotheses (H1)–(H5), for all $r>0$ ,

[TABLE]

Proof.

First, note that

[TABLE]

Thus,

[TABLE]

where $L_{n}$ is given in (1.8), using Markov’s inequality, the reasoning in the proof of Lemma 3.3 and the fact that, from the strict stationarity of the process, $\Phi_{n,\,0}\,\Phi_{n,\,0}^{\,T}$ and $\Phi_{n,\,n}\,\Phi_{n,\,n}^{\,T}$ share the same distribution. Hence, for a sufficiently large $n$ ,

[TABLE]

since $|||B_{n}^{-1}|||^{\frac{1}{2}}\sim\sqrt{h}\,(1-\rho(A_{n}))^{-\frac{1}{2}}$ from (H4), $L_{n}^{2}=O((1-\rho(A_{n}))^{-2})$ from Lemma 3.1 and since, from (H2), $|||A_{n}|||$ converges. Finally, letting $n$ tend to infinity, (H1) and (H5) conclude the proof. ∎

3.2.3. The truncated sequence

In what follows, we define the rate

[TABLE]

and we note from (H3)–(H5) that

[TABLE]

Following the idea of [17], we are going to use $m_{n}$ as a truncation parameter. Consider

[TABLE]

as an approximation of $\Phi_{n,\,k}$ in its causal form (1.3). We also define the truncated version of the summands $\Delta_{n,\,k}$ in (3.2) as

[TABLE]

The process $(B_{n}^{-1}\,\textnormal{vec}(\zeta_{n,\,k}))_{k}$ is strictly stationary and $m_{n}$ –dependent, according to Def. 6.4.3 of [4]. Let us study some properties of this process.

Lemma 3.5.

Under hypotheses (H1)–(H4), we can find a constant $c_{\alpha}>0$ such that, for a sufficiently large $n$ ,

[TABLE]

for any rate $(w_{n})$ satisfying $w_{n}\,(1-\rho(A_{n}))\rightarrow+\infty$ .

Proof.

By Hölder’s inequality,

[TABLE]

Moreover, for the rank $n_{0}$ and the uniformly bounded matrices $P_{n}$ and $P_{n}^{\,-1}$ introduced in the proof of Lemma 3.1,

[TABLE]

as soon as $w_{n}>n_{0}$ . Thus,

[TABLE]

Finally, (H4), (1.10) and (3.7) lead, for large values of $n$ , to

[TABLE]

It remains to choose $c_{\alpha}=\frac{\alpha}{C_{st}}$ . ∎
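The truncation idea of this subsection can be illustrated by simulation. In the sketch below, the truncation length `m` plays the role of $m_{n}$, the exact upper index of the truncated sum is an assumption, and the eigenvalues, the Gaussian noise and the zero initial value are hypothetical choices. The point is the exact identity $\Phi_{k}-\sum_{\ell<m}A^{\ell}E_{k-\ell}=A^{m}\,\Phi_{k-m}$ , which shows that the truncation error is driven by $A^{m}$ .

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical companion matrix with eigenvalues 0.95 and 0.4.
lam = [0.95, 0.4]
theta = -np.poly(lam)[1:]
p = len(theta)
A = np.zeros((p, p))
A[0, :] = theta
A[1:, :-1] = np.eye(p - 1)

n, m = 200, 25
eps = rng.standard_normal(n + 1)

# Simulate Phi_k = A Phi_{k-1} + E_k with E_k = (eps_k, 0, ..., 0)^T, Phi_0 = 0.
Phi = np.zeros((n + 1, p))
for k in range(1, n + 1):
    Phi[k] = A @ Phi[k - 1]
    Phi[k, 0] += eps[k]

def truncated(k, m):
    """Sum of A^l E_{k-l} over l = 0, ..., m-1 (depends on m noise terms only)."""
    out = np.zeros(p)
    power = np.eye(p)
    for l in range(m):
        out += power[:, 0] * eps[k - l]  # A^l E_{k-l} = eps_{k-l} * first column of A^l
        power = A @ power
    return out

k = 150
error = Phi[k] - truncated(k, m)
expected = np.linalg.matrix_power(A, m) @ Phi[k - m]
identity_ok = np.allclose(error, expected)
print(identity_ok, np.linalg.norm(error))
```

Since the truncated variable at time $k$ only involves the noise between times $k-m+1$ and $k$ , two such variables separated by at least $m$ time steps are built from disjoint sets of innovations.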

Lemma 3.6.

Under hypotheses (H2)–(H4), for all $n\geqslant 1$ and $k\in\{1,\ldots,n\}$ ,

[TABLE]

where the $p^{2}\times p^{2}$ covariance $\Upsilon_{n}$ can be explicitly built in terms of $\sigma^{2}$ , $A_{n}$ and $B_{n}$ . In addition,

[TABLE]

where the non-zero limiting matrix $\Upsilon$ is given in (3.18).

Proof.

We will use in what follows $K_{p}$ and $U_{p}$ defined in (1.5). Let $\mathcal{F}_{k}=\sigma(\varepsilon_{\ell},\,\ell\leqslant k)$ be the $\sigma$ –algebra of the events occurring up to time $k$ . Then, it is easy to see that

[TABLE]

in virtue of (1.6). For $k>j$ , by direct calculation,

[TABLE]

and the same is true for $j>k$ since $(\mathbb{E}[\textnormal{vec}(\zeta_{n,\,k})\,\textnormal{vec}^{\,T}(\zeta_{n,\,j})])^{\,T}=\mathbb{E}[\textnormal{vec}(\zeta_{n,\,j})\,\textnormal{vec}^{\,T}(\zeta_{n,\,k})]=0$ . Now for $k=j$ , a tedious but straightforward calculation leads to

[TABLE]

To give an explicit expression of $\Upsilon_{n}$ , it suffices to observe that the truncated expression (3.14) has a variance given by

[TABLE]

so that

[TABLE]

Let us now look at the asymptotic behavior of $\Upsilon_{n}$ correctly renormalized. First, we have the convergence

[TABLE]

coming from the identity $(A_{n}\otimes A_{n})^{m_{n}-1}=A_{n}^{m_{n}-1}\otimes A_{n}^{m_{n}-1}$ and Lemma 3.2. Together with (H4), this implies

[TABLE]

In the remainder of the proof, we call $\textnormal{vec}^{-1}$ the vectorization inverse operator (namely, in our context, the reconstruction of a $p\times p$ matrix from its vectorization of size $p^{2}$ ). Then,

[TABLE]

Combining (3.16) with (3.17) and (H4), we have

[TABLE]

where $\Gamma^{A}=A\,\Gamma\,A^{\,T}$ . ∎

Remark.

As a by-product, we also obtain, following the same lines,

[TABLE]

where $\Gamma_{n}$ is given in (1.4), which proves (1.11). The variance $\Gamma_{n,\,m_{n}}$ defined above may be seen as the truncated version of $\Gamma_{n}$ .
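The two algebraic facts used repeatedly in this subsection, the Kronecker identity $(A\otimes A)^{m}=A^{m}\otimes A^{m}$ and the vectorization of a stationary variance, can be checked on a toy example. This is only a sketch: the matrix `B = I - kron(A, A)` is an assumed stand-in for the matrix of (1.7), which is not reproduced in this section, the noise covariance $\sigma^{2}\,e_{1}e_{1}^{\,T}$ is an assumption matching a noise vector with a single non-zero first component, and the eigenvalues are hypothetical.

```python
import numpy as np

# Hypothetical companion matrix with eigenvalues 0.95 and 0.4.
lam = [0.95, 0.4]
theta = -np.poly(lam)[1:]
p = len(theta)
A = np.zeros((p, p))
A[0, :] = theta
A[1:, :-1] = np.eye(p - 1)

sigma2 = 1.0
Sigma = np.zeros((p, p))
Sigma[0, 0] = sigma2                 # assumed covariance of (eps_k, 0, ..., 0)^T

def vec(M):
    """Column-stacking vectorization."""
    return M.flatten(order="F")

def unvec(v):
    """Inverse of vec for p x p matrices."""
    return v.reshape((p, p), order="F")

# Assumed form of the p^2 x p^2 matrix: vec(A G A^T) = (A kron A) vec(G).
B = np.eye(p * p) - np.kron(A, A)
Gamma = unvec(np.linalg.solve(B, vec(Sigma)))

# Gamma solves the stationary variance equation Gamma = A Gamma A^T + Sigma.
variance_ok = np.allclose(Gamma, A @ Gamma @ A.T + Sigma)

# Mixed-product property: (A kron A)^m = A^m kron A^m.
m = 7
Am = np.linalg.matrix_power(A, m)
kron_ok = np.allclose(np.linalg.matrix_power(np.kron(A, A), m), np.kron(Am, Am))
print(variance_ok, kron_ok)
```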

3.2.4. The remainder of the truncation

We denote by

[TABLE]

the remainder of the truncation of $\Delta_{n}$ in (3.2) made via (3.15). Our last preliminary objective is to establish the following lemma.

Lemma 3.7.

Under hypotheses (H1)–(H5), for all $r>0$ ,

[TABLE]

Proof.

Clearly, both terms in the definition of (3.19) are similar and we will only work on the first one. From the causal expression (1.3) and the truncation (3.14), we note that

[TABLE]

Thus, with $M_{n}$ given in (1.9) and applying Lem. 17 of [15] under (H1),

[TABLE]

for some $\alpha_{0}>0$ and $\beta_{0}>0$ , where

[TABLE]

Our choice of $m_{n}$ in (1.9), the properties of Lemma 3.1, (3.6) and our hypotheses on the rates of convergence lead, for $n$ large enough, to

[TABLE]

and obviously $t_{n,\,\ell}\rightarrow+\infty$ . Hence, like in formula (3.11) of [17], there are some constants $\alpha_{0}^{\,\prime}>0$ and $\beta_{0}^{\,\prime}>0$ such that, for all $\ell\geqslant 0$ and large values of $n$ ,

[TABLE]

Going back to (3.2.4),

[TABLE]

where, for convenience, we note

[TABLE]

To sum up,

[TABLE]

This is clearly sufficient to finish the proof since, from (H4),

[TABLE]

for $n$ large enough. ∎

We are now ready to prove Theorem 2.1 and Corollary 2.2.

3.2.5. Proof of Theorem 2.1

All the technical results of the previous sections are now going to be concretely used. Consider the sequence

[TABLE]

where $\zeta_{n,\,k}$ is given in (3.15). The process $(\xi_{n,\,k})_{k}$ is also strictly stationary and $m_{n}$ –dependent. Like in [18] or [17, suppl. mat.], let us extract an independent sequence from this process. For $j\in\{1,\ldots,j_{n}\}$ , define

[TABLE]

where $j_{n}=\lfloor\frac{n}{m_{n}}\rfloor$ and where $(m_{n})$ and its properties are given in (3.12). Then, $(\xi^{\,\prime}_{n,\,j})_{j}$ is strictly stationary and $1$ –dependent. Next, for $t\in\{1,\ldots,t_{n}\}$ , define

[TABLE]

where $t_{n}=\lfloor\frac{j_{n}}{u_{n}}\rfloor$ and $(u_{n})$ is another rate satisfying

[TABLE]

To be convinced that such a rate exists, one can use (3.13) and the fact that $|\ln f_{n}|\rightarrow+\infty$ and $f_{n}\,|\ln f_{n}|^{a}\rightarrow 0$ when $f_{n}\rightarrow 0$ . The process $(\xi^{\,\prime\prime}_{n,\,t})_{t}$ is now i.i.d. and the rates satisfy

[TABLE]
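The index arithmetic behind this kind of blocking can be sketched as follows. This is a generic big-block/small-block scheme and not necessarily the exact construction used here: dropping the last small block of each group as a separator is an assumption, and the values of `n`, `m` and `u` are hypothetical.

```python
def big_blocks(n, m, u):
    """Group indices 1..n into small blocks of length m, then into big blocks
    made of u - 1 consecutive small blocks, the u-th one acting as a separator.
    Assumes u >= 2. Returns (j_n, t_n, list of (start, end) index ranges)."""
    j_n = n // m          # number of small blocks
    t_n = j_n // u        # number of big blocks
    blocks = []
    for t in range(1, t_n + 1):
        first_small = (t - 1) * u + 1
        last_small = t * u - 1
        start = (first_small - 1) * m + 1
        end = last_small * m
        blocks.append((start, end))
    return j_n, t_n, blocks

j_n, t_n, blocks = big_blocks(n=1000, m=10, u=5)
gaps = [blocks[i + 1][0] - blocks[i][1] for i in range(len(blocks) - 1)]
print(j_n, t_n, blocks[:2], min(gaps))
```

For an $m$ –dependent sequence, sums over two index ranges whose distance exceeds $m$ are independent, which is what the gap check above verifies for consecutive big blocks.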

The reasoning of [17, suppl. mat.] does not suit us, so we need to reformulate the establishment of the MDP. First, by a Taylor-Lagrange expansion,

[TABLE]

in which the remainder term satisfies, for any $\alpha>0$ ,

[TABLE]

Now, the random variables $|||\zeta_{n,\,\ell}|||$ sharing the same distribution for all $\ell\geqslant 0$ , it follows from Hölder’s inequality that,

[TABLE]

for $n$ large enough, using Lemma 3.5 with $m_{n}\,(1-\rho(A_{n}))\rightarrow+\infty$ stemming from (3.13), the convergence of $|||A_{n}|||$ , (H1) and treating all the terms of (3.15) similarly. Taking the expectation in (3.24) and exploiting the independence of the zero-mean process $(\xi^{\,\prime\prime}_{n,\,t})_{t}$ , we obtain the decomposition

[TABLE]

for we can see, as it is done in [18], that the residual term

[TABLE]

plays a negligible role in comparison to the main one. To eliminate the third-order term, we first look at the fourth-order moment of $\langle\lambda,\,\xi^{\,\prime\prime}_{n,\,1}\rangle$ , that is

[TABLE]

A long but standard calculation shows that

[TABLE]

as $n$ tends to infinity. This result is reached using the strict stationarity of the process, the explicit expression of $X_{n,\,0}^{4}$ in terms of $A_{n}^{\ell}$ , the inequality (3.6) and, finally, using (H4) giving the equivalence between $(1-\rho(A_{n}))^{-2}$ and $C_{st}\,|||B_{n}^{-1}|||^{2}$ . So,

[TABLE]

By Lyapunov’s inequality,

[TABLE]

for a small $\delta>0$ . Now, combining this result with (3.25) and Hölder’s inequality, for sufficiently large values of $n$ ,

[TABLE]

by (3.25), (3.23) and the properties in (3.22). The second-order term in (3.26) satisfies

[TABLE]

where we used (3.23) and the results of Lemma 3.6. The combination of (3.26), (3.27) and (3.28) together with the Gärtner-Ellis theorem (see e.g. Sec. 2.3 of [8]) shows that the sequence

[TABLE]

satisfies an LDP with speed $(b_{n}^{\,2})$ and rate function given by the Fenchel-Legendre transform of the above logarithmic moment generating function, i.e.

[TABLE]

Note that, due to its particular structure, $\Upsilon$ is only non-negative definite as soon as $p>1$ (by way of example, its last row and column are zero). In that case (see e.g. Ex. 1.1.4 of [12], page 212), the explicit expression of this quadratic rate function, strictly convex on its relative interior, is

[TABLE]
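The Fenchel-Legendre transform of a quadratic form with a singular matrix can be evaluated through the Moore-Penrose pseudo-inverse. The sketch below treats the generic case $\Lambda(\lambda)=\frac{1}{2}\,\lambda^{\,T}\,\Upsilon\,\lambda$ , whose transform is $\frac{1}{2}\,x^{\,T}\,\Upsilon^{\dagger}\,x$ on the range of $\Upsilon$ and $+\infty$ elsewhere. The matrix `Upsilon` is a hypothetical example with a zero last row and column, and the normalizing constants of the actual rate function are not reproduced.

```python
import numpy as np

def quadratic_rate(x, Upsilon, tol=1e-10):
    """Legendre transform of lambda -> 0.5 * lambda^T Upsilon lambda at x."""
    U_pinv = np.linalg.pinv(Upsilon)
    # x must lie in the range of Upsilon, otherwise the supremum is infinite.
    residual = np.linalg.norm(Upsilon @ U_pinv @ x - x)
    if residual > tol * max(1.0, np.linalg.norm(x)):
        return np.inf
    return 0.5 * x @ U_pinv @ x

# Hypothetical non-negative definite matrix, singular in its last coordinate.
Upsilon = np.array([[2.0, 1.0, 0.0],
                    [1.0, 2.0, 0.0],
                    [0.0, 0.0, 0.0]])

x_in = np.array([1.0, -1.0, 0.0])    # in the range of Upsilon
x_out = np.array([0.0, 0.0, 1.0])    # outside the range of Upsilon

rate_in = quadratic_rate(x_in, Upsilon)
rate_out = quadratic_rate(x_out, Upsilon)

# Sanity check of the supremum: no lambda should beat the closed form.
rng = np.random.default_rng(1)
trials = rng.standard_normal((1000, 3))
objective = trials @ x_in - 0.5 * np.einsum("ij,jk,ik->i", trials, Upsilon, trials)
print(rate_in, rate_out, objective.max() <= rate_in + 1e-12)
```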

After the truncation introduced in (3.14), the decomposition (3.9) can be rewritten as

[TABLE]

where, in the remainder term $R^{\,*}_{n}=B_{n}^{-1}\,\textnormal{vec}(\Lambda_{n})-R_{n}$ , the residual of the truncation is given in (3.19) and the main residual $R_{n}$ is given in (3.11). Lemma 3.4 and Lemma 3.7 show that the first term on the right-hand side is an exponentially good approximation of the left-hand side and that, as a consequence, they share the same LDP (see Def. 4.2.10 and Thm. 4.2.13 of [8]). The contraction principle (see Thm. 4.2.1 of [8]) enables us to compute the rate function associated with the LDP, namely

[TABLE]

where the limiting value $h>0$ comes from (H4). ∎

3.2.6. Proof of Corollary 2.2

Using (2.2) and (2.3),

[TABLE]

Our objective is first to prove that, for all $r>0$ ,

[TABLE]

where $\Gamma_{\pi}$ is the invertible penalized variance (2.3), and then to establish an LDP for the sequence

[TABLE]

in order to obtain the announced result, via the contraction principle (Thm. 4.2.1 of [8]). On the one hand, we know from Theorem 2.1 and (3.29) that

[TABLE]

since, by (H4) and (H5),

[TABLE]

and $(1-\rho(A_{n}))^{\frac{3}{2}}\sim h^{\frac{3}{2}}\,|||B_{n}^{-1}|||^{-\frac{3}{2}}$ . So,

[TABLE]

It is also clear that

[TABLE]

and (1.11) shows that the second event in the right-hand side becomes impossible when $n$ increases. Hence, from the reasoning above,

[TABLE]

Now we shall use Lem. 2 of [21] to get (3.30).

On the other hand, all the work consisting in proving that the sequence (3.31) satisfies an LDP with speed $(b_{n}^{\,2})$ has already been done in the proof of Theorem 2.1. Indeed, via the truncation (3.14),

[TABLE]

where the process $(Z_{n,\,k})_{k}$ forms a strictly stationary and $m_{n}$ –dependent sequence. However, apart from the renormalization, this is precisely the first column of the first term of (3.15). Thus, the calculations are similar and we find, like in Lemma 3.6,

[TABLE]

In that case, from the convergence (3.17) and the previous proof, the rate function associated with the LDP is given by

[TABLE]

The exponential negligibility of the remainder of the truncation is obtained by following the lines of Lemma 3.7. The contraction principle enables us to compute the rate function associated with the LDP, namely

[TABLE]

where the exponential convergence (3.30) has been combined with the LDP established on the sequence (3.31). ∎

Acknowledgements. The author thanks the associate editor and the two anonymous reviewers for the numerous comments and suggestions that clearly helped to improve the paper. He also thanks R. Garbit for the constructive discussion about the link between Vandermonde matrices and Lagrange polynomials.

Bibliography


[1] Bercu, B. On large deviations in the Gaussian autoregressive process: stable, unstable and explosive cases. Bernoulli 7 (2001), 299–316.
[2] Bercu, B., Gamboa, F., and Rouault, A. Large deviations for quadratic forms of stationary Gaussian processes. Stoch. Proc. Appl. 71 (1997), 75–90.
[3] Bitseki Penda, V., Djellout, H., and Proïa, F. Moderate deviations for the Durbin-Watson statistic related to the first-order autoregressive process. ESAIM Probab. Stat. 18 (2014), 308–331.
[4] Brockwell, P. J., and Davis, R. A. Time series: Theory and Methods (Second Edition). Springer Series in Statistics. Springer, New York, 1991.
[5] Buchmann, B., and Chan, N. H. Unified asymptotic theory for nearly unstable AR$(p)$ processes. Stoch. Proc. Appl. 123 (2013), 952–985.
[6] Chan, N. H., and Wei, C. Z. Asymptotic inference for nearly nonstationary AR$(1)$ processes. Ann. Statist. 15 (1987), 1050–1063.
[7] Chan, N. H., and Wei, C. Z. Limiting distributions of least squares estimates of unstable autoregressive processes. Ann. Statist. 16 (1988), 367–401.
[8] Dembo, A., and Zeitouni, O. Large Deviations Techniques and Applications (Second Edition), vol. 38 of Applications of Mathematics. Springer, 1998.
