Cross validation for locally stationary processes

Stefan Richter; Rainer Dahlhaus

arXiv:1705.10046·math.ST·May 30, 2017


TL;DR

This paper introduces an adaptive cross-validation method for selecting bandwidths in local M-estimators applied to locally stationary processes, demonstrating asymptotic optimality and practical effectiveness through simulations.

Contribution

It presents a novel cross-validation approach for bandwidth selection in locally stationary processes, with proven asymptotic optimality and broad applicability.

Findings

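01 The cross validation bandwidth is asymptotically optimal under mild conditions on the parameter curves

02 In simulations the method works fairly well even in misspecified situations

03 The results apply to both linear and nonlinear locally stationary processes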

Abstract

We propose an adaptive bandwidth selector via cross validation for local M-estimators in locally stationary processes. We prove asymptotic optimality of the procedure under mild conditions on the underlying parameter curves. The results are applicable to a wide range of locally stationary processes such as linear and nonlinear processes. A simulation study shows that the method works fairly well also in misspecified situations.

Equations

$$\|\tilde{X}_{t}(\theta)-\tilde{X}_{t}(\theta')\|_{q}\leq C_{A}|\theta-\theta'|_{1},\qquad\sum_{t=1}^{n}\big\|X_{t,n}-\tilde{X}_{t}\big(\theta_{0}\big(\frac{t}{n}\big)\big)\big\|_{q}\leq C_{B},$$

$$D_{q}:=\max\Big\{\sup_{\theta\in\Theta}\|\tilde{X}_{0}(\theta)\|_{q},\ \sup_{n\in\mathbb{N}}\sup_{t=1,...,n}\|X_{t,n}\|_{q}\Big\}<\infty.$$

$$\sum_{i=0}^{r}a_{i}\big(\frac{t}{n}\big)X_{t-i,n}=\sum_{j=0}^{s}b_{j}\big(\frac{t}{n}\big)\sigma\big(\frac{t-j}{n}\big)\varepsilon_{t-j}.$$

$$X_{t,n}=\Big(a_{0}\big(\frac{t}{n}\big)+a_{1}\big(\frac{t}{n}\big)X_{t-1,n}^{2}+...+a_{r}\big(\frac{t}{n}\big)X_{t-r,n}^{2}\Big)^{1/2}\varepsilon_{t}.$$

$$X_{t,n}=a_{1}\big(\frac{t}{n}\big)X_{t-1,n}^{+}+a_{2}\big(\frac{t}{n}\big)X_{t-1,n}^{-}+\varepsilon_{t},$$

$$\hat{\theta}_{h}(u):=\operatorname{argmin}_{\theta\in\Theta}L_{n,h}(u,\theta).$$

$$L_{n,h}(u,\theta):=\frac{1}{n}\sum_{t=1}^{n}K_{h}\Big(\frac{t}{n}-u\Big)\ell_{t,n}(\theta)$$

$$\ell(x,y,\theta)=-\log p_{\theta}(X_{t,n}=x\mid Y_{t-1,n}=y),$$

$$I(\theta):=\mathbb{E}\big[\nabla\ell(\tilde{Y}_{0}(a),\theta)\cdot\nabla\ell(\tilde{Y}_{0}(a),\theta)'\big]\big|_{a=\theta}.$$

$$d_{A}(\hat{\theta}_{h},\theta_{0}):=\frac{1}{n}\sum_{t=1}^{n}\Big|\hat{\theta}_{h}\big(\frac{t}{n}\big)-\theta_{0}\big(\frac{t}{n}\big)\Big|_{V(\theta_{0}(t/n))}^{2}w_{n,h}\big(\frac{t}{n}\big)$$

$$d_{I}(\hat{\theta}_{h},\theta_{0}):=\int_{0}^{1}\big|\hat{\theta}_{h}(u)-\theta_{0}(u)\big|_{V(\theta_{0}(u))}^{2}w_{n,h}(u)\ \mathrm{d}u.$$

$$L_{n,h,-s}(u,\theta):=\frac{1}{n}\sum_{t=1,t\neq s}^{n}K_{h}\Big(\frac{t}{n}-u\Big)\ell_{t,n}(\theta)$$

$$\hat{\theta}_{h,-s}(u):=\operatorname{argmin}_{\theta\in\Theta}L_{n,h,-s}(u,\theta).$$

$$CV(h):=\frac{1}{n}\sum_{s=1}^{n}\ell_{s,n}\big(\hat{\theta}_{h,-s}\big(\frac{s}{n}\big)\big)w_{n,h}\big(\frac{s}{n}\big).$$

$$CV(\hat{h})-\inf_{h\in H_{n}}CV(h)\leq\frac{1}{n},$$

$$\lim_{n\to\infty}\frac{d_{A}(\hat{\theta}_{\hat{h}},\theta_{0})}{\inf_{h\in H_{n}}d_{A}(\hat{\theta}_{h},\theta_{0})}=1,$$

$$\delta_{q}^{W}(k):=\|W_{k}-W_{k}^{*}\|_{q}.$$

$$\sup_{z\neq z'}\frac{|g(z,\theta)-g(z',\theta)|}{|z-z'|_{\chi,1}\big(1+|z|_{\chi,1}^{M-1}+|z'|_{\chi,1}^{M-1}\big)}\leq C_{1},\qquad\sup_{\theta\neq\theta'}\frac{|g(z,\theta)-g(z,\theta')|}{|\theta-\theta'|_{1}\big(1+|z|_{\chi,1}^{M}\big)}\leq C_{2}$$

$$\frac{1}{n}\sum_{t=1}^{n}|w_{n,h}(t/n)-w_{n,h'}(t/n)|$$

$$\int_{0}^{1}|w_{n,h}(u)-w_{n,h'}(u)|\ \mathrm{d}u$$

$$\lim_{n\to\infty}\frac{d(\hat{\theta}_{\hat{h}},\theta_{0})}{\inf_{h\in H_{n}}d(\hat{\theta}_{h},\theta_{0})}=1,$$

$$\hat{\theta}_{h}(u)-\theta_{0}(u)$$

$$d_{A}^{*}(\hat{\theta}_{h},\theta_{0})$$

$$d_{I}^{*}(\hat{\theta}_{h},\theta_{0})$$

$$d_{M}^{*}(\hat{\theta}_{h},\theta_{0}):=\mathbb{E}\,d_{I}^{*}(\hat{\theta}_{h},\theta_{0}).$$

$$d_{M}^{*}(\hat{\theta}_{h},\theta_{0})=\frac{\mu_{K}V_{0}}{nh}+\frac{h^{4}}{4}d_{K}^{2}B_{0}+o((nh)^{-1})+o(h^{4})$$

$$V_{0}$$

$$B_{0}$$

$$d_{M}^{**}(h):=\frac{\mu_{K}V_{0}}{nh}+\frac{h^{4}}{4}d_{K}^{2}B_{0}$$

$$\sup_{h\in H_{n}}\frac{d(\hat{\theta}_{h},\theta_{0})-d_{M}^{**}(h)}{d_{M}^{**}(h)}\to 0\quad\text{a.s.}$$

$$\frac{\hat{h}}{h_{0}}\to 1\quad\text{a.s.}$$

$$h_{0}=\Big(\frac{V_{0}\mu_{K}}{B_{0}d_{K}^{2}}\Big)^{1/5}n^{-1/5}.$$


Full text


1 Introduction

Inference for locally stationary time series models is strongly connected to the estimation of parameter curves which determine the degree of nonstationarity. The estimation of these curves has been discussed for several specific models such as tvARMA processes (Dahlhaus and Polonik (2009)), tvARCH and tvGARCH processes (Fryzlewicz, Sapatinas and Subba Rao (2008), Dahlhaus and Subba Rao (2006), Dahlhaus (2012)), and time-varying random coefficient models (Subba Rao (2006)). Also of interest is a time-varying TAR process, which was considered in Zhou and Wu (2009).

Local estimators such as kernel estimators require the selection of a bandwidth. In contrast to nonparametric regression, there exist only very few theoretical results on adaptivity for locally stationary processes. We mention Mallat, Papanicolaou and Zhang (1998), who discussed adaptive covariance estimation for a general class of locally stationary processes. Other results are constructed for specific models and partly depend on further tuning parameters: Giraud, Roueff and Sanchez-Perez (2015) discussed online-adaptive forecasting of tvAR processes, and Arkoun and Pergamenchtchikov (2008) and Arkoun (2011) proposed methods for sequential and minimax-optimal bandwidth selection for tvAR processes of order 1.

In this paper we treat the problem for arbitrary locally stationary time series models determined by a time-varying parameter curve. We focus on local M-estimators and use the functional dependence measure introduced in Wu (2005) to formulate mixing conditions. We propose an adaptive bandwidth selection procedure inspired by cross validation in the iid regression model which does not need any tuning parameters. We discuss the theoretical behavior by proving asymptotic optimality of the selector (similar to Härdle and Marron (1985), where nonparametric regression has been treated). We also prove convergence towards the deterministic asymptotically optimal bandwidth.

The technical core of the paper is martingale theory, applied in particular to the score function of the objective function, together with several bounds for moments of quadratic and cubic forms of locally stationary processes, which are needed to establish convergence of expansions of the estimation error with suitable rates.

In Section 2 we introduce the locally stationary time series model and formalize the separation of the process into a parametric stationary process and unknown parameter curves. We define local M-estimators and the cross validation procedure. We introduce a Kullback-Leibler type distance measure which can be seen as an analogue to the averaged squared error in nonparametric regression.

In Section 3 we prove asymptotic optimality of the cross validation procedure with respect to the Kullback-Leibler type distance measure and convergence of the cross validation bandwidth towards the deterministic asymptotic optimal bandwidth. The assumptions are stated in terms of a parametric stationary time series model which is connected to the locally stationary process. This allows for easy verification since most of the conditions are standard in M-estimation theory and were already shown for specific stationary models.

In Section 4 we discuss some processes where the main results are applicable. The performance of the method for different models such as tvAR, tvARCH and tvMA is studied in simulations.

In Section 5 a short conclusion is drawn. All proofs are deferred without further reference to Section 6 and the appendix.

2 A cross validation method for locally stationary processes

2.1 The Model

In this paper we discuss adaptive estimation of a multidimensional parameter curve $\theta_{0}:[0,1]\to\Theta\subset\mathbb{R}^{p}$ , i.e. we restrict to locally stationary processes $X_{t,n}$ , $t=1,...,n$ parameterized by curves. As usual we are working in the infill asymptotic framework with rescaled time $t/n\in[0,1]$ , where $n$ denotes the number of observations.

Following the original idea of locally stationary processes, for fixed $u\in[0,1]$ , $X_{t,n}$ should locally (i.e., for $|u-\frac{t}{n}|\ll 1$ ) behave like a stationary process $\hat{X}_{t}(u)$ . In this paper, we assume that the time dependence of the approximation $\hat{X}_{t}(u)$ is solely described by $\theta_{0}$ , i.e. $\hat{X}_{t}(u)=\tilde{X}_{t}(\theta_{0}(u))$ , where $\tilde{X}_{t}(\theta)$ , $\theta\in\Theta$ is some family of parametric stationary processes. In this paper we will formulate the assumptions in terms of $\tilde{X}_{t}(\theta)$ instead of $\hat{X}_{t}(u)$ leading to a clear separation between the properties of the model class and the smoothness assumptions on $\theta_{0}$ . We formalize this by

Assumption 2.1 (Locally stationary time series model).

Let $q\geq 1$ and $\|W\|_{q}:=(\mathbb{E}|W|^{q})^{1/q}$ . Let $X_{t,n}$ , $t=1,...,n$ be a triangular array of observations. Suppose that for each $\theta\in\Theta$ , there exists a stationary process $\tilde{X}_{t}(\theta)$ , $t\in\mathbb{Z}$ such that for all $q\geq 1$ , uniformly in $\theta,\theta^{\prime}\in\Theta$ ,

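$$\|\tilde{X}_{t}(\theta)-\tilde{X}_{t}(\theta')\|_{q}\leq C_{A}|\theta-\theta'|_{1},\qquad\sum_{t=1}^{n}\big\|X_{t,n}-\tilde{X}_{t}\big(\theta_{0}\big(\frac{t}{n}\big)\big)\big\|_{q}\leq C_{B},\tag{1}$$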

with some $C_{A},C_{B}\geq 0$ , and

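$$D_{q}:=\max\Big\{\sup_{\theta\in\Theta}\|\tilde{X}_{0}(\theta)\|_{q},\ \sup_{n\in\mathbb{N}}\sup_{t=1,...,n}\|X_{t,n}\|_{q}\Big\}<\infty.$$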

Remark 2.2.

(i) We conjecture that the assumption on the existence of all moments of $X_{t,n}$ and $\tilde{X}_{t}(\theta)$ can be dropped, but the calculations would be very tedious without much additional insight. The number of moments needed for the proofs increases if the Hölder exponent of the unknown parameter curve decreases.

(ii) In many models, the second condition in (1) basically means that the unknown parameter curve $\theta_{0}$ has bounded variation, see also Assumption 3.3.

We first give some examples which are covered by our results. These include in particular several classical parametric time series models where the constant parameters have been replaced by time-dependent parameter curves.

Example 2.3.

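(i) the tvARMA($r,s$) process: Given parameter curves $a_{i},b_{j},\sigma:[0,1]\to\mathbb{R}$ ($i=0,...,r$, $j=0,...,s$) with $a_{0}(\cdot)=b_{0}(\cdot)=1$,

$$\sum_{i=0}^{r}a_{i}\big(\frac{t}{n}\big)X_{t-i,n}=\sum_{j=0}^{s}b_{j}\big(\frac{t}{n}\big)\sigma\big(\frac{t-j}{n}\big)\varepsilon_{t-j}.$$

(ii) the tvARCH($r$) process (cf. Dahlhaus and Subba Rao (2006)): Given parameter curves $a_{i}:[0,1]\to\mathbb{R}$ ($i=0,...,r$),

$$X_{t,n}=\Big(a_{0}\big(\frac{t}{n}\big)+a_{1}\big(\frac{t}{n}\big)X_{t-1,n}^{2}+...+a_{r}\big(\frac{t}{n}\big)X_{t-r,n}^{2}\Big)^{1/2}\varepsilon_{t}.$$

(iii) the tvTAR($1$) process (cf. Zhou and Wu (2009)): Given parameter curves $a_{1},a_{2}:[0,1]\to\mathbb{R}$, define

$$X_{t,n}=a_{1}\big(\frac{t}{n}\big)X_{t-1,n}^{+}+a_{2}\big(\frac{t}{n}\big)X_{t-1,n}^{-}+\varepsilon_{t},$$

where $x^{+}:=\max\{x,0\}$ and $x^{-}:=\max\{-x,0\}$ .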
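To make the recursively defined examples concrete, the following minimal sketch simulates a tvTAR(1) and a tvARCH(1) path. The specific parameter curves, the standard normal innovations, the start value 0 and the function names are illustrative assumptions and are not taken from the paper.

```python
import math
import random

def simulate_tvtar1(n, a1, a2, seed=0):
    """X_t = a1(t/n) * max(X_{t-1}, 0) + a2(t/n) * max(-X_{t-1}, 0) + eps_t."""
    rng = random.Random(seed)
    x_prev, path = 0.0, []          # start value 0 is an assumption
    for t in range(1, n + 1):
        u = t / n                   # rescaled time in (0, 1]
        x_prev = (a1(u) * max(x_prev, 0.0) + a2(u) * max(-x_prev, 0.0)
                  + rng.gauss(0.0, 1.0))
        path.append(x_prev)
    return path

def simulate_tvarch1(n, a0, a1, seed=0):
    """X_t = sqrt(a0(t/n) + a1(t/n) * X_{t-1}^2) * eps_t."""
    rng = random.Random(seed)
    x_prev, path = 0.0, []
    for t in range(1, n + 1):
        u = t / n
        x_prev = math.sqrt(a0(u) + a1(u) * x_prev ** 2) * rng.gauss(0.0, 1.0)
        path.append(x_prev)
    return path

# Hypothetical smooth parameter curves, chosen so that both recursions stay stable.
tar_path = simulate_tvtar1(500, lambda u: 0.3 + 0.3 * u, lambda u: -0.4 * u)
arch_path = simulate_tvarch1(500, lambda u: 1.0 + u,
                             lambda u: 0.3 * math.sin(math.pi * u))
```

Both simulators evaluate the curves at the rescaled time $t/n$, which is the only place where nonstationarity enters.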

As an estimator of $\theta_{0}(\cdot)$ we consider local likelihood (or local M-) estimators weighted by kernels, that is

$$\hat{\theta}_{h}(u):=\operatorname{argmin}_{\theta\in\Theta}L_{n,h}(u,\theta),$$

where

$$L_{n,h}(u,\theta):=\frac{1}{n}\sum_{t=1}^{n}K_{h}\Big(\frac{t}{n}-u\Big)\ell_{t,n}(\theta)$$

and $\ell_{t,n}(\theta):=\ell(X_{t,n},Y_{t-1,n}^{c},\theta)$ with $Y_{t-1,n}^{c}:=(X_{t-1,n},...,X_{1,n},0,0,...)$ consisting of the observed past, where $\ell$ is a given objective function (localized in $L_{n,h}(u,\theta)$ by the kernel $K$ ). $K:\mathbb{R}\to\mathbb{R}$ is nonnegative with $\int K=1$ , and $h\in(0,\infty)$ is the bandwidth. To shorten the notation, we write $K_{h}(\cdot):=\frac{1}{h}K\big(\frac{\cdot}{h}\big)$ . In practice, $\ell$ is often chosen to be the negative logarithm of the infinite past likelihood of $X_{t,n}$ given $Y_{t-1,n}:=(X_{s,n}:s\leq t-1)$ ,

$$\ell(x,y,\theta)=-\log p_{\theta}(X_{t,n}=x\mid Y_{t-1,n}=y),$$

assuming that $\theta_{0}(\cdot)=\theta\in\Theta$ . In this paper, we allow for general objective functions $\ell$ which have to obey some smoothness conditions (see Assumption 3.3).
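The definition of the local M-estimator can be illustrated with a small sketch. It assumes a tvAR(1) model written as $X_{t,n}=a(t/n)X_{t-1,n}+\varepsilon_{t}$ with unit innovation variance, the objective $\ell(x,y,\theta)=\frac{1}{2}(x-\theta y_{1})^{2}$, an Epanechnikov kernel rescaled to $[-\frac{1}{2},\frac{1}{2}]$ and a grid search over a compact $\Theta$. All of these choices, and the function names, are assumptions made for illustration only.

```python
import random

def epanechnikov(x):
    """Kernel supported on [-1/2, 1/2] that integrates to one."""
    return 1.5 * (1.0 - 4.0 * x * x) if abs(x) <= 0.5 else 0.0

def local_objective(xs, u, h, theta):
    """L_{n,h}(u, theta) with ell_{t,n}(theta) = 0.5 * (X_t - theta * X_{t-1})^2."""
    n = len(xs)
    total = 0.0
    for t in range(1, n + 1):
        weight = epanechnikov((t / n - u) / h) / h   # K_h(t/n - u)
        if weight == 0.0:
            continue
        past = xs[t - 2] if t >= 2 else 0.0          # unobserved past is set to 0
        total += weight * 0.5 * (xs[t - 1] - theta * past) ** 2
    return total / n

def local_m_estimate(xs, u, h, grid):
    """Minimise the localized objective over a finite grid standing in for Theta."""
    return min(grid, key=lambda theta: local_objective(xs, u, h, theta))

def simulate_tvar1(n, a, seed=0):
    rng = random.Random(seed)
    x_prev, xs = 0.0, []
    for t in range(1, n + 1):
        x_prev = a(t / n) * x_prev + rng.gauss(0.0, 1.0)
        xs.append(x_prev)
    return xs

xs = simulate_tvar1(1000, lambda u: 0.8 - 1.2 * u)    # hypothetical curve, a(0.5) = 0.2
grid = [-0.95 + 0.01 * i for i in range(191)]         # compact Theta = [-0.95, 0.95]
theta_hat = local_m_estimate(xs, u=0.5, h=0.3, grid=grid)
```

A grid search is used here only because it mirrors the argmin over a compact parameter space; for this quadratic objective a closed-form minimizer exists as well.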

2.2 Distance measures

Define $\tilde{Y}_{t}(\theta):=(\tilde{X}_{s}(\theta):s\leq t)$ . In the following, we will use $\nabla$ to denote the derivative with respect to $\theta\in\Theta$ , and $x^{\prime}$ denotes the transpose of a vector or matrix $x$ . As global distance measures we use the averaged and the integrated squared error (ASE/ISE) weighted by the Fisher information

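$$I(\theta):=\mathbb{E}\big[\nabla\ell(\tilde{Y}_{0}(a),\theta)\cdot\nabla\ell(\tilde{Y}_{0}(a),\theta)'\big]\big|_{a=\theta}$$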

and the misspecified Fisher information $V(\theta):=\mathbb{E}\nabla^{2}\ell(\tilde{Y}_{0}(a),\theta)\big{|}_{a=\theta}$ of the corresponding stationary approximation. In addition the weight function $w_{n,h}(\cdot):=\mathbbm{1}_{[\frac{h}{2},1-\frac{h}{2}]}(\cdot)$ is needed to exclude boundary effects. Since the proof is the same for other weights $w_{n,h}$ we allow in Assumption 3.4 for more general weights.

More precisely we set (with $|x|_{A}^{2}:=x^{\prime}Ax$ for $x\in\mathbb{R}^{p}$ and $A\in\mathbb{R}^{p\times p}$ )

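$$d_{A}(\hat{\theta}_{h},\theta_{0}):=\frac{1}{n}\sum_{t=1}^{n}\Big|\hat{\theta}_{h}\big(\frac{t}{n}\big)-\theta_{0}\big(\frac{t}{n}\big)\Big|_{V(\theta_{0}(t/n))}^{2}w_{n,h}\big(\frac{t}{n}\big)$$

and

$$d_{I}(\hat{\theta}_{h},\theta_{0}):=\int_{0}^{1}\big|\hat{\theta}_{h}(u)-\theta_{0}(u)\big|_{V(\theta_{0}(u))}^{2}w_{n,h}(u)\ \mathrm{d}u.$$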

It can be shown that $2d_{A}$ and $2d_{I}$ are for $w\!\equiv\!1$ an approximation of the Kullback-Leibler divergence between models with parameter curves $\hat{\theta}_{h}(\cdot)$ and $\theta_{0}(\cdot)$ .
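As a sketch of how the averaged distance $d_{A}$ is evaluated, the snippet below computes it for a two-dimensional toy example with the indicator weight $w_{n,h}=\mathbb{1}_{[h/2,1-h/2]}$. The curves, the constant matrix $V$ and the function names are hypothetical; in an application $V(\theta)$ would come from the model.

```python
def quad_form(x, A):
    """|x|_A^2 = x' A x for a vector x and a square matrix A (nested lists)."""
    size = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(size) for j in range(size))

def averaged_distance(theta_hat, theta_0, V, n, h):
    """d_A with the indicator weight w_{n,h}(u) = 1 on [h/2, 1 - h/2]."""
    total = 0.0
    for t in range(1, n + 1):
        u = t / n
        if h / 2 <= u <= 1 - h / 2:
            diff = [a - b for a, b in zip(theta_hat(u), theta_0(u))]
            total += quad_form(diff, V(theta_0(u)))
    return total / n

# Toy curves: the "estimate" is off by the constant vector (0.1, -0.2).
theta_0 = lambda u: [0.5 * u, 1.0 + u]
theta_hat = lambda u: [0.5 * u + 0.1, 1.0 + u - 0.2]
V = lambda theta: [[2.0, 0.0], [0.0, 1.0]]            # hypothetical constant weight matrix

d_a = averaged_distance(theta_hat, theta_0, V, n=100, h=0.25)
```

In this toy case every retained time point contributes $2\cdot 0.1^{2}+0.2^{2}=0.06$, and 75 of the 100 points lie in $[0.125,0.875]$, so `d_a` equals $0.045$ up to rounding.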

In Theorem 3.8 we will prove that under suitable conditions, $d_{A}(\hat{\theta}_{h},\theta_{0})$ can be approximated uniformly in $h$ by a deterministic distance measure $d_{M}^{**}(h)$ , which has a unique minimizer $h_{0}=h_{0,n}\sim n^{-1/5}$ . $h_{0}$ can be seen as the (deterministic) optimal bandwidth.

2.3 The cross validation method

We now choose the bandwidth $h$ by a generalized cross validation method. We define a ‘quasi-leave-one-out’ local likelihood

$$L_{n,h,-s}(u,\theta):=\frac{1}{n}\sum_{t=1,t\neq s}^{n}K_{h}\Big(\frac{t}{n}-u\Big)\ell_{t,n}(\theta)$$

and a ‘quasi-leave-one-out’ estimator of $\theta_{0}$ by

$$\hat{\theta}_{h,-s}(u):=\operatorname{argmin}_{\theta\in\Theta}L_{n,h,-s}(u,\theta).$$

Here, ‘leave-one-out’ does not mean that we ignore the $s$ -th observation of the process $(X_{t,n})_{t=1,...,n}$ , but that we ignore the term which is contributed by the likelihood $\ell_{t,n}$ at time step $s$ . Because of that, we refer to the estimator as a quasi-leave-one-out method.

We then choose $\hat{h}$ via minimizing the cross validation functional

$$CV(h):=\frac{1}{n}\sum_{s=1}^{n}\ell_{s,n}\big(\hat{\theta}_{h,-s}\big(\frac{s}{n}\big)\big)w_{n,h}\big(\frac{s}{n}\big).$$

It is important to note that such a minimizer $\hat{h}$ of $CV(h)$ does not need to exist, because $CV(h)$ cannot be shown to be continuous. When $h$ varies, it is possible that the location of the minimum of $L_{n,h,-s}(u,\theta)$ changes and therefore $\hat{\theta}_{h,-s}(u)$ makes a jump. For the mathematical considerations we therefore choose some $\hat{h}$ such that

$$CV(\hat{h})-\inf_{h\in H_{n}}CV(h)\leq\frac{1}{n},$$

where $H_{n}$ is a suitable subinterval of $(0,1)$ , see Assumption 3.4, which covers all relevant values of $h$ .
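The cross validation functional can be sketched for a simple case. The example assumes a tvAR(1) model $X_{t,n}=a(t/n)X_{t-1,n}+\varepsilon_{t}$ with objective $\ell_{t,n}(\theta)=\frac{1}{2}(X_{t,n}-\theta X_{t-1,n})^{2}$, for which the quasi-leave-one-out minimizer over $\theta\in\mathbb{R}$ is a weighted least squares ratio. It further assumes a fixed weight $\mathbb{1}_{[\nu,1-\nu]}$ with $\nu=0.2$ and a finite grid of bandwidths in place of the interval $H_{n}$. These choices and all names are illustrative and not the authors' implementation.

```python
import math
import random

def kernel(x):
    """Epanechnikov kernel rescaled to the support [-1/2, 1/2]."""
    return 1.5 * (1.0 - 4.0 * x * x) if abs(x) <= 0.5 else 0.0

def cv_score(xs, h, nu=0.2):
    """CV(h) for ell_{t,n}(theta) = 0.5 * (X_t - theta * X_{t-1})^2."""
    n = len(xs)
    half = int(n * h / 2) + 1                  # kernel window measured in time steps
    total = 0.0
    for s in range(2, n + 1):
        u = s / n
        if not nu <= u <= 1.0 - nu:            # weight w(s/n) = 1 on [nu, 1 - nu]
            continue
        sxy = sxx = 0.0
        for t in range(max(2, s - half), min(n, s + half) + 1):
            if t == s:                         # drop only the s-th likelihood term;
                continue                       # X_s still enters as regressor at t = s + 1
            k = kernel((t / n - u) / h) / h
            sxy += k * xs[t - 1] * xs[t - 2]
            sxx += k * xs[t - 2] ** 2
        theta = sxy / sxx if sxx > 0.0 else 0.0   # minimiser of L_{n,h,-s}(s/n, .)
        total += 0.5 * (xs[s - 1] - theta * xs[s - 2]) ** 2
    return total / n

# Simulated tvAR(1) data with a hypothetical coefficient curve.
rng = random.Random(0)
n, x_prev, xs = 400, 0.0, []
for t in range(1, n + 1):
    x_prev = 0.8 * math.sin(2.0 * math.pi * t / n) * x_prev + rng.gauss(0.0, 1.0)
    xs.append(x_prev)

bandwidths = [0.05, 0.1, 0.15, 0.2, 0.3, 0.4]     # finite stand-in for H_n
scores = {h: cv_score(xs, h) for h in bandwidths}
h_hat = min(scores, key=scores.get)
```

Minimizing over a finite grid sidesteps the existence issue discussed above, since a minimum over finitely many bandwidths always exists.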

3 Main results

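In this section we present our main results concerning the bandwidth $\hat{h}$ chosen by cross validation. We prove in Theorem 3.6 that $\hat{h}$ is asymptotically optimal with respect to $d_{A}$ , i.e.

$$\lim_{n\to\infty}\frac{d_{A}(\hat{\theta}_{\hat{h}},\theta_{0})}{\inf_{h\in H_{n}}d_{A}(\hat{\theta}_{h},\theta_{0})}=1,$$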

and in Theorem 3.9 that $\hat{h}$ is consistent in the sense that ${\hat{h}}/{h_{0}}\to 1$ a.s., where $h_{0}$ is the deterministic optimal bandwidth defined in (21). Recall that $d_{A}(\hat{\theta}_{h},\theta_{0})$ can be interpreted as a Kullback-Leibler-type distance between the two time series models associated to $\hat{\theta}_{h}$ and $\theta_{0}$ . Thus, the cross validation procedure yields an estimator $\hat{\theta}_{\hat{h}}$ of $\theta_{0}$ such that the distributions of the associated time series coincide best.

To prove asymptotic results, we have to state some mixing type conditions on the underlying process $X_{t,n}$ . For this, we use the functional dependence measure introduced in Wu (2005). Let $\varepsilon_{t}$ , $t\in\mathbb{Z}$ be a sequence of i.i.d. random variables. For $t\geq 0$ let $\mathcal{F}_{t}:=(\varepsilon_{t},\varepsilon_{t-1},...)$ be the shift process and $\mathcal{F}_{t}^{*}:=(\varepsilon_{t},...,\varepsilon_{1},\varepsilon_{0}^{*},\varepsilon_{-1},...)$ , where $\varepsilon_{0}^{*}$ is a random variable which has the same distribution as $\varepsilon_{0}$ and is independent of all $\varepsilon_{t}$ , $t\in\mathbb{Z}$ . For a stationary process $W_{t}=H(\mathcal{F}_{t})\in L^{q}$ with deterministic $H:\mathbb{R}^{\infty}\to\mathbb{R}$ define $W_{t}^{*}:=H(\mathcal{F}_{t}^{*})$ and the functional dependence measure

$$\delta_{q}^{W}(k):=\|W_{k}-W_{k}^{*}\|_{q}.$$

Assumption 3.1 (Dependence assumption).

Suppose that for each $\theta\in\Theta$ , there exists a representation $\tilde{X}_{t}(\theta)=H(\theta,\mathcal{F}_{t})$ with some measurable $H(\theta,\cdot)$ and $\delta_{q}(k):=\sup_{\theta\in\Theta}\delta_{q}^{\tilde{X}(\theta)}(k)=O(k^{-(3+\eta)})$ for some $\eta>0$ .

Note that we only need dependence conditions on the stationary approximations $\tilde{X}_{t}(\theta)$ and no further assumption on $X_{t,n}$ .
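The functional dependence measure can be made tangible for a stationary AR(1) process $W_{t}=aW_{t-1}+\varepsilon_{t}$. Replacing $\varepsilon_{0}$ by an independent copy changes $W_{k}$ by exactly $a^{k}(\varepsilon_{0}-\varepsilon_{0}^{*})$, so for standard normal innovations the measure at lag $k$ with $q=2$ is $\sqrt{2}\,|a|^{k}$, which decays geometrically and therefore satisfies the polynomial rate in Assumption 3.1. The sketch below checks this by Monte Carlo; the truncation of the infinite past, the Gaussian innovations and the function names are assumptions for illustration.

```python
import math
import random

def ar1_value(innovations, a):
    """H(F_t) for a causal AR(1), truncated to the supplied finite past."""
    w = 0.0
    for eps in innovations:          # oldest innovation first
        w = a * w + eps
    return w

def dependence_measure(a, k, q=2, past=30, reps=5000, seed=1):
    """Monte Carlo estimate of || W_k - W_k^* ||_q with eps_0 replaced by a copy."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(reps):
        eps = [rng.gauss(0.0, 1.0) for _ in range(past + k + 1)]  # eps_{-past}, ..., eps_k
        coupled = list(eps)
        coupled[past] = rng.gauss(0.0, 1.0)                       # independent copy of eps_0
        acc += abs(ar1_value(eps, a) - ar1_value(coupled, a)) ** q
    return (acc / reps) ** (1.0 / q)

estimate = dependence_measure(a=0.5, k=3)
closed_form = 0.5 ** 3 * math.sqrt(2.0)      # |a|^k * || eps_0 - eps_0^* ||_2
```

The two numbers should agree up to Monte Carlo error, since the truncated past cancels in the difference.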

To state smoothness conditions on the objective function $\ell$ in a concise way, we introduce the class of Lipschitz-continuous functions from $\mathbb{R}^{\infty}$ to $\mathbb{R}$ where we allow the Lipschitz constant to depend on the location at most polynomially.

Definition 3.2 (The class $\mathcal{L}(M,\chi,C)$ ).

We say that a function $g:\mathbb{R}^{\infty}\times\Theta\to\mathbb{R}$ is in the class $\mathcal{L}(M,\chi,C)$ if $C=(C_{1},C_{2})$ , $M\geq 1$ , $\chi=(\chi_{i})_{i=1,2,3,...}\in\mathbb{R}_{\geq 0}^{\infty}$ and for all $z\in\mathbb{R}^{\infty}$ , $\theta\in\Theta$ :

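$$\sup_{z\neq z'}\frac{|g(z,\theta)-g(z',\theta)|}{|z-z'|_{\chi,1}\big(1+|z|_{\chi,1}^{M-1}+|z'|_{\chi,1}^{M-1}\big)}\leq C_{1},\qquad\sup_{\theta\neq\theta'}\frac{|g(z,\theta)-g(z,\theta')|}{|\theta-\theta'|_{1}\big(1+|z|_{\chi,1}^{M}\big)}\leq C_{2},$$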

where $|z|_{\chi,1}:=\sum_{i=1}^{\infty}\chi_{i}\cdot|z_{i}|$ and $\sum_{i=1}^{\infty}\chi_{i}<\infty$ .

In Assumption 3.3, we pose some standard conditions on the likelihood function $\ell$ which ensure the validity of basic results (such as Taylor expansions) from maximum likelihood theory. Again all conditions are formulated in terms of the stationary process $\tilde{X}_{t}(\theta)$ and therefore easily verifiable due to known results on stationary time series.

Assumption 3.3.

Suppose that $\ell$ is three times differentiable with respect to $\theta$ , and

(1) $\Theta\subset\mathbb{R}^{d}$ is compact. For all $u\in[0,1]$ , $\theta_{0}(u)$ lies in the interior of $\Theta$ and $\theta_{0}$ is Hölder continuous with exponent $\beta>0$ and has component-wise bounded variation $B_{\theta_{0}}$ .

(2) $\theta_{0}(u)$ is the unique minimizer of $L(u,\theta):=\mathbb{E}\ell(\tilde{Y}_{0}(\theta_{0}(u)),\theta)$ .

(3) the minimal eigenvalue of $V(\theta):=\mathbb{E}[\nabla^{2}\ell(\tilde{Y}_{t}(\theta),\theta)]$ is bounded from below by some constant $\lambda_{0}$ uniformly in $\theta\in\Theta$ .

(4) $\nabla\ell(\tilde{Y}_{t}(\theta^{\prime}),\theta)\big|_{\theta^{\prime}=\theta}$ is a martingale difference sequence with respect to $\mathcal{F}_{t}$ in each component.

(5) each component of $g\in\{\ell,\nabla\ell,\nabla^{2}\ell,\nabla^{3}\ell\}$ lies in $\mathcal{L}(M,\chi,C)$ for some $\chi=(\chi_{j})_{j=1,2,...}$ , where $\chi_{j}=O(j^{-(3+\eta)})$ for some $\eta>0$ .

Finally, let us formalize the conditions on the set of bandwidths $H_{n}$ , the localizing kernel $K$ appearing in the estimation procedure and the weight function $w_{n,h}$ which arises in the cross validation functional and the distance measures.

Assumption 3.4.

For $n\in\mathbb{N}$ let $H_{n}=[\underline{h},\overline{h}]$ , where $\underline{h}\geq c_{0}n^{\delta-1}$ , $\overline{h}\leq c_{1}n^{-\delta}$ for some constants $c_{0},c_{1},\delta>0$ . Suppose that

(1) the kernel $K:\mathbb{R}\to\mathbb{R}$ has compact support $\subset[-\frac{1}{2},\frac{1}{2}]$ , fulfills $\int K(x)\ \mbox{d}x=1$ and is Lipschitz continuous with Lipschitz constant $L_{K}$ .

(2) the weight function $w_{n,h}:[0,1]\to\mathbb{R}_{\geq 0}$ is bounded by $|w|_{\infty}$ , has bounded variation $B_{w}$ uniformly in $n,h$ and support $\subset[\frac{h}{2},1-\frac{h}{2}]$ .

For some $w:[0,1]\to\mathbb{R}_{\geq 0}$ with support of Lebesgue measure greater than zero, assume that $\sup_{h\in H_{n}}\int|w_{n,h}(u)-w(u)|\ \mbox{d}u\to 0$ .

Furthermore, suppose that there exists some $C_{w}>0$ such that

[TABLE]

Remark 3.5.

Note that all conditions in (2) are fulfilled by the indicator $w_{n,h}(\cdot)=\mathbbm{1}_{[\frac{h}{2},1-\frac{h}{2}]}(\cdot)$ or $w_{n,h}(\cdot)=\mathbbm{1}_{[\nu,1-\nu]}(\cdot)$ with some fixed $\nu>0$ .

We now show that the cross validation bandwidth $\hat{h}$ is asymptotically optimal.

Theorem 3.6 (Asymptotic optimality of cross validation).

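Under Assumptions 2.1, 3.1, 3.3 and 3.4 the bandwidth $\hat{h}$ chosen by cross validation is asymptotically optimal in the sense that

$$\lim_{n\to\infty}\frac{d(\hat{\theta}_{\hat{h}},\theta_{0})}{\inf_{h\in H_{n}}d(\hat{\theta}_{h},\theta_{0})}=1,$$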

where $d$ is $d_{A}$ or $d_{I}$ .

Under stronger smoothness assumptions which allow a typical bias expansion up to the second derivative, we will show (in Theorem 3.9 below) that $\hat{h}$ is asymptotically equivalent to the asymptotically optimal theoretical bandwidth $h_{0}$ (ao-bandwidth for short). The additional smoothness assumptions are natural specifications of Assumptions 2.1 and 3.3.

Assumption 3.7 (Bias expansion conditions).

Suppose that

(1) $K$ is symmetric and $\theta_{0}$ is twice continuously differentiable,

(2) for all $\theta\in\Theta$ , $z\in\mathbb{R}^{\infty}$ , $z\mapsto\nabla\ell(z,\theta)$ is twice partially differentiable and $\partial_{z_{i}}\partial_{z_{j}}\nabla\ell(\cdot,\theta)\in\mathcal{L}(\max\{M-2,1\},\tilde{\chi},\tilde{\psi}_{1}(i)\tilde{\psi}_{2}(j))$ for all $i,j\geq 1$ with absolutely summable sequences $\tilde{\psi}_{1},\tilde{\psi}_{2}$ .

(3) $\theta\mapsto\tilde{X}_{t}(\theta)$ is twice continuously differentiable almost surely. For all $i,j=1,...,d$ , $\|\sup_{\theta\in\Theta}|\nabla_{i}\tilde{X}_{0}(\theta)|\|_{M}$ and $\|\sup_{\theta\in\Theta}|\nabla^{2}_{ij}\tilde{X}_{0}(\theta)|\|_{M}$ are finite.

We know from standard asymptotics that

[TABLE]

which motivates the following approximations to $d_{A}(\hat{\theta}_{h},\theta_{0})$ and $d_{I}(\hat{\theta}_{h},\theta_{0})$ :

[TABLE]

We now set

$$d_{M}^{*}(\hat{\theta}_{h},\theta_{0}):=\mathbb{E}\,d_{I}^{*}(\hat{\theta}_{h},\theta_{0}).$$

If $\theta_{0}$ is twice continuously differentiable and some additional smoothness assumptions on the approximating stationary process hold (see Assumption 3.7), Proposition 6.4 together with Assumption 3.4 (2) implies the usual bias-variance decomposition for $d_{M}^{*}$ :

$$d_{M}^{*}(\hat{\theta}_{h},\theta_{0})=\frac{\mu_{K}V_{0}}{nh}+\frac{h^{4}}{4}d_{K}^{2}B_{0}+o((nh)^{-1})+o(h^{4})$$

uniformly in $h\in H_{n}$ , where $\mu_{K}:=\int K(x)^{2}\ \mbox{d}x$ , $d_{K}:=\int K(x)x^{2}\ \mbox{d}x$ and

[TABLE]

leading to the definition of the deterministic bias-variance decomposition $d_{M}^{**}(h)$ and the resulting asymptotically optimal bandwidth in the following two theorems.
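The kernel constants $\mu_{K}$ and $d_{K}$ are easy to evaluate for a concrete kernel. As a sketch, the snippet below uses the Epanechnikov kernel rescaled to $[-\frac{1}{2},\frac{1}{2}]$, which is one kernel compatible with Assumption 3.4 (1) but is not prescribed by the text, and a plain midpoint rule for the integrals.

```python
def kernel(x):
    """Epanechnikov kernel rescaled to [-1/2, 1/2]: K(x) = 1.5 * (1 - 4 x^2)."""
    return 1.5 * (1.0 - 4.0 * x * x) if abs(x) <= 0.5 else 0.0

def integrate(f, a=-0.5, b=0.5, steps=20000):
    """Midpoint rule on [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

mass = integrate(kernel)                         # integral of K, should be 1
mu_K = integrate(lambda x: kernel(x) ** 2)       # integral of K^2, exact value 6/5
d_K = integrate(lambda x: kernel(x) * x * x)     # integral of K(x) x^2, exact value 1/20
```

For this kernel the exact values are $\int K=1$, $\mu_{K}=\frac{6}{5}$ and $d_{K}=\frac{1}{20}$, which the quadrature reproduces to high accuracy.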

Theorem 3.8 (Approximation of distance measures).

Let Assumptions 2.1, 3.1, 3.3, 3.4 and 3.7 hold. Define

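$$d_{M}^{**}(h):=\frac{\mu_{K}V_{0}}{nh}+\frac{h^{4}}{4}d_{K}^{2}B_{0}.$$

If the bias $B_{0}$ is not degenerate, i.e. $B_{0}>0$ , then it holds that

$$\sup_{h\in H_{n}}\frac{d(\hat{\theta}_{h},\theta_{0})-d_{M}^{**}(h)}{d_{M}^{**}(h)}\to 0\quad\text{a.s.},$$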

where $d$ is $d_{A}$ or $d_{I}$ .

Theorem 3.9 (Consistency of the cross validation bandwidth).

Let Assumptions 2.1, 3.1, 3.3, 3.4 and 3.7 hold. Then the bandwidth $\hat{h}$ chosen by cross validation fulfils

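$$\frac{\hat{h}}{h_{0}}\to 1\quad\text{a.s.},$$

where

$$h_{0}=\Big(\frac{V_{0}\mu_{K}}{B_{0}d_{K}^{2}}\Big)^{1/5}n^{-1/5}$$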

is the unique minimizer of $d_{M}^{**}(h)$ .
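That $h_{0}$ minimizes $d_{M}^{**}$ follows from setting the derivative $-\frac{\mu_{K}V_{0}}{nh^{2}}+h^{3}d_{K}^{2}B_{0}$ to zero. The sketch below checks this numerically by comparing the closed form with a grid minimizer. The values of $V_{0}$ and $B_{0}$ are hypothetical placeholders, and $\mu_{K}$, $d_{K}$ are those of an Epanechnikov kernel on $[-\frac{1}{2},\frac{1}{2}]$, which is an assumed choice.

```python
def d_m(h, n, V0, B0, mu_K, d_K):
    """Deterministic bias-variance decomposition d_M^{**}(h)."""
    return mu_K * V0 / (n * h) + h ** 4 / 4.0 * d_K ** 2 * B0

def h_opt(n, V0, B0, mu_K, d_K):
    """Closed-form minimiser h_0 of d_M^{**}."""
    return (V0 * mu_K / (B0 * d_K ** 2)) ** 0.2 * n ** -0.2

n, V0, B0, mu_K, d_K = 1000, 1.0, 50.0, 1.2, 0.05     # V0 and B0 are hypothetical
h0 = h_opt(n, V0, B0, mu_K, d_K)

# Grid from 0.5 * h0 to 1.5 * h0 in steps of 0.001 * h0.
grid = [h0 * (0.5 + 0.001 * i) for i in range(1001)]
h_grid = min(grid, key=lambda h: d_m(h, n, V0, B0, mu_K, d_K))
```

Because $d_{M}^{**}$ is strictly convex in $h>0$, the grid minimizer lies within one grid step of the closed form.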

3.1 Proofs

Here we present the structure of the proofs of Theorems 3.6, 3.9 and 3.8. The technical details including the proofs of the lemmata are postponed to the appendix. The main tool for the proofs is a general bound for moments on quadratic and cubic forms of functions of locally stationary processes (cf. Proposition C.1). From now on, we assume that Assumptions 2.1, 3.1, 3.3 and 3.4 hold. The following Lemma shows that the approximated distances $d_{I}^{*}$ , $d_{A}^{*}$ are close to $d_{M}^{*}$ .

Lemma 3.10.

We have almost surely

[TABLE]

As a consequence of Lemma 3.10 also the distances $d_{I},d_{A}$ are close to $d_{M}^{*}$ :

Corollary 3.11.

We have almost surely

[TABLE]

To get a connection between the distance measure $d_{M}^{*}$ and the cross validation functional $CV(h)$ , we define

[TABLE]

The next two lemmata show that $\overline{d}_{A}$ is close both to $d_{M}^{*}$ and $CV(h)$ . Lemma 3.13 can be viewed as the core of the proof since there the main assumptions come into play, such as the martingale property of $\nabla\ell(\tilde{Y}_{t}(\theta^{\prime}),\theta)\big{|}_{\theta^{\prime}=\theta}$ which is used for normalization and the differentiability properties of $\ell$ which are used for Taylor expansions of third order.

Lemma 3.12.

We have almost surely

[TABLE]

Lemma 3.13.

We have almost surely

[TABLE]

With the help of these results, we can now prove Theorems 3.6, 3.8, 3.9:

Proof of Theorem 3.6.

An immediate consequence of Lemma 3.13 is (use $\frac{x_{1}+x_{2}}{y_{1}+y_{2}}\leq\frac{x_{1}}{y_{1}}+\frac{x_{2}}{y_{2}}$ for positive numbers $x_{1},x_{2},y_{1},y_{2}>0$ )

[TABLE]

almost surely. Now, using Corollary 3.11 and Lemma 3.12 it is easy to see that

[TABLE]

Choosing $h=\hat{h}$ and $h^{\prime}$ such that

[TABLE]

yields

[TABLE]

almost surely. Because of Corollary 3.11 and (17) we have $\sup_{h\in H_{n}}\frac{n^{-1}}{d_{A}(\hat{\theta}_{h},\theta_{0})}\to 0$ a.s. Thus,

[TABLE]

from which

[TABLE]

follows. The same can be done for $d_{I}$ . ∎

Proof of Theorem 3.8.

Because of $B_{0}>0$ and (17), we have

[TABLE]

Application of Corollary 3.11 finishes the proof. ∎

Proof of Theorem 3.9.

As in the proof of Theorem 3.8, we show (23). This result in combination with Lemma 3.12 and Lemma 3.13 gives almost surely

[TABLE]

Using the same methods as in the proof of Theorem 3.6, we have almost surely

[TABLE]

The structure of $d_{M}^{**}(h)$ implies $\hat{h}/h_{0}\to 1$ a.s. ∎

4 Examples and Simulations

4.1 Examples

Assumptions 2.1, 3.1, 3.3 and 3.7 are fulfilled for a large class of locally stationary time series models. Here, we discuss how the conditions transform in the case of some special linear and recursively defined time series. More general statements can be found in the technical supplement, see Proposition D.1 and D.2 therein.

Recall that $\varepsilon_{t}$ , $t\in\mathbb{Z}$ is a sequence of i.i.d. real random variables. We will use a Gaussian likelihood for $\ell$ defined in (4), but allow for a non Gaussian distribution of $\varepsilon_{t}$ .

An important special case of locally stationary linear processes is given by tvARMA processes, see also Proposition 2.4. in Dahlhaus and Polonik (2009). Since in this case, the linear filter $A_{\theta}(\lambda)=\sigma\cdot\frac{\beta(e^{i\lambda})}{\alpha(e^{i\lambda})}$ and the spectral density $f_{\theta}(\lambda)=\frac{\sigma^{2}}{2\pi}\cdot\big{|}\frac{\beta(e^{i\lambda})}{\alpha(e^{i\lambda})}\big{|}^{2}$ have a simple form, the conditions in Proposition D.1 are obviously fulfilled. The likelihood (4) takes the form

[TABLE]

Example 4.1 (tvARMA( $r,s$ ) process).

Assume that $\varepsilon_{t},t\in\mathbb{Z}$ are i.i.d. with existing moments of all order. Suppose that $\mathbb{E}\varepsilon_{0}=0$ and $\mathbb{E}\varepsilon_{0}^{2}=1$ . Let Assumption 3.3 (1) hold. Assume that $X_{t,n}$ obeys

[TABLE]

where $\theta_{0}=(\alpha_{1},...,\alpha_{r},\beta_{1},...,\beta_{s},\sigma)^{\prime}$ . Define $\beta(z):=1+\sum_{k=1}^{s}\beta_{k}z^{k}$ , $\alpha(z):=1+\sum_{k=1}^{r}\alpha_{k}z^{k}$ , and let $\Theta$ be an arbitrary compact subset of

[TABLE]

Then Assumptions 2.1, 3.1, 3.3 are fulfilled for $\ell$ chosen as in (24). If additionally Assumption 3.7 (1) is fulfilled, then Assumption 3.7 is fulfilled. It holds that $V(\theta)=\frac{1}{4\pi}\int\nabla\log f_{\theta}(\lambda)\cdot\nabla\log f_{\theta}(\lambda)^{\prime}\ \mbox{d}\lambda$ .

Remark 4.2 (tvAR( $r$ ) processes).

In the special case of tvAR($r$) processes, closed forms for the estimators based on $\ell(z,\theta)=\frac{1}{2}\log(2\pi\sigma^{2})+\frac{1}{2\sigma^{2}}\big(z_{1}+\sum_{j=1}^{r}\alpha_{j}z_{j+1}\big)^{2}$ are available: $\hat{\alpha}_{h}(u)=-\hat{\Gamma}_{h}(u)^{-1}\hat{\gamma}_{h}(u)$ and $\hat{\sigma}_{h}(u)^{2}=\frac{1}{n}\sum_{t=r+1}^{n}\big(X_{t,n}+\sum_{j=1}^{r}\hat{\alpha}_{j}(u)X_{t-j,n}\big)^{2}$ , where $Z_{t-1,n}=(X_{t-1,n},...,X_{t-r,n})^{\prime}$ and

[TABLE]
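The closed form $\hat{\alpha}_{h}(u)=-\hat{\Gamma}_{h}(u)^{-1}\hat{\gamma}_{h}(u)$ can be sketched for $r=2$. The definitions of $\hat{\Gamma}_{h}$ and $\hat{\gamma}_{h}$ are not recoverable from the text here, so the code assumes the kernel-weighted moments $\hat{\Gamma}_{h}(u)=\frac{1}{n}\sum_{t}K_{h}(\frac{t}{n}-u)Z_{t-1,n}Z_{t-1,n}^{\prime}$ and $\hat{\gamma}_{h}(u)=\frac{1}{n}\sum_{t}K_{h}(\frac{t}{n}-u)X_{t,n}Z_{t-1,n}$, which are what minimizing the kernel-weighted squared residuals yields. The Epanechnikov kernel, the coefficient curves and the function names are likewise illustrative assumptions.

```python
import random

def kernel(x):
    """Epanechnikov kernel rescaled to the support [-1/2, 1/2]."""
    return 1.5 * (1.0 - 4.0 * x * x) if abs(x) <= 0.5 else 0.0

def tvar2_estimate(xs, u, h):
    """Return (alpha_1, alpha_2) = -Gamma^{-1} gamma for a local tvAR(2) fit at u."""
    n = len(xs)
    g11 = g12 = g22 = c1 = c2 = 0.0
    for t in range(3, n + 1):                      # terms with a fully observed past
        k = kernel((t / n - u) / h) / h            # K_h(t/n - u)
        if k == 0.0:
            continue
        x, z1, z2 = xs[t - 1], xs[t - 2], xs[t - 3]   # X_t, X_{t-1}, X_{t-2}
        g11 += k * z1 * z1 / n
        g12 += k * z1 * z2 / n
        g22 += k * z2 * z2 / n
        c1 += k * z1 * x / n
        c2 += k * z2 * x / n
    det = g11 * g22 - g12 * g12
    # Solve Gamma * b = gamma by Cramer's rule, then flip the sign.
    b1 = (c1 * g22 - g12 * c2) / det
    b2 = (g11 * c2 - g12 * c1) / det
    return -b1, -b2

# Simulate X_t + alpha_1 X_{t-1} + alpha_2(t/n) X_{t-2} = eps_t with
# alpha_1 = -0.5 and alpha_2(u) = 0.3 * u (hypothetical curves).
rng = random.Random(0)
n, xs = 2000, [0.0, 0.0]
for t in range(1, n + 1):
    u = t / n
    xs.append(0.5 * xs[-1] - 0.3 * u * xs[-2] + rng.gauss(0.0, 1.0))
xs = xs[2:]                                        # drop the two zero start values

alpha_1, alpha_2 = tvar2_estimate(xs, u=0.5, h=0.3)
```

With these curves the targets at $u=0.5$ are $\alpha_{1}=-0.5$ and $\alpha_{2}=0.15$, and the estimates should land near them up to sampling error.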

We now discuss recursively defined nonlinear time series models with additive innovations $\varepsilon_{t}$ . Let us fix some $r>0$ and define the vectors of the last $r$ lags $Z_{t-1,n}=(X_{t-1,n},...,X_{t-r,n})^{\prime}$ , $\tilde{Z}_{t-1}(\theta)=(\tilde{X}_{t-1}(\theta),...,\tilde{X}_{t-r}(\theta))^{\prime}$ as the vector of the $r$ past values of the locally stationary and the stationary time series, respectively. Many popular locally stationary models assume that the conditional mean and / or variance is a linear combination of unknown parameter curves and functions of $Z_{t-1,n}$ , i.e.

[TABLE]

with some measurable $\tilde{\mu}$ , $\tilde{\sigma}$ . In this case, the likelihood (4) takes the form

[TABLE]

In the first example we discuss conditional mean processes. This class covers the tvAR- as well as the tvTAR case.

Example 4.3 (Conditional mean processes).

Assume that $\varepsilon_{t}$ , $t\in\mathbb{Z}$ are i.i.d. and have all moments with $\mathbb{E}\varepsilon_{0}=0$ and $\mathbb{E}\varepsilon_{0}^{2}=1$ . Suppose that Assumption 3.3 (1) is fulfilled. Assume that $X_{t,n}$ obeys

[TABLE]

Here, $\theta_{0}=(\alpha_{1},...,\alpha_{p-1},\sigma)^{\prime}:[0,1]\to\Theta$ and $\mu=(\mu_{1},...,\mu_{p-1}):\mathbb{R}^{r}\to\mathbb{R}^{p-1}$ is a function which fulfills

(a) $\sup_{y\not=y^{\prime}}\frac{|\mu_{i}(y)-\mu_{i}(y^{\prime})|}{|y-y^{\prime}|_{\chi_{i},1}}\leq 1$ with some $\chi_{i}\in\mathbb{R}_{\geq 0}^{r}$ ($i=1,...,p-1$),

(b) $\mu_{1}(\tilde{Y}_{0}(\theta)),...,\mu_{p-1}(\tilde{Y}_{0}(\theta))$ are linearly independent in $L^{2}$ for all $\theta\in\Theta$.

Define $\Theta:=\{\theta=(\alpha_{1},...,\alpha_{p-1},\sigma)\in\mathbb{R}^{p}:\sum_{i=1}^{p-1}\sum_{j=1}^{r}|\alpha_{i}|\chi_{i,j}\leq\rho,\sigma_{min}\leq\sigma\leq\sigma_{max}\}$ with some $0<\rho<1$ and $0<\sigma_{min}<\sigma_{max}$.

Then Assumptions 2.1, 3.1, 3.3 are fulfilled for $\ell$ chosen as in (25). Furthermore, with $W(\theta):=\mathbb{E}[\mu(\tilde{Z}_{0}(\theta))\mu(\tilde{Z}_{0}(\theta))^{\prime}]$ it holds that

[TABLE]

The next example discusses conditional variance processes, which cover, for instance, the tvARCH process.

Example 4.4 (Conditional variance processes).

Assume that $\varepsilon_{t}$ , $t\in\mathbb{Z}$ are i.i.d. with $\mathbb{E}\varepsilon_{0}=0$ and $\mathbb{E}\varepsilon_{0}^{2}=1$ and are almost surely bounded by some $C_{\varepsilon}>0$ . Suppose that Assumption 3.3 (1) is fulfilled. Assume that $X_{t,n}$ obeys

[TABLE]

Here, $\theta_{0}=(\alpha_{1},...,\alpha_{p})^{\prime}:[0,1]\to\Theta$ and $\mu=(\mu_{1},...,\mu_{p}):\mathbb{R}^{r}\to\mathbb{R}^{p}_{\geq 0}$ is a function which fulfills

(a) $\sup_{y\not=y^{\prime}}\frac{|\sqrt{\mu_{i}(y)}-\sqrt{\mu_{i}(y^{\prime})}|}{|y-y^{\prime}|_{\chi_{i},1}}\leq 1$ with some $\chi_{i}\in\mathbb{R}_{\geq 0}^{r}$ ($i=1,...,p$). There exists $\mu_{0}>0$ such that $\mu_{1}(y)\geq\mu_{0}$ for all $y\in\mathbb{R}^{r}$.

(b) $\mu_{1}(\tilde{Y}_{0}(\theta)),...,\mu_{p}(\tilde{Y}_{0}(\theta))$ are linearly independent in $L^{2}$ for all $\theta\in\Theta$.

Define $\Theta:=\{\theta=(\alpha_{1},...,\alpha_{p})\in\mathbb{R}^{p}:\sum_{i=1}^{p}\sum_{j=1}^{r}\sqrt{\alpha_{i}}\chi_{i,j}\leq\rho_{max}C_{\varepsilon}^{-1},\theta_{i}\geq\rho_{min}\}$ with some $0<\rho_{max}<1$ and $\rho_{min}>0$ .

Then Assumptions 2.1, 3.1, 3.3 are fulfilled for $\ell$ chosen as in (25). It holds that

[TABLE]

**A simulation study.** Here, we study the behavior of the presented cross validation algorithm for different time series models. We assume that $\varepsilon_{t}$ is standard Gaussian distributed, and consider

(a) tvAR(1) processes $X_{t,n}=\alpha(\frac{t}{n})X_{t-1,n}+\sigma(\frac{t}{n})\varepsilon_{t}$, with $\alpha(u)=0.9\sin(2\pi u)$ and $\sigma(u)=0.3\sin(2\pi u)+0.5$.

(b) tvMA(1) processes $X_{t,n}=\sigma(\frac{t}{n})\varepsilon_{t}+\alpha(\frac{t}{n})\sigma(\frac{t-1}{n})\varepsilon_{t-1}$, with $\alpha(u)=0.9\sin(2\pi u)$ and $\sigma(u)=0.3\sin(2\pi u)+0.5$.

(c) tvARCH(1) processes $X_{t,n}=\sqrt{\alpha_{1}(\frac{t}{n})+\alpha_{2}(\frac{t}{n})X_{t-1,n}^{2}}\cdot\varepsilon_{t}$, with $\alpha_{1}(u)=0.2\sin(2\pi u)+0.4$ and $\alpha_{2}(u)=0.1\sin(2\pi u)+0.2$.

(d) tvTAR(1) processes $X_{t,n}=\alpha_{1}(\frac{t}{n})X_{t-1,n}^{+}+\alpha_{2}(\frac{t}{n})X_{t-1,n}^{-}+\varepsilon_{t}$, with $\alpha_{1}(u)=0.4\sin(2\pi u)$ and $\alpha_{2}(u)=0.5\cos(2\pi u)$ and $y^{+}:=\max\{y,0\}$, $y^{-}:=\max\{-y,0\}$ for real numbers $y$.
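The four data generating processes can be simulated along the following lines. This is a sketch under stated assumptions, not the code used for the study: the burn-in phase with curves frozen at $\theta_{0}(0)$, the function name `simulate` and the seeding are illustrative choices, and the tvARCH recursion is driven by the current innovation $\varepsilon_{t}$, as in the usual ARCH form.

```python
import math
import random


def sin2pi(u):
    return math.sin(2.0 * math.pi * u)


def cos2pi(u):
    return math.cos(2.0 * math.pi * u)


def simulate(model, n, seed=0, burn_in=100):
    """Return [X_{1,n}, ..., X_{n,n}] for model 'a' (tvAR), 'b' (tvMA),
    'c' (tvARCH) or 'd' (tvTAR) with standard Gaussian innovations.

    Assumption: for t <= 0 the curves are frozen at u = 0 (burn-in).
    """
    rng = random.Random(seed)
    x_prev, eps_prev, path = 0.0, 0.0, []
    for t in range(1 - burn_in, n + 1):
        u = min(max(t / n, 0.0), 1.0)
        u_lag = min(max((t - 1) / n, 0.0), 1.0)
        eps = rng.gauss(0.0, 1.0)
        if model == "a":
            x = 0.9 * sin2pi(u) * x_prev + (0.3 * sin2pi(u) + 0.5) * eps
        elif model == "b":
            sigma = 0.3 * sin2pi(u) + 0.5
            sigma_lag = 0.3 * sin2pi(u_lag) + 0.5
            x = sigma * eps + 0.9 * sin2pi(u) * sigma_lag * eps_prev
        elif model == "c":
            a1, a2 = 0.2 * sin2pi(u) + 0.4, 0.1 * sin2pi(u) + 0.2
            x = math.sqrt(a1 + a2 * x_prev ** 2) * eps
        elif model == "d":
            a1, a2 = 0.4 * sin2pi(u), 0.5 * cos2pi(u)
            x = a1 * max(x_prev, 0.0) + a2 * max(-x_prev, 0.0) + eps
        else:
            raise ValueError("model must be one of 'a', 'b', 'c', 'd'")
        if t >= 1:
            path.append(x)
        x_prev, eps_prev = x, eps
    return path
```

For example, `simulate("b", 500, seed=7)` gives one tvMA(1) path of length $n=500$; repeating the call with different seeds gives independent Monte Carlo replications.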

We performed a Monte Carlo study by generating in each case $N=2000$ realizations of time series with length $n\in\{200,500\}$ . For estimation, we used the weight function $w_{n,h}(\cdot)=\mathbbm{1}_{[0.05,0.95]}(\cdot)$ which excludes most of the boundary effects and the Epanechnikov kernel $K(x)=\frac{3}{2}(1-(2x)^{2})\mathbbm{1}_{[-\frac{1}{2},\frac{1}{2}]}(x)$ . We do not use $w_{n,h}(\cdot)=\mathbbm{1}_{[\frac{h}{2},1-\frac{h}{2}]}(\cdot)$ since this weight function has poor finite sample properties for large $h$ .
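The kernel and weight function above are easy to write down and to sanity-check numerically. The sketch below assumes the usual rescaling $K_{h}(x)=\frac{1}{h}K(\frac{x}{h})$; the helper `integrate` (a midpoint rule) and all function names are illustrative.

```python
def epanechnikov(x):
    """K(x) = 3/2 * (1 - (2x)^2) on [-1/2, 1/2] and 0 elsewhere."""
    return 1.5 * (1.0 - (2.0 * x) ** 2) if abs(x) <= 0.5 else 0.0


def kernel_h(x, h):
    """Rescaled kernel K_h(x) = K(x / h) / h (assumed convention)."""
    return epanechnikov(x / h) / h


def weight(u, lo=0.05, hi=0.95):
    """Weight function w(u) = 1 on [0.05, 0.95] and 0 elsewhere."""
    return 1.0 if lo <= u <= hi else 0.0


def integrate(f, a=-0.5, b=0.5, m=20000):
    """Midpoint rule for the integral of f over [a, b] with m cells."""
    step = (b - a) / m
    return step * sum(f(a + (i + 0.5) * step) for i in range(m))
```

With these definitions, `integrate(epanechnikov)` is numerically $1$, and `integrate(lambda v: epanechnikov(v) ** 2)` approximates $\int K(x)^{2}\,\mbox{d}x=\frac{6}{5}$.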

We chose $H_{n}=[0.01,1.0]$ and calculated the cross-validation bandwidth $\hat{h}$ , the ao-bandwidth $h_{0}$ from Theorem 3.9 (for models (a)-(c), model (d) does not satisfy the smoothness conditions) and the optimal theoretical bandwidth

[TABLE]

Note that $\hat{h},h^{*}$ depend on the current realization while $h_{0}$ is deterministic and fixed. $h^{*}$ and $h_{0}$ depend on the unknown true curve $\theta_{0}(\cdot)$ and are unavailable in practice.
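To make the selector $\hat{h}$ concrete, here is a minimal leave-one-out sketch for a Gaussian tvAR(1) working model. It is an illustration under assumptions rather than the implementation behind the study: the criterion is taken to be the average of $\ell(\cdot,\hat{\theta}_{h,-t}(t/n))$ over $t/n\in[0.05,0.95]$, the minimization runs over a finite grid instead of all of $H_{n}$, the parametrization follows the sign convention $X_{t,n}+\alpha X_{t-1,n}=\sigma\varepsilon_{t}$, and the names `cv_score` and `select_bandwidth` are hypothetical.

```python
import math


def epanechnikov(x):
    return 1.5 * (1.0 - (2.0 * x) ** 2) if abs(x) <= 0.5 else 0.0


def cv_score(x, h, lo=0.05, hi=0.95):
    """Leave-one-out CV criterion for a Gaussian tvAR(1) working model.

    x[t - 1] stores X_{t,n}. For each t with lo <= t/n <= hi the local fit at
    u = t/n omits summand t, and the Gaussian loss is evaluated at the
    held-out pair (X_t, X_{t-1}). Returns math.inf if a window is too small.
    """
    n = len(x)
    reach = int(n * h / 2.0) + 1          # kernel support: |s - t| <= n*h/2
    total = 0.0
    for t in range(2, n + 1):
        if not lo <= t / n <= hi:
            continue
        s_w = s_xx = s_xy = s_yy = 0.0
        for s in range(max(2, t - reach), min(n, t + reach) + 1):
            if s == t:
                continue                  # leave observation t out
            w = epanechnikov((s - t) / (n * h))
            cur, lag = x[s - 1], x[s - 2]
            s_w += w
            s_xx += w * lag * lag
            s_xy += w * cur * lag
            s_yy += w * cur * cur
        if s_w <= 0.0 or s_xx <= 0.0:
            return math.inf
        alpha = -s_xy / s_xx
        sigma2 = (s_yy - s_xy * s_xy / s_xx) / s_w
        if sigma2 <= 0.0:
            return math.inf
        resid = x[t - 1] + alpha * x[t - 2]
        total += 0.5 * math.log(2.0 * math.pi * sigma2) + resid * resid / (2.0 * sigma2)
    return total / n


def select_bandwidth(x, grid):
    """Grid minimizer of the CV criterion (stand-in for minimizing over H_n)."""
    return min(grid, key=lambda h: cv_score(x, h))
```

Unlike $h^{*}$ and $h_{0}$, this quantity only uses the observed series. The cost is of order $n^{2}h$ per bandwidth, which is unproblematic for the sample sizes considered here.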

Figure 1 shows the results for $\hat{h}$ and $h^{*}$ for the four models. The histograms show the chosen cross validation bandwidths $\hat{h}$; the bandwidth $h_{0}$ is marked by a black vertical line. The boxplots show the achieved values of $d_{A}(\hat{\theta}_{h},\theta_{0})$ for the different selectors $h\in\{\hat{h},h_{0},h^{*}\}$ (labeled 'CV', 'Plugin' and 'Optimal'). Each box contains $50\%$ and the whiskers contain $90\%$ of the values of $d_{A}(\hat{\theta}_{h},\theta_{0})$. The cross validation procedure works well even for a time series length of only $n=200$ if the model is recursively defined (i.e., (a), (c), (d)), while in the tvMA case (b) the method needs a larger sample size for the bandwidths to accumulate around $h_{0}$. For the models (a) and (d), the distances $d_{A}$ attained by the cross validation approach are nearly as good as those obtained by the optimal selector $h^{*}$, which is remarkable. For the models (b) and (c), the values of $d_{A}$ associated with $\hat{h}$ have a higher variance. This can be explained by the higher variance of the maximum likelihood estimators $\hat{\theta}_{h}$ in these models. In all cases, the distances produced by the estimator based on the cross validation procedure are of course greater on average, but they still look quite satisfactory in our opinion.

**Model misspecifications:** We observed in simulations that the performance of the cross validation procedure is robust against the distribution of $\varepsilon_{t}$, leading to similar results even if $\varepsilon_{t}$ is uniformly, exponentially or Pareto distributed (meaning that the moment conditions from Assumption 2.1 are violated).

Because our cross validation method is a natural generalization of the version for iid regression, it works well even if the underlying model itself is misspecified. In the following we estimate parameters with a Gaussian likelihood which assumes that the time series follows a tvAR(1) model $X_{t,n}=\alpha^{ms}(t/n)X_{t-1,n}+\sigma^{ms}(t/n)\varepsilon_{t}$, but in fact the underlying model is either tvMA (b) or tvARCH (c). The cross validation method then tries to estimate the minimizer $\theta_{0}^{ms}(u)=(\alpha^{ms}(u),\sigma^{ms}(u))^{\prime}$ of $\theta\mapsto L(u,\theta)$, i.e. $\alpha^{ms}(u)=\frac{c(1,u)}{c(0,u)}$ and $\sigma^{ms}(u)=\big{(}\frac{c(0,u)^{2}-c(1,u)^{2}}{c(0,u)}\big{)}^{1/2}$ with the covariances $c(k,u):=\mathbb{E}[\tilde{X}_{0}(\theta_{0}(u))\tilde{X}_{k}(\theta_{0}(u))]$:

[TABLE]
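The target curves can be evaluated numerically from the covariances. The sketch below assumes the standard stationary second moments of the approximating processes: $c(0,u)=\sigma(u)^{2}(1+\alpha(u)^{2})$ and $c(1,u)=\alpha(u)\sigma(u)^{2}$ for the MA(1) case, and $c(0,u)=\alpha_{1}(u)/(1-\alpha_{2}(u))$, $c(1,u)=0$ for the ARCH(1) case. The function names are illustrative.

```python
import math


def ar1_pseudo_true(c0, c1):
    """Map covariances (c(0,u), c(1,u)) to (alpha_ms(u), sigma_ms(u))."""
    return c1 / c0, math.sqrt((c0 ** 2 - c1 ** 2) / c0)


def tvma_cov(u):
    """Stationary MA(1) covariances at rescaled time u for model (b)."""
    alpha = 0.9 * math.sin(2.0 * math.pi * u)
    sigma = 0.3 * math.sin(2.0 * math.pi * u) + 0.5
    return sigma ** 2 * (1.0 + alpha ** 2), alpha * sigma ** 2


def tvarch_cov(u):
    """Stationary ARCH(1) covariances at rescaled time u for model (c)."""
    a1 = 0.2 * math.sin(2.0 * math.pi * u) + 0.4
    a2 = 0.1 * math.sin(2.0 * math.pi * u) + 0.2
    return a1 / (1.0 - a2), 0.0   # uncorrelated, so c(1,u) = 0
```

For instance, `ar1_pseudo_true(*tvma_cov(0.25))` returns $\alpha^{ms}(0.25)=0.9/1.81$, while `ar1_pseudo_true(*tvarch_cov(u))` always has first component $0$, so in the tvARCH case only $\sigma^{ms}$ carries information.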

To compare the distances, we use $d_{A}(\hat{\theta}_{h}(u),\theta_{0}^{ms}(u))$ with $V$ from the tvAR(1) model. The simulations are performed in the same way as for the correctly specified case above. In Figure 2 it is seen that even in the misspecified case the bandwidth selector $\hat{h}$ produces reasonable estimators which are comparable with the optimal bandwidth choice $h^{*}$ in the case of tvMA estimators and still satisfactory in the tvARCH case (note that a lot of information is lost due to the fact that $\alpha^{ms}(u)\equiv 0$ in this case).

5 Concluding remarks

In this paper we have introduced a data adaptive bandwidth selector via cross validation which is applicable for a large class of locally stationary processes. An important property of the method is the fact that it does not involve any tuning parameters.

In simulations we have seen that the proposed cross validation method yields nearly optimal bandwidth choices with respect to a Kullback-Leibler type distance measure in the case of correctly specified models and still leads to satisfactory results in the case of model misspecification. It remains an open question whether a similar cross validation procedure can be defined which is asymptotically optimal with respect to a simple quadratic distance measure (i.e. without a weighting matrix); this would lead to estimates of $\theta_{0}$ which optimize not the prediction properties of the associated model but the estimation quality of the parameter curve $\theta_{0}$ itself.

We mention that it is not hard to generalize the proposed method and the proofs to multidimensional time series which may be of interest in many practical applications.

An interesting open problem is the adaptive estimation in time series models with several parameter curves coming from different smoothness classes, in particular since these curves are not observed separately but via a single time series.

Let us point out that cross validation procedures in general are not stable if applied locally. Thus it remains an open question to find a locally adaptive bandwidth selector.

In nonparametric regression there also exist several results on the rate of convergence of $\hat{h}-h_{0}$. Based on the simulations, we conjecture that $n^{1/10}\cdot(\hat{h}-h_{0})$ (with $h_{0}$ from (21)) is asymptotically normal if $\theta_{0}$ is twice continuously differentiable, as Härdle, Hall and Marron (1988) showed in the iid regression case. This raises the question of whether there are improved cross validation methods, like those of Chiu (1991) (via Fourier transform) or Hall, Marron and Park (1992) (via presmoothing) in the iid kernel density estimation case, that attain the optimal rate of $n^{1/2}$ under further smoothness assumptions on $\theta_{0}$.

6 Proofs

In the following, we will use the abbreviation $\mathbb{E}_{0}Z:=Z-\mathbb{E}Z$ for real-valued random variables (or random vectors) $Z$ . Note that by Assumption 3.3, we have that the minimal eigenvalue of $V(\theta)$ is bounded from below by some $\lambda_{0}$ , leading to invertibility of $V(\theta)$ and by equivalence of the norms on $\mathbb{R}^{p\times p}$ to a uniform upper bound

[TABLE]

Additionally define $V_{1,max}:=\sup_{\theta\in\Theta}\sup_{i,j=1,...,p}|V(\theta)_{ij}|$ . In the following, we will use the abbreviation $\tilde{X}_{t}(u):=\tilde{X}_{t}(\theta_{0}(u))$ and $\tilde{Y}_{t}(u)=(\tilde{X}_{s}(u):s\leq t)$ . Recall that $Y_{t,n}^{c}=(X_{t,n},...,X_{1,n},0,0,...)$ . Define

[TABLE]

and by omitting the $t$ -th summand, $\hat{L}_{n,h,-t}(u,\theta)$ .

6.1 Standard approximations

Lemma 6.1 (The stationary crop approximation).

Let $g\in\mathcal{L}(M,\chi,C)$ . Put

[TABLE]

Suppose that Assumption 2.1 holds. Assume that $\chi_{j}=O(j^{-(2+\delta)})$ for some $\delta>0$ . Then for all $q\geq 1$ there exists a constant $C_{S,q}>0$ not depending on $n,h$ such that

[TABLE]

The proof follows from Hoelder’s inequality. Details can be found in the appendix.

Lemma 6.2 (Weak bias approximation).

Let $g\in\mathcal{L}(M,\chi,C)$ . Define

[TABLE]

Suppose that Assumption 2.1 holds. Assume that $\theta_{0}$ is Hoelder continuous with exponent $\beta$ in each component. Then there exists a constant $C_{bias}>0$ such that for all $u\in[0,1]$:

[TABLE]

Proof of Lemma 6.2.

Since $\theta_{0}$ is Hoelder continuous in each component, there exists $\tilde{L}>0$ such that $|\theta_{0,i}(u)-\theta_{0,i}(v)|\leq\tilde{L}|u-v|^{\beta}$ for all $i=1,...,p$ and $u,v\in[0,1]$ . Thus $\|\tilde{X}_{t}(t/n)-\tilde{X}_{t}(u)\|_{2M}\leq C_{A}|\theta_{0}(t/n)-\theta_{0}(u)|_{1}\leq C_{A}d\cdot\tilde{L}|t/n-u|^{\beta\wedge 1}$ . By Hoelder’s inequality,

[TABLE]

Since $\frac{1}{n}\sum_{t=1}^{n}\big{|}K_{h}\big{(}\frac{t}{n}-u\big{)}\big{|}\cdot\|\tilde{X}_{0}(t/n)-\tilde{X}_{0}(u)\|_{2M}\leq|K|_{\infty}C_{A}p\tilde{L}\cdot h^{\beta\wedge 1}$ , we obtain the result. ∎
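The last step of the proof rests on kernel localization: only time points with $|t/n-u|\leq h/2$ receive weight, so the Hoelder bound is evaluated at distances of at most $h/2$. The following sketch checks the resulting inequality numerically. It assumes the convention $K_{h}(x)=\frac{1}{h}K(\frac{x}{h})$ and the Epanechnikov kernel from the simulation study (so $|K|_{\infty}=\frac{3}{2}$); the function name is illustrative.

```python
def epanechnikov(x):
    return 1.5 * (1.0 - (2.0 * x) ** 2) if abs(x) <= 0.5 else 0.0


def weighted_distance(n, u, h, beta):
    """(1/n) * sum_t |K_h(t/n - u)| * |t/n - u|^beta with K_h(x) = K(x/h)/h."""
    total = 0.0
    for t in range(1, n + 1):
        d = t / n - u
        total += abs(epanechnikov(d / h) / h) * abs(d) ** beta
    return total / n
```

At most $nh+1$ time points lie in the window and each summand is at most $\frac{1}{h}|K|_{\infty}(h/2)^{\beta}$, so `weighted_distance(n, u, h, beta)` is bounded by $|K|_{\infty}(1+\frac{1}{nh})h^{\beta}$.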

6.2 The bias-variance decomposition of $\nabla L_{n,h}(u,\theta_{0}(u))$

The proof of the next Lemma 6.3 is purely analytical and is deferred to the appendix.

Lemma 6.3 (Expansion of expectations).

*Let Assumption 2.1, 3.3 and 3.7 hold. Assume that $g:\mathbb{R}^{\infty}\to\mathbb{R}$ is twice continuously partially differentiable and $\partial_{i}\partial_{j}g\in\mathcal{L}_{\infty}(M-2,\tilde{\chi},\tilde{\psi}_{1}(i)\tilde{\psi}_{2}(j))$ for each component of $\partial^{2}g$ , where $\tilde{\psi}_{1},\tilde{\psi}_{2}$ are absolutely summable sequences.

Furthermore assume that $|\partial_{i}g(0)|\leq\tilde{\psi}_{1}(i)$ and $|\partial_{i}\partial_{j}g(0)|\leq\tilde{\psi}_{1}(i)\tilde{\psi}_{2}(j)$ for all $i,j\geq 1$ . Then it holds for $u\in[0,1]$ and $\xi\in\mathbb{R}$ that*

[TABLE]

where $R(u,\xi):=\int_{0}^{1}\big{\{}\mathbb{E}\partial_{u}^{2}g(\tilde{Y}_{0}(u+\xi s))-\mathbb{E}\partial_{u}^{2}g(\tilde{Y}_{0}(u))\big{\}}\ \mbox{d}s=o(1)$ ( $\xi\to 0$ ) uniformly in $u\in[0,1]$ , and all expressions exist.

We now summarize the results about the bias-variance decomposition of $\nabla L_{n,h}(u,\theta_{0}(u))$. The following Proposition is obtained as a corollary from Lemma 6.1, Lemma 6.2 and Lemma 6.3. Details of the proof are deferred to the appendix.

Proposition 6.4 (Bias-Variance decomposition of $\nabla L_{n,h}(u,\theta_{0}(u))$ ).

Let Assumptions 2.1, 3.1, 3.3 and 3.4 hold.

(i)

Decomposition: Let $\mbox{tr}\{\cdot\}$ denote the trace of a matrix, $\mu_{K}:=\int K(x)^{2}\ \mbox{d}x$ ,

[TABLE]

Set $B_{h}:=\int_{0}^{1}|b_{h}(u)|_{V(\theta_{0}(u))^{-1}}^{2}w_{n,h}(u)\ \mbox{d}u$ , and $V_{h}:=\int_{0}^{1}v_{h}(u)w_{n,h}(u)\ \mbox{d}u$ . Then it holds that

[TABLE]

(ii)

Put $\hat{B}_{h}:=\int_{0}^{1}|\mathbb{E}\hat{L}_{n,h}(u,\theta_{0}(u))|_{V(\theta_{0}(u))^{-1}}^{2}w_{n,h}(u)\ \mbox{d}u$ and define the discrete bias terms $\hat{B}_{h}^{dis}:=\frac{1}{n}\sum_{t=1}^{n}|\mathbb{E}\hat{L}_{n,h}(t/n,\theta_{0}(t/n))|^{2}_{V(\theta_{0}(t/n))^{-1}}w_{n,h}(t/n)$ and $B_{h}^{dis}:=\frac{1}{n}\sum_{t=1}^{n}|\mathbb{E}L_{n,h}(t/n,\theta_{0}(t/n))|^{2}_{V(\theta_{0}(t/n))^{-1}}w_{n,h}(t/n)$ . Then it holds for $\tilde{B}_{h}\in\{\hat{B}_{h},\hat{B}_{h}^{dis},B_{h}^{dis}\}$ that

[TABLE]

(iii)

Bias expansion: Suppose additionally that Assumption 3.7 holds. Then it holds uniformly in $u\in[\frac{h}{2},1-\frac{h}{2}]$ that

[TABLE]

6.3 Uniform convergence results and moment inequalities for the local likelihood $L_{n,h}(u,\theta)$

In this section we show the uniform convergence of empirical processes of $X_{t,n}$ towards their expectations. We give convergence rates and prove the uniform consistency (w.r.t. $u$ and $h$ ) of the maximum likelihood estimator $\hat{\theta}_{h}(u)$ towards $\theta_{0}(u)$ . For some $\phi:[0,1]\to\mathbb{R}$ and $g\in\mathcal{L}(M,\chi,C)$ , define

[TABLE]

The proof of the next Lemma 6.5 as well as the proofs of Lemmas 3.10 and 6.11 use Lemma C.1, which is deferred to the appendix due to its complexity. It allows us to bound moments of linear, quadratic and cubic forms of functions of locally stationary processes. For instance, we obtain the bounds $\|\sum_{t=1}^{n}a_{t}V_{t,n}^{(1)}\|_{q}\leq\tilde{C}_{q}\big{(}\sum_{t=1}^{n}a_{t}^{2}\big{)}^{1/2}$ and $\|\sum_{s,t=1}^{n}a_{s,t}V_{t,n}^{(1)}(s)V_{s,n}^{(2)}(t)\|_{q}\leq\tilde{C}_{q}\big{(}\sum_{s,t=1}^{n}a_{s,t}^{2}\big{)}^{1/2}$ for deterministic numbers $a_{t}$ or $a_{s,t}$ and processes $V_{t,n}^{(1)}$ , $V_{t,n}^{(1)}(s)$ and $V_{s,n}^{(2)}(t)$ which fulfill dependence conditions and have bounded variation with respect to the indices $(s)$ and $(t)$ .

Lemma 6.5 (Moment inequality).

Let Assumptions 2.1 and 3.1 hold. Let $g\in\mathcal{L}(M,\chi,C)$. Then, for all $\theta\in\Theta$,

[TABLE]

where $\psi_{q}(t)=\sum_{j=0}^{t-1}\chi_{j}\delta_{qM}(t-j)$ , $C_{\delta,q}:=C_{1}(1+2(D_{qM}|\chi|_{1})^{M-1})$ and $\rho_{1,\psi C_{\delta,\cdot},q}$ is defined in Lemma C.1.

Proof of Lemma 6.5.

Note that for $g\in\mathcal{L}(M,\chi,C)$ , we have by Hoelder’s inequality for all $\theta\in\Theta$ , $u\in[0,1]$ :

[TABLE]

Since $\sum_{t=1}^{\infty}\Big{(}\sum_{j=0}^{t-1}\chi_{j}\delta_{qM}(t-j)\Big{)}\leq\sum_{j=0}^{\infty}\chi_{j}\cdot\sum_{t=1}^{\infty}\delta_{qM}(t)<\infty$ , Lemma C.1 is applicable and we obtain the assertion. ∎
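The summability used in this proof can be made explicit by exchanging the order of summation and substituting $s=t-j$. The step below is a sketch that assumes the weights $\chi_{j}$ and the dependence coefficients $\delta_{qM}(\cdot)$ are nonnegative, so that the rearrangement is justified:

```latex
\sum_{t=1}^{\infty}\sum_{j=0}^{t-1}\chi_{j}\,\delta_{qM}(t-j)
  =\sum_{j=0}^{\infty}\chi_{j}\sum_{t=j+1}^{\infty}\delta_{qM}(t-j)
  =\Big(\sum_{j=0}^{\infty}\chi_{j}\Big)\cdot\Big(\sum_{s=1}^{\infty}\delta_{qM}(s)\Big).
```

Both factors on the right are finite whenever $\chi$ and $\delta_{qM}$ are absolutely summable.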

Lemma 6.6 (Continuity properties of localized sums).

Let Assumption 2.1, 3.1 and 3.4 hold. Let $g\in\mathcal{L}(M,\chi,C)$ . Then it holds for arbitrary $\theta,\theta^{\prime}\in\Theta$ , $u,u^{\prime}\in[0,1]$ :

[TABLE]

where $C_{\infty,q}=C_{\infty,q}(g)$ , $C_{-,q}=C_{-,q}(g)$ depend solely on $|g|_{\infty}:=\sup_{\theta\in\Theta}|g(0,\theta)|<\infty$ and $M,\chi,C$ .

Proof.

Define $|g|_{\infty}:=\sup_{\theta\in\Theta}|g(0,\theta)|<\infty$ . Since $g\in\mathcal{L}(M,\chi,C)$ , it holds that for all $\theta\in\Theta$ , $|g(y,\theta)|\leq|g|_{\infty}+C_{1}|y|_{\chi,1}\cdot\big{(}1+|y|_{\chi,1}^{M-1}\big{)}$ . By Young’s inequality, it holds that $a\leq\frac{1}{M}\big{(}(M-1)+a^{M}\big{)}$ for $M\geq 1$ and nonnegative real numbers $a$ , which shows that there exists a constant $C_{\infty}=C_{\infty}(g)>0$ such that for $y\in\mathbb{R}^{\mathbb{N}}$ ,

[TABLE]

We can use the bound (32) to see that uniformly in $u,\theta,t$ it holds that $\|E_{n}(K_{h}(\cdot-u),g,\theta)\|_{q}\leq|K|_{\infty}C_{\infty}(1+(|\chi|_{1}D_{qM})^{M})=:C_{\infty,q}$ . It holds that

[TABLE]

∎

Lemma 6.7 (Uniform convergence and weak bias expansion).

Let Assumption 2.1, 3.1 and 3.4 hold. Let $g\in\mathcal{L}(M,\chi,C)$ . Then for all $0<\alpha<\frac{1}{2}$ , it holds almost surely that

[TABLE]

and there exists a constant $C_{S}>0$ independent of $u,h,\theta$ such that for all $u\in\text{supp}(w_{n,h})\subset[\frac{h}{2},1-\frac{h}{2}]$ :

[TABLE]

Proof of Lemma 6.7.

Define

[TABLE]

where $\xi=(h,u,\theta)\in\Xi_{n}:=\{(h,u,\theta):h\in H_{n},u\in\text{supp}(w_{n,h}),\theta\in\Theta\}$ . For each $r>0$ , we can find a space $\Xi_{n}^{\prime}$ with $\#\Xi_{n}^{\prime}<c_{\gamma}n^{\gamma}$ such that the compact space $\Xi_{n}$ is approximated in the following way: for each $\xi=(h,u,\theta)\in\Xi_{n}$ there is a $\xi^{\prime}=(h^{\prime},u^{\prime},\theta^{\prime})\in\Xi_{n}^{\prime}$ such that $|\xi-\xi^{\prime}|_{1}\leq c_{r}n^{-r}$ . Now fix some $\delta>0$ . For $0<\alpha<\frac{1}{2}$ , we obtain

[TABLE]

Our goal is to bound $W_{1}$ , $W_{2}$ by absolutely summable sequences in $n$ . Then the assertion follows from Borel-Cantelli’s lemma. From Lemma 6.5 we obtain:

[TABLE]

Furthermore we have for $q\geq 1$ by Lemma 6.1 that

[TABLE]

By Markov’s inequality, it follows that

[TABLE]

and thus for $q$ large enough, $W_{1}\leq\#\Xi_{n}^{\prime}\cdot\sup_{\xi\in\Xi_{n}}\mathbb{P}\big{(}(nh)^{\frac{1}{2}-\alpha}|f(\xi)|>\delta/2\big{)}\leq\big{(}\frac{\rho_{1,\psi C_{\delta,\cdot},q}|K|_{\infty}+C_{S,q}}{\delta/2}\big{)}^{q}\cdot n^{q-\alpha\delta q}$ is bounded by an absolutely summable sequence in $n$ .

We now discuss $W_{2}$ . Define $Z_{n}:=1+\frac{1}{n}\sum_{t=1}^{n}|Y_{t,n}^{c}|_{\chi,1}^{M}$ . Using the inequality (32) and the Lipschitz property of $K$ , we obtain

[TABLE]

Since $\underline{h}\geq c_{0}n^{\delta-1}$ and $\overline{h}\leq c_{1}n^{-\delta}$ (cf. Assumption 3.4), we have shown that $(nh)^{\frac{1}{2}-\alpha}|f(\xi)-f(\xi^{\prime})|\leq C(n)\cdot|\xi-\xi^{\prime}|_{1}\cdot Z_{n}$ , where the deterministic $C(n)$ grows only polynomially fast in $n$ . Choose $r$ large enough and some constants $\gamma_{r},C_{r}>0$ such that $C(n)c_{r}n^{-r}\leq C_{r}n^{-(1+\gamma_{r})}$ for all $n\in\mathbb{N}$ , then we have

[TABLE]

which is absolutely summable.

The proof of (34) is immediate from the bounds (59), (61) and (26) applied to each summand of $\mathbb{E}E_{n}(K_{h}(\cdot-u),g,\theta)$ and the fact that $K$ has bounded variation which gives

[TABLE]

as long as $u\in[\frac{h}{2},1-\frac{h}{2}]$ . ∎

Lemma 6.8.

Let Assumption 2.1, 3.1 and 3.4 hold. Let $g\in\mathcal{L}(M,\chi,C)$ . Define $E_{n,-s}(\phi,g,\theta):=\frac{1}{n}\sum_{t=1,t\not=s}^{n}\phi\big{(}\frac{t}{n}\big{)}\cdot g(Y_{t,n}^{c},\theta)$ . Then for all $0<\alpha<1$ , we have

[TABLE]

Proof of Lemma 6.8.

Fix $\delta>0$ . Since $F_{n}\leq n^{-\delta\alpha}\cdot|K|_{\infty}C_{\infty}\cdot\sup_{s=1,...,n}\big{(}1+|Y_{s,n}|_{\chi,1}^{M}\big{)}$ , we obtain by Markov’s inequality

[TABLE]

If $q$ is chosen large enough, we obtain the assertion by Borel-Cantelli’s lemma. ∎

The following corollary is immediate from Lemma 6.7 and 6.8.

Corollary 6.9 (Uniform convergence of likelihoods).

Let Assumption 2.1, 3.1, 3.3 and 3.4 hold. Then for all $k=0,1,2,3$ and all $0<\alpha\leq\frac{1}{2}$ it holds component-wise that

[TABLE]

and there exists a constant $C_{S}>0$ independent of $u,h,\theta$ such that (component-wise):

[TABLE]

and for all $0<\alpha^{\prime}\leq 1$ , it holds that

[TABLE]

The following Theorem is a consequence of Corollary 6.9. The proof uses standard arguments from maximum likelihood theory and is postponed to the appendix.

Theorem 6.10 (Uniform strong consistency of the maximum likelihood estimator).

Let Assumptions 2.1, 3.1, 3.3 and 3.4 hold. Then for each $0<\alpha\leq\frac{1}{2}$ , it holds almost surely in each component that

[TABLE]

Furthermore for $n$ large enough, we have uniformly in $h\in H_{n}$ , $u\in\text{supp}(w_{n,h})$ for each component that

[TABLE]

The results still hold if $\hat{\theta}_{h}$ , $L_{n,h}$ are replaced by $\hat{\theta}_{h,-t}$ , $L_{n,h,-t}$ accordingly.

6.4 Proofs of the results of Section 3

Proof of Lemma 3.10.

We have for arbitrary $q>2$ :

[TABLE]

Since $\theta_{0}$ has bounded variation $B_{K}$ , $w_{n,h}$ has bounded variation $B_{w}$ and $\|\nabla_{i}L_{n,h}(u,\theta)\|_{2q}\leq C_{\infty,2q}$ , the second summand of (39) is of order $O(n^{-1})$ . Furthermore, by Lemma 6.1, Lemma 6.5 and Lemma 6.7 we obtain for each component

[TABLE]

By Lemma 6.6, we have

[TABLE]

Since $\theta_{0}$ has bounded variation $B_{\theta_{0}}$ , we obtain that the first summand in (39) is $O((nh)^{-1}\big{\{}h^{\beta\wedge 1}+(nh)^{-1/2}\big{\}})$ . So in view of Proposition 6.4, we have shown that there exists $C,\gamma>0$ such that

[TABLE]

Define $Z_{n}:=1+\frac{1}{n}\sum_{t=1}^{n}|Y_{t,n}^{c}|_{\chi,1}^{M}$ . Using the inequality (32) and the notation therein, we obtain for each component that $|\nabla L_{n,h}(u,\theta)|\leq\frac{1}{\underline{h}}|K|_{\infty}C_{\infty}Z_{n}$ and $|\nabla L_{n,h}(u,\theta)-\nabla L_{n,h^{\prime}}(u,\theta)|\leq\frac{1}{\underline{h}^{3}}L_{K}C_{\infty}Z_{n}|h-h^{\prime}|$ . These results together with (14) imply

[TABLE]

A similar argumentation is valid for $|d_{I}^{*}(\hat{\theta}_{h},\theta_{0})-d_{I}^{*}(\hat{\theta}_{h^{\prime}},\theta_{0})|$ . Since $\underline{h}\geq c_{0}n^{\delta-1}$ by assumption, we have shown that there exists $C(n)$ which grows at most polynomially in $n$ such that

[TABLE]

As in the proof of Lemma 3.10 it can be shown that (40) and (41) together imply that $\sup_{h\in H_{n}}\frac{|d_{A}^{*}(\hat{\theta}_{h},\theta_{0})-d_{I}^{*}(\hat{\theta}_{h},\theta_{0})|}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\to 0$ a.s.

With the definitions $A_{1}(u):=|\nabla\hat{L}_{n,h}(u,\theta_{0}(u))-\mathbb{E}\nabla\hat{L}_{n,h}(u,\theta_{0}(u))|_{V(\theta_{0}(u))}^{2}$ and $A_{2}(u):=\langle\nabla\hat{L}_{n,h}(u,\theta_{0}(u))-\mathbb{E}\nabla\hat{L}_{n,h}(u,\theta_{0}(u)),V(\theta_{0}(u))\mathbb{E}\nabla\hat{L}_{n,h}(u,\theta_{0}(u))\rangle$ , we decompose

[TABLE]

With Lemma C.1 we obtain $\|R_{n,h,1}\|_{q}=O(n^{-1})$ , $\|R_{n,h,2}\|_{q}=O(n^{-1/2}(nh)^{-1/2})$ and $\|R_{n,h,3}\|_{q}=O(n^{-1/2}\hat{B}_{h}+n^{-1/2}(nh)^{-1/2})$ (see Proposition 6.4(ii) for $\hat{B}_{h}$ ), i.e. $\frac{\|R_{n,h,i}\|_{q}}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}=O(n^{-\tilde{\gamma}})$ with some $\tilde{\gamma}>0$ . Details are deferred to the appendix. It is straightforward to see that $|R_{n,h,i}-R_{n,h^{\prime},i}|$ fulfills a similar condition as in (40) for $i=1,2,3$ . The technique from the proof of Lemma 3.10 implies $\sup_{h\in H_{n}}\big{|}\frac{d_{I}^{*}(\hat{\theta}_{h},\theta_{0})-d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\big{|}\to 0$ a.s. ∎

Proof of Corollary 3.11.

The convergence $\sup_{h\in H_{n}}\big{|}\frac{d_{I}(\hat{\theta}_{h},\theta_{0})-d_{I}^{*}(\hat{\theta}_{h},\theta_{0})}{d_{I}^{*}(\hat{\theta}_{h},\theta_{0})}\big{|}\to 0$ a.s. follows from the decomposition

[TABLE]

and the results from Corollary 6.9 and Theorem 6.10. Together with Lemma 3.10, the assertion of the Corollary follows. Details can be found in the appendix. ∎

Proof of Lemma 3.12.

Put

[TABLE]

We have to show that

[TABLE]

then it follows immediately from Lemma 3.10:

[TABLE]

Using the same techniques as in the proof of Corollary 3.11 (and additionally the uniform convergence results of Corollary 6.9), it can be shown that

[TABLE]

and we can conclude from (43), (44) (by a similar expansion as in (58)) that $\sup_{h\in H_{n}}\big{|}\frac{\overline{d}_{A}(\hat{\theta}_{h},\theta_{0})-d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\big{|}\to 0$ a.s. It remains to prove (42). By applying the Cauchy Schwarz inequality, we obtain

[TABLE]

where $W_{n}:=\frac{1}{n}\sum_{t=1}^{n}\big{|}\nabla\ell(Y_{t,n}^{c},\theta_{0}(t/n))\big{|}^{2}_{V(\theta_{0}(t/n))^{-1}}$ is bounded a.s. (details are deferred to the appendix). The assertion now follows from (45), Lemma 3.10 and the expansion $d_{A}^{*}=d_{M}^{*}\cdot\big{(}1+\frac{d_{A}^{*}-d_{M}^{*}}{d_{M}^{*}}\big{)}$ . ∎

For some $A\in\mathbb{R}^{d\times d\times d}$ and vectors $y,z\in\mathbb{R}^{d}$ , define $x:=A[y,z]\in\mathbb{R}^{d}$ as $x_{i}:=\sum_{j,k=1}^{d}A_{ijk}y_{j}z_{k}$ .
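The contraction $A[y,z]$ is a plain double sum over the last two indices of $A$. A minimal Python sketch, with the tensor stored as nested lists and an illustrative function name:

```python
def bilinear_apply(A, y, z):
    """Return x with x_i = sum_{j,k} A[i][j][k] * y[j] * z[k]."""
    d = len(y)
    return [
        sum(A[i][j][k] * y[j] * z[k] for j in range(d) for k in range(d))
        for i in range(d)
    ]
```

For $d=2$ and the tensor with $A_{111}=A_{222}=1$ and zeros elsewhere, `bilinear_apply(A, [2, 3], [5, 7])` returns `[10, 21]`, the componentwise product. With NumPy arrays the same contraction can be written as `numpy.einsum("ijk,j,k->i", A, y, z)`.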

Proof of Lemma 3.13.

Define $\ell_{t,n}(\theta):=\ell(Y_{t,n}^{c},\theta)$ . By Taylor’s expansion, it holds that

[TABLE]

with some intermediate value $\bar{\theta}_{h,-t}(u)$ which satisfies $|\bar{\theta}_{h,-t}(u)-\theta_{0}(u)|_{1}\leq|\hat{\theta}_{h,-t}(u)-\theta_{0}(u)|_{1}$ . By Theorem 6.10, $\hat{\theta}_{h,-t}(u)$ converges to $\theta_{0}(u)$ uniformly in $u,h,t$ and thus lies in the interior of $\Theta$ for $n$ large enough. Using a third-order Taylor expansion, we obtain

[TABLE]

with some intermediate value $\tilde{\theta}_{h,-t}(u)$ which satisfies $|\tilde{\theta}_{h,-t}(u)-\theta_{0}(u)|_{1}\leq|\hat{\theta}_{h,-t}(u)-\theta_{0}(u)|_{1}$ . Put $V_{t,n}:=\nabla^{2}L_{n,h,-t}(t/n,\theta_{0}(t/n))$ . For $n$ large enough and $h\in H_{n}$ , we have by Corollary 6.9 that $|V_{t,n}-V(\theta_{0}(t/n))|_{2},|\mathbb{E}V_{t,n}-V(\theta_{0}(t/n))|_{2}\leq\frac{\lambda_{0}}{4}$ . It follows that the minimal eigenvalues of $V_{t,n}$ and $\mathbb{E}V_{t,n}$ are bounded from below by $\frac{\lambda_{0}}{2}$ . So for $n$ large enough, $h\in H_{n}$ , we have

[TABLE]

With (46) and (47), we obtain the decomposition

[TABLE]

The remainders $R_{n,h,i}$ ( $i=1,2,3$ ) now have to be discussed separately. To get rid of $V_{t,n}^{-1}$ and the intermediate values $\tilde{\theta}_{h,-t}$ and $\bar{\theta}_{h,-t}$ we replace them by $(\mathbb{E}V_{t,n})^{-1}$ and $\theta_{0}$ , respectively. To replace $V_{t,n}^{-1}$ , we use the decompositions

[TABLE]

It can be shown that these replacements are of order $(nh)^{-\frac{3}{2}(1-\alpha)}+(B_{h}^{dis})^{3/2}$ uniformly in $h\in H_{n}$ with arbitrarily small $\alpha>0$ by using the uniform results of Corollary 6.9, Theorem 6.10 and (38) (see Proposition 6.4 for $B_{h}^{dis}$ ). By the decomposition of $d_{M}^{*}(\hat{\theta}_{h},\theta_{0})$ in Proposition 6.4(i),(ii) the replacements are of smaller order than $d_{M}^{*}(\hat{\theta}_{h},\theta_{0})$ . The remaining terms are listed in Lemma 6.11, where also their convergence is proven. Details are in the appendix. ∎

In the proof of the following Lemma 6.11 we use a similar technique as in Lemma 6.7 (or Lemma 3.10). The main part therefore is to calculate the norm $\|\cdot\|_{q}$ of the quantities which is done via the results of Lemma C.1. Details can be found in the appendix.

Lemma 6.11.

Let Assumption 2.1, 3.1, 3.3 and 3.4 hold. Put $V_{t,n}:=\nabla^{2}L_{n,h,-t}(t/n,\theta_{0}(t/n))$ . Then it holds almost surely that

[TABLE]

Acknowledgements.

We gratefully acknowledge support by Deutsche Forschungsgemeinschaft through the Research Training Group RTG 1653.

Appendix A More detailed proofs of section 3

Proof of Corollary 3.11, more detailed.

We have

[TABLE]

where $I_{d}\in\mathbb{R}^{d\times d}$ is the identity matrix and

[TABLE]

and $\bar{\theta}_{h}(u)$ is some intermediate value with $|\bar{\theta}_{h}(u)-\theta_{0}(u)|_{2}\leq|\hat{\theta}_{h}(u)-\theta_{0}(u)|_{2}$ . Using the bound $|\langle x,Ax\rangle|\leq|x|_{2}^{2}|A|_{spec}$ , we have

[TABLE]

By Corollary 6.9 and Theorem 6.10, we have

[TABLE]

thus

[TABLE]

According to Assumption 3.3, let $\lambda_{0}>0$ be the value which bounds all eigenvalues of $V(\theta)$ from below. Using the representations $d_{I}(\hat{\theta}_{h},\theta_{0})=\int_{0}^{1}|(I_{d}+R_{h}(u))\cdot D_{h}(u)|^{2}_{V(\theta_{0}(u))}w_{n,h}(u)\ \mbox{d}u$ , $d_{I}^{*}(\hat{\theta}_{h},\theta_{0})=\int_{0}^{1}|D_{h}(u)|^{2}_{V(\theta_{0}(u))}w_{n,h}(u)\ \mbox{d}u$ , we conclude with (56), (57):

[TABLE]

Using the shortcuts $d_{I}=d_{I}(\hat{\theta}_{h},\theta_{0})$ and so on, we have

[TABLE]

hence, the assertion follows from Lemma 3.10. The proof for $d_{A}$ is the same by using sums instead of integrals. ∎

Appendix B Additional proofs of section 6

Proof of Lemma 6.1.

Since $g\in\mathcal{L}(M,\chi,C)$ , we have by Hoelder’s inequality:

[TABLE]

Furthermore, we have

[TABLE]

Define $\tilde{C}_{S,q,1}:=C_{1}(1+2(D_{qM}|\chi|_{1})^{M-1})\cdot(C_{B}|\chi|_{1}+pL_{K}C_{A}\sum_{j=1}^{\infty}j\chi_{j})$ , then

[TABLE]

By Hoelder’s inequality, we have for arbitrary $u\in[0,1]$ :

[TABLE]

Defining $\tilde{C}_{S,q,2}:=C_{1}(1+2(D_{qM}|\chi|_{1})^{M-1})D_{qM}\cdot\sum_{t=1}^{\infty}\sum_{j=t+1}^{\infty}\chi_{j}$ , we obtain

[TABLE]

(60) and (62) give the result. ∎

Proof of Lemma 6.3.

Put $\tilde{M}:=M-2$ . Define $f_{j}:=\partial_{j}g$ and $D_{i}:=\max\{\tilde{\chi}_{i},\psi_{1}(i)\psi_{2}(j)\}$ . We show that $f_{j}:(\mathbb{R}^{\infty},|\cdot|_{D,1})\to(\mathbb{R},|\cdot|)$ is Frechet differentiable with derivative $f_{j}^{\prime}(y)h:=\sum_{i=1}^{\infty}\partial_{i}\partial_{j}g(y)\cdot h_{i}$ . Choose $h\in\mathbb{R}^{\infty}$ with $|h|_{D,1}<\varepsilon$ . Let $e_{j}=(e_{j,i})_{i=1,2,...}\in\mathbb{R}^{\infty}$ be the sequence with a 1 at the $j$ -th position and zeros elsewhere. By the mean value theorem in $\mathbb{R}$ , there exists $s_{i}\in[0,1]$ such that

[TABLE]

This shows Frechet differentiability of $f_{j}$ . By applying the chain rule, $s\mapsto f_{j}(y+s\cdot(y^{\prime}-y))$ is differentiable with derivative $\sum_{i=1}^{\infty}\partial_{i}f_{j}(y+s\cdot(y^{\prime}-y))\cdot(y_{i}^{\prime}-y_{i})$ . By the fundamental theorem of calculus,

[TABLE]

with some constant $\tilde{\psi}_{1}$ dependent on $\tilde{M}$ . This shows that $f_{j}=\partial_{j}g\in\mathcal{L}(\tilde{M}+1,\tilde{\chi}+\psi_{1},\tilde{\psi}_{1}\psi_{2}(j))$ . In the same manner, we obtain Frechet differentiability of $g$ itself, and $g\in\mathcal{L}(M,\tilde{\chi}+\psi_{1}+\tilde{\psi}_{1}\psi_{2},\tilde{\psi}_{1}\tilde{\psi}_{2})$ with some constant $\tilde{\psi}_{2}$ depending on $\tilde{M}$ . Since $u\mapsto\theta_{0}(u)$ and $\theta\mapsto\tilde{Y}_{t}(\theta)$ are twice continuously differentiable, we conclude that the composition $u\mapsto\tilde{Y}_{t}(u)=\tilde{Y}_{t}(\theta_{0}(u))$ is twice continuously differentiable and thus

[TABLE]

by the chain rule. We obtain with Hoelder’s inequality

[TABLE]

where $\tilde{D}_{2}:=|\tilde{\chi}|_{1}D_{M}\cdot(1+(|\tilde{\chi}|_{1}D_{M})^{\tilde{M}-1})+1$ . With similar arguments, we can show that $\|\partial_{j}g(\tilde{Y}_{0}(u))\|_{M/(\tilde{M}+1)}\leq\tilde{D}_{1}\psi_{2}(j)$ and $\|g(\tilde{Y}_{0}(u))\|_{1}\leq\tilde{D}_{0}$ with some constants $\tilde{D}_{0},\tilde{D}_{1}>0$ . Hoelder’s inequality yields

[TABLE]

so we have proven the existence of all terms in the expansion (27). It remains to analyze the residual term $R(u,\xi)$ . Since $u\mapsto\partial_{u}^{2}g(\tilde{Y}_{0}(u))$ is (uniformly) continuous a.s., we have

[TABLE]

Furthermore, it holds that $\sup_{u\in[0,1]}|\tilde{Y}_{0}(u)-\tilde{Y}_{0}(0)|\leq\sup_{u\in[0,1]}|\partial_{u}\tilde{Y}_{0}(u)|\leq\sup_{u\in[0,1]}|\theta_{0}^{\prime}(u)|_{2}\cdot\sup_{\theta\in\Theta}|\nabla\tilde{Y}_{t}(\theta)|_{2}$ and thus $\|\sup_{u\in[0,1]}|\tilde{Y}_{0}(u)|\|_{M}$ is finite by Assumption 3.7. Assumption 3.7 also directly implies that

[TABLE]

Using similar techniques as in (63) and (64), $\|\sup_{u\in[0,1]}|\partial_{u}^{2}g(\tilde{Y}_{0}(u))|\|_{1}<\infty$ follows. The dominated convergence theorem and (65) yield

[TABLE]

∎

Proof of Proposition 6.4.

(i) Let $j\in\{1,...,d\}$ . By Lemma 6.1, there exists some constant $C_{S}>0$ such that

[TABLE]

By $\nabla\ell\in\mathcal{L}_{\infty}(M,\chi,C)$ (component-wise) and (32), it holds that

[TABLE]

Furthermore, $v\mapsto\mathbb{E}\nabla\ell(\tilde{Y}_{0}(v),\theta)$ has bounded variation $B_{\nabla\ell}$ (uniformly in $\theta$ ) since $\theta_{0}$ has bounded variation. The same holds for the kernel $K$ . We conclude that

[TABLE]

This implies, uniformly in $u,h$ ,

[TABLE]

By Lemma 6.1 and Lemma 6.2, there exist constants $C_{S}^{\prime},C_{bias}>0$ such that

[TABLE]

By the (component-wise) martingale difference property of $\nabla\ell(\tilde{Y}_{t}(u),\theta_{0}(u))$ and since $K$ has bounded variation, we have

[TABLE]

Let us use the abbreviations $\nabla L:=\nabla L_{n,h}(u,\theta_{0}(u))-\mathbb{E}\nabla L_{n,h}(u,\theta_{0}(u))$ and $\nabla\tilde{L}:=\nabla\tilde{L}_{n,h}(u,\theta_{0}(u)):=\frac{1}{n}\sum_{t=1}^{n}K_{h}(t/n-u)\nabla\ell(\tilde{Y}_{t}(u),\theta_{0}(u))$ . With (69) we conclude

[TABLE]

Note that $\mathbb{E}\big{|}\nabla\ell(\tilde{Y}_{t}(u),\theta_{0}(u))\big{|}_{V(\theta_{0}(u))^{-1}}^{2}=\mbox{tr}\{V^{-1}(\theta_{0}(u))I(\theta_{0}(u))\}$ . By combining (68), (70) and (71), we obtain

[TABLE]

from which the stated convergence follows by integration.
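The trace identity noted in this step can be written out in one line. The following is a sketch under two notational assumptions that are not restated in this section: that $|v|_{A}^{2}=v^{\prime}Av$ for a symmetric matrix $A$, and that $I(\theta)$ denotes the second-moment matrix $\mathbb{E}[\nabla\ell\,\nabla\ell^{\prime}]$ of the score at the stationary approximation. Writing $g:=\nabla\ell(\tilde{Y}_{t}(u),\theta_{0}(u))$ and $V:=V(\theta_{0}(u))$,

$$\mathbb{E}|g|_{V^{-1}}^{2}=\mathbb{E}\big[g^{\prime}V^{-1}g\big]=\mathbb{E}\,\mathrm{tr}\big(V^{-1}gg^{\prime}\big)=\mathrm{tr}\big(V^{-1}\,\mathbb{E}[gg^{\prime}]\big)=\mathrm{tr}\{V^{-1}I(\theta_{0}(u))\}.$$

The second equality uses that a scalar equals its own trace together with the cyclic property of the trace, and the third uses linearity of the expectation.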

(ii) By Lipschitz continuity of $K$ , we have for each component $i=1,...,p$ that $|b_{h,i}(u)-b_{h,i}(u^{\prime})|\leq\frac{2}{h}L_{K}\sup_{v\in[0,1]}|\nabla_{i}\ell(\tilde{Y}_{t}(v),\theta_{0}(u))|$ . Furthermore, by using the inequality $|V(\theta)^{-1}-V(\theta^{\prime})^{-1}|_{spec}\leq\lambda_{0}^{-2}|V(\theta)-V(\theta^{\prime})|_{2}$ (where $|A|_{spec}$ denotes the spectral norm of a matrix $A$ ), the bounded variation of $u\mapsto V(\theta_{0}(u))$ and $w_{n,h}$ and Lemma 6.2, we have

[TABLE]
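The inequality for the inverses used in part (ii) follows from a standard resolvent-type identity. As a sketch, assume that $V(\theta)$ is symmetric with smallest eigenvalue at least $\lambda_{0}>0$ uniformly in $\theta$ (so that $|V(\theta)^{-1}|_{spec}\leq\lambda_{0}^{-1}$), and read $|\cdot|_{2}$ as the entrywise Euclidean (Frobenius) norm, which dominates the spectral norm:

$$V(\theta)^{-1}-V(\theta^{\prime})^{-1}=V(\theta)^{-1}\big\{V(\theta^{\prime})-V(\theta)\big\}V(\theta^{\prime})^{-1},$$

$$|V(\theta)^{-1}-V(\theta^{\prime})^{-1}|_{spec}\leq|V(\theta)^{-1}|_{spec}\cdot|V(\theta^{\prime})-V(\theta)|_{spec}\cdot|V(\theta^{\prime})^{-1}|_{spec}\leq\lambda_{0}^{-2}\,|V(\theta)-V(\theta^{\prime})|_{2}.$$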

The uniform results (66) and (67) together with Lemma 6.2 provide (30).

(iii) If $u\in[\frac{h}{2},1-\frac{h}{2}]$ , it holds that $b_{h}(u)=\int_{-1/2}^{1/2}K(y)\cdot\mathbb{E}\nabla\ell(\tilde{Y}_{t}(\theta_{0}(u+yh)),\theta_{0}(u))\ \mbox{d}y$ . According to Lemma 6.3, we have uniformly in $u\in[0,1]$ :

[TABLE]

Since $\nabla\ell(\tilde{Y}_{t}(\theta_{0}(u)),\theta_{0}(u))$ is a martingale difference sequence and $K$ is symmetric, the first two terms in (72) vanish. The last term in (72) is $o(h^{2})$ uniformly in $u\in[0,1]$ since $K(y)=0$ for $|y|>\frac{1}{2}$ and $\lim_{\xi\to 0}\sup_{u\in[0,1]}|R(u,\xi)|=0$ by Lemma 6.3. ∎
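The mechanism behind part (iii), namely that a symmetric kernel removes the first-order term so that the smoothing bias is of order $h^{2}$, can be illustrated numerically. The sketch below is not taken from the paper: the Epanechnikov kernel rescaled to $[-\frac{1}{2},\frac{1}{2}]$, the test function `math.sin`, and the helper names `integrate` and `smoothing_bias` are all assumptions made for illustration.

```python
import math

def epanechnikov(y):
    """Epanechnikov kernel rescaled to the support [-1/2, 1/2] (an assumed choice of K)."""
    return 6.0 * (0.25 - y * y) if abs(y) <= 0.5 else 0.0

def integrate(f, a, b, n=20000):
    """Composite midpoint rule on [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

def smoothing_bias(g, u, h):
    """int K(y) g(u + y h) dy - g(u): deterministic bias of kernel smoothing at u."""
    return integrate(lambda y: epanechnikov(y) * g(u + y * h), -0.5, 0.5) - g(u)

# Kernel moments: mass 1, vanishing first moment (symmetry), positive second moment.
mass = integrate(epanechnikov, -0.5, 0.5)
first_moment = integrate(lambda y: y * epanechnikov(y), -0.5, 0.5)
second_moment = integrate(lambda y: y * y * epanechnikov(y), -0.5, 0.5)

# Hypothetical smooth curve g = sin; its second derivative at u is -sin(u).
u = 0.3
ratios = [smoothing_bias(math.sin, u, h) / h ** 2 for h in (0.2, 0.1, 0.05)]
limit = 0.5 * second_moment * (-math.sin(u))  # h^2 coefficient from a Taylor expansion
```

As $h$ decreases, the entries of `ratios` approach `limit`, which is the usual second-order Taylor coefficient $\frac{1}{2}\int y^{2}K(y)\,\mbox{d}y\cdot g^{\prime\prime}(u)$.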

Proof of Theorem 6.10.

By application of Corollary 6.9 with $k=0$ , we obtain the uniform convergence

[TABLE]

The identifiability condition in Assumption 3.3 implies that $L(u,\theta)$ attains its unique minimum at $\theta=\theta_{0}(u)$ . Standard arguments provide the uniform convergence (37) (see also Dahlhaus, Richter and Wu (2017)). Since $\theta_{0}(u)$ lies in the interior of $\Theta$ , we have that $\hat{\theta}_{h}(u)$ lies in the interior of $\Theta$ almost surely for $n$ large enough. With some intermediate value $\bar{\theta}_{h}(u)$ satisfying $|\bar{\theta}_{h}(u)-\theta_{0}(u)|_{1}\leq|\hat{\theta}_{h}(u)-\theta_{0}(u)|_{1}$ , it holds that

[TABLE]

By using the continuity of $\theta\mapsto V(\theta)$ and the uniform convergences of $\nabla^{2}L_{n,h}$ provided by Corollary 6.9, we have that

[TABLE]

converges to [math] uniformly in $h\in H_{n}$ , $u\in\text{supp}(w_{n,h})$ . Since the smallest eigenvalue of $V(\theta)$ is bounded from below by $\lambda_{0}>0$ uniformly in $\theta\in\Theta$ , we have that for $n$ large enough, the smallest eigenvalue of $\nabla^{2}L_{n,h}(u,\bar{\theta}_{h}(u))$ is bounded from below by $\frac{\lambda_{0}}{2}$ uniformly in $h\in H_{n}$ , $u\in\text{supp}(w_{n,h})$ giving the result (38). The arguments for $\hat{\theta}_{h,-t}(u)$ are similar. ∎
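The eigenvalue step in this proof can be made explicit. As a sketch, write $A:=\nabla^{2}L_{n,h}(u,\bar{\theta}_{h}(u))$ and $V:=V(\theta_{0}(u))$, both symmetric, and assume (as the uniform convergence above suggests) that $|A-V|_{spec}\leq\frac{\lambda_{0}}{2}$ for $n$ large enough, uniformly in $h\in H_{n}$ and $u\in\text{supp}(w_{n,h})$. Weyl's inequality for symmetric matrices then gives

$$\lambda_{\min}(A)\geq\lambda_{\min}(V)-|A-V|_{spec}\geq\lambda_{0}-\frac{\lambda_{0}}{2}=\frac{\lambda_{0}}{2},$$

so that $A$ is invertible with $|A^{-1}|_{spec}\leq\frac{2}{\lambda_{0}}$.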

Proof of Lemma 3.10 (additional material).

We now show that (40) and (41) together imply that $\sup_{h\in H_{n}}\frac{Q_{n,h}}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\to 0$ a.s., where $Q_{n,h}:=|d_{A}^{*}(\hat{\theta}_{h},\theta_{0})-d_{I}^{*}(\hat{\theta}_{h},\theta_{0})|$ . For each $r>0$ , we can find a grid $H_{n}^{\prime}:=\{\frac{k}{n^{r}}:k=1,...,n^{r}\}$ such that for each $h\in H_{n}$ there exists some $h^{\prime}\in H_{n}^{\prime}$ such that $|h-h^{\prime}|\leq n^{-r}$ . Choose $r$ large enough such that $C(n)\cdot n^{-r}\leq C_{\xi}n^{-(1+\xi)}$ with some constants $C_{\xi},\xi>0$ . Let $\delta>0$ be arbitrary. Then by Markov’s inequality, (40) and (41),

[TABLE]

which is absolutely summable for $q$ large enough, giving the result by the Borel-Cantelli lemma. In the following we will use this technique for similar expressions without explicitly showing results of the type (41). The proofs are similar to the proof of (41). Following the argumentation in (B), we obtain

[TABLE]
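The discretization step at the start of this proof replaces the supremum over $H_{n}$ by a maximum over the finite grid $H_{n}^{\prime}$, whose cardinality $n^{r}$ grows only polynomially, so that a union bound with a sufficiently high moment $q$ remains summable in $n$. The covering property $|h-h^{\prime}|\leq n^{-r}$ can be checked directly. The following is a small sketch; the helper name `nearest_grid_point` and the chosen values of `n`, `r` and the bandwidths are hypothetical.

```python
def nearest_grid_point(h, n, r):
    """Closest point to h in the grid {k / n**r : k = 1, ..., n**r}."""
    size = n ** r
    k = min(max(round(h * size), 1), size)  # clamp the index to 1..size
    return k / size

n, r = 50, 2
bandwidths = [0.013 + 0.0071 * i for i in range(100)]  # arbitrary values in (0, 1]
worst_gap = max(abs(h - nearest_grid_point(h, n, r)) for h in bandwidths)
grid_size = n ** r  # number of terms entering the union bound
```

Here `worst_gap` stays below $n^{-r}$ while `grid_size` equals $n^{r}$, which is the trade-off that the choice of $r$ and $q$ in the proof balances.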

For $u\in[0,1]$ , it holds that

[TABLE]

where $A_{1}(u):=|\nabla\hat{L}_{n,h}(u,\theta_{0}(u))-\mathbb{E}\nabla\hat{L}_{n,h}(u,\theta_{0}(u))|_{V(\theta_{0}(u))}^{2}$ and $A_{2}(u):=\langle\nabla\hat{L}_{n,h}(u,\theta_{0}(u))-\mathbb{E}\nabla\hat{L}_{n,h}(u,\theta_{0}(u)),V(\theta_{0}(u))\mathbb{E}\nabla\hat{L}_{n,h}(u,\theta_{0}(u))\rangle$ . We now derive upper bounds for

[TABLE]

Lemma C.1(ii) gives

[TABLE]

where $\psi_{q}(t)$ and $C_{\delta,q}$ are defined in Lemma 6.5. Furthermore, we have

[TABLE]

Put $\hat{b}_{h,i}(u)=\mathbb{E}\nabla_{i}\hat{L}_{n,h}(u,\theta_{0}(u))$ and $\hat{B}_{h}$ from Proposition 6.4. By Lemma 6.6, we have $|\hat{b}_{h,i}(u)-\hat{b}_{h,i}(v)|\leq C_{-,1}\big{\{}\frac{|u-v|}{h}+|\theta_{0}(u)-\theta_{0}(v)|_{1}\big{\}}$ and $|\hat{b}_{h,i}(u)|\leq C_{\infty,1}$ . This implies

[TABLE]

Since $\int_{0}^{1}|\hat{b}_{h,i}(u-vh)|^{2}w_{n,h}(u-vh)\ \mbox{d}u\leq\frac{p^{2}}{\lambda_{0}}\hat{B}_{h}$ , we obtain

[TABLE]

The bounds (74), (76) and (78) imply $\sup_{h\in H_{n}}\big{|}\frac{d_{I}^{*}(\hat{\theta}_{h},\theta_{0})-d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\big{|}\to 0$ a.s. ∎

Proof of Lemma 3.12 (additional material).

Using similar techniques as in (B), we obtain $\|W_{n}-\hat{W}_{n}\|_{q}\leq 2C_{\infty,2q}p^{2}V_{-1,max}C_{\delta,2q}n^{-1}$ , which shows $W_{n}-\hat{W}_{n}\to 0$ a.s. Furthermore, by Lemma C.1(ii), we have for arbitrary $q>0$ that $\|\hat{W}_{n}-\mathbb{E}\hat{W}_{n}\|_{q}\leq\rho_{2,\psi C_{\delta,\cdot},q}p^{2}V_{-1,max}n^{-1/2}$ , showing that $\hat{W}_{n}-\mathbb{E}\hat{W}_{n}\to 0$ a.s. By the bound (32),

[TABLE]

which finally shows that $|\mathbb{E}\hat{W}_{n}|$ is bounded and thus $W_{n}$ is bounded a.s. ∎

Proof of Lemma 3.13 (additional material).

For brevity, let us define $\text{SUP }:=\sup_{i}\sup_{t=1,...,n}\sup_{u\in\text{supp}(w_{n,h})}\sup_{\theta\in\Theta}$ , where the supremum over $i$ should be understood as the maximum over all components of the argument (if it is a vector or matrix). It is known from Corollary 6.9 that for $k=0,1,2,3$ and all $0<\alpha\leq\frac{1}{2}$ ,

[TABLE]

Furthermore, let $C^{\#}$ denote a generic constant depending only on $p,V_{-1,max}$ and $V_{1,max}$ which may change its value from line to line.

We first discuss $R_{n,h,1}$ . By using (50), $R_{n,h,1}$ can be expanded into three terms:

[TABLE]

The first summand and the second summand in (80) are discussed in Lemma 6.11, (51) and (53). Put $Z_{n,k}:=1+\frac{1}{n}\sum_{t=1}^{n}|Y_{t,n}^{c}|_{\chi,1}^{kM}$ . Due to the Cauchy Schwarz inequality, the third summand in (80) is bounded by

[TABLE]

Here we used the fact that for $v\in\mathbb{R}^{d}$ and $j=1,...,p$ , we have $|v_{j}|^{2}\leq|v|_{\text{Id}}^{2}\leq\frac{1}{\lambda_{0}}|v|_{V(\theta)}^{2}$ for all $\theta\in\Theta$ by the assumption on $V(\theta)$ . $B_{h}^{dis}$ is defined in Proposition 6.4(ii). By (79), (30) and Proposition 6.4(i),(ii) we conclude that $\sup_{h\in H_{n}}\frac{R_{n,h,1,3}}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\to 0$ a.s. and $\sup_{h\in H_{n}}\frac{R_{n,h,1}}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\to 0$ a.s.

The next part of the proof addresses $R_{n,h,2}$ . Our goal is to eliminate $V_{t,n}^{-1}$ and the intermediate values. Put

[TABLE]

We now argue how $R_{n,h,2,1}$ can be obtained by successively replacing terms in $R_{n,h,2}$ with a rate smaller than $d_{M}^{*}(\hat{\theta}_{h},\theta_{0})$ . The remainder by replacing $V_{t,n}^{-1}$ by $(\mathbb{E}V_{t,n})^{-1}$ in $R_{n,h,2}$ is bounded by (see (50) and (38)):

[TABLE]

which is of order $(nh)^{\frac{1}{2}-\alpha}((nh)^{1-2\alpha}+n^{\frac{1}{2}-\alpha}+B_{h}^{dis})$ by (79).

The same arguments hold for the remainder which is obtained by replacing $\nabla^{3}L_{n,h,-t}(t/n,\tilde{\theta}_{h,-t}(t/n))$ with $\mathbb{E}[\nabla^{3}L_{n,h,-t}(t/n,\theta)]\big{|}_{\theta=\tilde{\theta}_{h,-t}(t/n)}$ . For the next replacement note that for arbitrary $g\in\mathcal{L}(M,C,\chi)$ , we have

[TABLE]

where $\tilde{g}(y,\theta)=1+|y|_{\chi,1}^{M}$ . By Lemma 6.7, $E_{n}(K_{h}(\cdot-u),\tilde{g},\theta)$ is bounded almost surely. The remainder by replacing $\mathbb{E}[\nabla^{3}L_{n,h,-t}(t/n,\theta)]\big{|}_{\theta=\tilde{\theta}_{h,-t}(t/n)}$ with $\mathbb{E}\nabla^{3}L_{n,h,-t}(t/n,\theta_{0}(t/n))$ is bounded by (see (38) and (82))

[TABLE]

While the second summand is discussed in Lemma 6.11, (55), the first and the third summand are of order $O((nh)^{\frac{3}{2}-3\alpha})+O(h^{(\beta\wedge 1)}B_{h}^{dis})$ by Lemma 6.2 and (79).

Lastly we replace $\hat{\theta}_{h,-t}(t/n)-\theta_{0}(t/n)$ twice by $(\mathbb{E}V_{t,n})^{-1}\nabla L_{n,h,-t}(t/n,\theta_{0}(t/n))$ by using the expansion

[TABLE]

where $|\breve{\theta}_{h,-t}(u)-\theta_{0}(u)|_{2}\leq|\hat{\theta}_{h,-t}(u)-\theta_{0}(u)|_{2}$ . The remainder is bounded by terms similar to (81) and (83).

With Proposition 6.4(i),(ii) we conclude that $\sup_{h\in H_{n}}\big{|}\frac{R_{n,h,2}-R_{n,h,2,1}}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\big{|}\to 0$ a.s. In Lemma 6.11, (54) it is shown that $\sup_{h\in H_{n}}\big{|}\frac{R_{n,h,2,1}}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\big{|}\to 0$ a.s.

Finally, put

[TABLE]

We briefly explain how $R_{n,h,3,1}$ can be obtained from $R_{n,h,3}$ . The intermediate value $\tilde{\theta}_{h,-t}(t/n)$ is replaced by $\theta_{0}(t/n)$ with remainder given by

[TABLE]

which is handled similarly to (83). The replacement of $\hat{\theta}_{h,-t}(t/n)-\theta_{0}(t/n)$ by $(\mathbb{E}V_{t,n})^{-1}\nabla L_{n,h,-t}(t/n,\theta_{0}(t/n))$ is done as for $R_{n,h,2}$ . We conclude with Proposition 6.4(i),(ii) that $\sup_{h\in H_{n}}\big{|}\frac{R_{n,h,3}-R_{n,h,3,1}}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\big{|}\to 0$ a.s. The convergence $\sup_{h\in H_{n}}\big{|}\frac{R_{n,h,3,1}}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\big{|}\to 0$ a.s. is proved in Lemma 6.11, (52).

∎

Proof of Lemma 6.11.

We have uniformly in $u,\theta$ that $\delta_{q}^{\nabla\ell(\tilde{Y}(u),\theta)}(k)\leq C_{\delta,q}\psi_{q}(k)$ (see the bound in (31)), where $\psi_{q}$ and $C_{\delta,q}$ are defined in Lemma 6.5. Furthermore, we can use the same argumentation as in (26) to see that uniformly in $u,u^{\prime},\theta$ it holds that $\|\nabla\ell(\tilde{Y}_{0}(u),\theta)-\nabla\ell(\tilde{Y}_{0}(u^{\prime}),\theta)\|_{2}\leq C_{\delta,2}\|\tilde{X}_{0}(u)-\tilde{X}_{0}(u^{\prime})\|_{2M}\leq C_{\delta,2}C_{A}|\theta_{0}(u)-\theta_{0}(u^{\prime})|_{1}$ . Since $\theta_{0}$ is Hoelder continuous, there exists $\tilde{L}>0$ such that $|\theta_{0}(u)-\theta_{0}(u^{\prime})|_{1}\leq d\tilde{L}|u-u^{\prime}|^{\beta\wedge 1}$ . As in the proof of Lemma 3.13, let $C^{\#}$ denote a generic constant depending only on $p,V_{-1,max}$ and $V_{1,max}$ which may change its value from line to line. We use the same technique as in the proof of Lemma 3.10, (73), but omit the proofs on Lipschitz continuity in $h$ (cf. (41)) since they do not pose any extra difficulty.

We start by proving (51). Put

[TABLE]

By Hoelder’s inequality (cf. also Lemma 6.1) it follows that

[TABLE]

Since $\nabla\ell(\tilde{Y}_{t}(u),\theta_{0}(u))$ are martingale differences, $\nabla\hat{L}_{n,h,-t}(t/n,\theta_{0}(t/n))-\mathbb{E}\nabla\hat{L}_{n,h,-t}(t/n,\theta_{0}(t/n))-\nabla\tilde{L}_{n,h,-t}(t/n,\theta_{0}(t/n))$ has expectation 0 and we obtain

[TABLE]

which shows that $\sup_{h\in H_{n}}\frac{R_{1,n}}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\to 0$ since $\psi_{2}$ is absolutely summable. Define

[TABLE]

Recall that $\psi_{q}(k)$ is absolutely summable. Furthermore, for some $v\in\mathbb{R}^{d}$ , note that for arbitrary $j=1,...,p$ , we have $|v_{j}|^{2}\leq|v|_{\text{Id}}^{2}\leq\frac{1}{\lambda_{0}}|v|_{V(\theta)}^{2}$ for all $\theta\in\Theta$ by the assumption on $V(\theta)$ . By Lemma C.1, we have

[TABLE]

which shows that $\sup_{h\in H_{n}}\frac{|R_{2,n}|}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\to 0$ a.s. Finally, put

[TABLE]

Since $\nabla\ell(\tilde{Y}_{0}(u),\theta_{0}(u))$ are martingale differences, it holds that $\mathbb{E}R_{3,n}=0$ . Since $\chi_{i}=O(i^{-(3+\varepsilon)})$ and $\delta_{q}(k)=O(k^{-(3+\varepsilon)})$ , we conclude that $\sum_{i\geq 0}i\chi_{i}<\infty$ and $\sum_{k\geq 0}k\delta_{q}(k)<\infty$ , which also shows $\sum_{k\geq 0}k\psi_{q}(k)<\infty$ . Thus Lemma C.1 is applicable and we obtain

[TABLE]

which shows that $\sup_{h\in H_{n}}\frac{|R_{3,n}|}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\to 0$ a.s.

We now prove (52). Similar arguments as in (B) can be used to show that

[TABLE]

fulfills $\sup_{h\in H_{n}}\frac{|S_{0,n}|}{d_{M}^{*}(\hat{\theta}_{h},\theta_{0})}\to 0$ a.s. Put

[TABLE]

Decompose

[TABLE]

where

[TABLE]

Since $\nabla^{2}\ell\in\mathcal{L}(M,\chi,C)$ , we obtain by Lemma C.1, similarly to (85),

[TABLE]

By Lemma C.1, (90),

[TABLE]

By the Cauchy-Schwarz inequality, we have

[TABLE]

which shows that $\sup_{h\in H_{n}}\frac{|\mathbb{E}S_{1,n}|}{d^{*}(\hat{\theta}_{h},\theta_{0})}\to 0$ .

By Lemma C.1, we obtain similarly to (86),

[TABLE]

which shows that $\sup_{h\in H_{n}}\frac{|S_{1,n}-\mathbb{E}S_{1,n}|}{d^{*}(\hat{\theta}_{h},\theta_{0})}\to 0$ a.s. By Lemma C.1, (93),

[TABLE]

thus $\sup_{h\in H_{n}}\frac{|\mathbb{E}S_{3,n}|}{d^{*}(\hat{\theta}_{h},\theta_{0})}\to 0$ . Define

[TABLE]

then by Lemma C.1 and inequality (35) it holds that

[TABLE]

Define

[TABLE]

then by Lemma C.1 and inequality (35) it holds that

[TABLE]

Finally, by Lemma C.1(iii), we have

[TABLE]

We now discuss (53). Put $\hat{V}_{t,n}:=\nabla^{2}\hat{L}_{n,h,-t}(t/n,\theta_{0}(t/n))$ . Similar arguments as in (B) can be used to show that

[TABLE]

fulfills $\|T_{0,n}\|_{q}=O(n^{-1})$ . Define

[TABLE]

Put

[TABLE]

By Lemma C.1, it holds that

[TABLE]

Due to the similar structure, the argumentation for $T_{1,n}$ can be mimicked from $S_{3,n}$ above and leads to $\|T_{1,n}\|_{q}=O((nh)^{-2}+(nh)^{-1}n^{-1/2})$ .

The stochastic structure of (54) is exactly the same as in (52), thus the proof follows the same lines. Finally, it is easy to see from Lemma C.1 that

[TABLE]

which shows (55). ∎

Appendix C Bounds for moments of sums, quadratic and cubic forms of Bernoulli shifts

The inequalities derived in this section are needed to prove the moment inequalities for the local likelihoods and their derivatives in Section 6.3. Results for linear and quadratic forms were obtained in Xiao and Wu (2012); however, we need them in a more general setting.

Lemma C.1.

Let $q\geq 2$ . Let $\delta_{q}(k)$ , $k\geq 0$ be some sequence of nonnegative real numbers. Put $\Delta_{k,q}:=\sum_{l=k}^{\infty}\delta_{q}(l)$ . Let $a_{t},a_{s,t},a_{r,s,t}$ , $r,s,t=1,...,n$ , be some deterministic sequences of real numbers.

(i)

Let $V_{t,n}^{(1)}$ , $t=1,...,n$ be a triangular array with $\sup_{n\in\mathbb{N}}\delta_{q}^{V_{\cdot,n}^{(1)}}(k)\leq\delta_{q}(k)$ and $\mathbb{E}V_{t,n}^{(1)}=0$ . Put $\rho_{1,\delta,q}=q^{1/2}\Delta_{0,q}$ . Then it holds that

[TABLE]

(ii)

For $i=1,2$ , let $V_{t,n}^{(i)}(s)$ , $s,t=1,...,n$ be triangular arrays with $\mathbb{E}V_{t,n}^{(i)}(s)=0$ and $\sup_{n\in\mathbb{N}}\sup_{s=1,...,n}\delta_{q}^{V_{\cdot,n}^{(i)}(s)}(k)\leq\delta_{q}(k)$ as well as

[TABLE]

Put $\rho_{2,\delta,q}=8q^{1/2}\Delta_{0,2q}\big{[}q^{1/2}\Delta_{0,2q}+\sum_{k=0}^{\infty}\Delta_{k,2q}\big{]}$ and $\tilde{\rho}_{2,\delta}:=\Delta_{0,2}^{2}$ . Then it holds that

[TABLE]

(iii)

For $i=1,2,3$ , let $V_{t,n}^{(i)}(r,s)$ , $r,s,t=1,...,n$ be triangular arrays with $\mathbb{E}V_{t,n}^{(i)}(r,s)=0$ and $\sup_{n\in\mathbb{N}}\sup_{r,s=1,...,n}\delta_{q}^{V_{\cdot,n}^{(i)}(r,s)}(k)\leq\delta_{q}(k)$ . Assume that $V_{t,n}^{(i)}(r,\cdot)$ and $V_{t,n}^{(i)}(\cdot,s)$ fulfill (88) uniformly in $r,s$ as well as

[TABLE]

Put

[TABLE]

and $\tilde{\rho}_{3,\delta}:=12\Delta_{0,3}^{2}\big{[}\Delta_{0,3}+\sum_{k=0}^{\infty}\Delta_{k,3}\big{]}$ . Then it holds that

[TABLE]

Proof.

To keep the notation simple, set $a_{t},a_{s,t},a_{r,s,t}$ to $0$ if one of the indices $r,s,t$ is not in $\{1,...,n\}$ . We start by proving the stochastic inequalities (87) and those in (ii) and (iii). Since $\mathbb{E}V_{t,n}^{(1)}=0$ , it holds almost surely that $\sum_{t=1}^{n}a_{t}V_{t,n}^{(1)}=\sum_{k=0}^{\infty}\sum_{t=1}^{n}a_{t}P_{t-k}V_{t,n}^{(1)}$ . For each $k$ , the sequence $(a_{t}\cdot P_{t-k}V_{t,n}^{(1)})$ , $t=1,...,n$ , is a martingale difference sequence; thus with Rio (2009), Theorem 2.1 we obtain

[TABLE]
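For a linear process, the moment bound for linear forms can be checked exactly in the case $q=2$. The sketch below assumes that the bound in (i) has the form $\|\sum_{t}a_{t}V_{t,n}^{(1)}\|_{q}\leq\rho_{1,\delta,q}(\sum_{t}a_{t}^{2})^{1/2}$ (the displayed inequality is not reproduced in this text), and it uses $\delta_{q}(k)=|c_{k}|\cdot\|\varepsilon_{0}-\varepsilon_{0}^{*}\|_{q}$ for $V_{t}=\sum_{k}c_{k}\varepsilon_{t-k}$. The coefficients `c`, the weights `a` and the helper name `l2_norm_linear_form` are hypothetical.

```python
import math

def l2_norm_linear_form(a, c):
    """Exact L2 norm of sum_t a[t] * V_t, where V_t = sum_k c[k] * eps_{t-k}
    and the eps are i.i.d. with mean 0 and variance 1."""
    n, m = len(a), len(c)
    total = 0.0
    for j in range(-(m - 1), n):  # eps_j enters V_t for t = j, ..., j + m - 1
        coef = sum(a[t] * c[t - j] for t in range(max(j, 0), min(n, j + m)))
        total += coef * coef      # independent innovations: variances add up
    return math.sqrt(total)

q = 2
c = [(k + 1) ** -4.0 for k in range(50)]      # hypothetical decaying MA coefficients
a = [math.cos(0.1 * t) for t in range(200)]   # hypothetical deterministic weights

# Dependence measure for unit-variance innovations: ||eps_0 - eps_0^*||_2 = sqrt(2).
delta = [math.sqrt(2.0) * abs(ck) for ck in c]
Delta_0 = sum(delta)                          # Delta_{0,q} = sum_{l >= 0} delta_q(l)
rho_1 = math.sqrt(q) * Delta_0                # rho_{1,delta,q} = q^{1/2} * Delta_{0,q}

lhs = l2_norm_linear_form(a, c)
rhs = rho_1 * math.sqrt(sum(x * x for x in a))
```

In this example `lhs` is smaller than `rhs` by roughly a factor of two or more, which is consistent with the constant $\rho_{1,\delta,q}$ being convenient rather than sharp.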

We now prove (ii). Define $D_{s,t,k_{1},k_{2}}:=a_{s,t}\big{\{}P_{s-k_{1}}V_{s,n}^{(1)}(t)P_{t-k_{2}}V_{t,n}^{(2)}(s)-\mathbb{E}P_{s-k_{1}}V_{s,n}^{(1)}(t)P_{t-k_{2}}V_{t,n}^{(2)}(s)\big{\}}$ . Note that

[TABLE]

where

[TABLE]

Here $(\sum_{s<t-k_{2}+k_{1}}D_{s,t,k_{1},k_{2}})_{t=1,...,n}$ as well as $(a_{s,t}P_{s-k_{1}}V_{s,n}^{(1)})_{s}$ are martingale difference sequences. By applying Theorem 2.1 in Rio (2009),

[TABLE]

Define $\tilde{a}_{s,t}:=a_{s,t}\mathbbm{1}_{\{s<t-k_{2}+k_{1}\}}$ . By partial summation and the Cauchy-Schwarz inequality, we have

[TABLE]

We finally obtain $A_{2,1}\leq 2(q-1)^{1/2}(2q-1)^{1/2}\Delta_{0,2q}^{2}\cdot\big{(}\sum_{s,t=1}^{n}a_{s,t}^{2}\big{)}^{1/2}$ , a similar upper bound holds for $A_{2,2}$ . To discuss $A_{1}$ , note that

[TABLE]

is a martingale difference sequence. The arguments $(s-k_{1}+k_{2})$ of $V_{s,n}^{(1)}$ and $(s)$ of $V_{s-k_{1}+k_{2},n}^{(2)}$ are omitted in the next steps. In the following, fix some $s,l\geq 0$ . Let $\varepsilon_{s-l}^{*}$ be an i.i.d. copy of $\varepsilon_{s-l}$ . Let $\mathcal{F}_{s}^{*(s-l)}:=(\varepsilon_{s},...,\varepsilon_{s-l+1},\varepsilon_{s-l}^{*},\varepsilon_{s-l-1},....)$ . For fixed $k_{1},i\geq 0$ and some random variable $V_{s}=H_{z}(\mathcal{F}_{s})$ , we define $V_{s}^{*}:=H_{z}(\mathcal{F}_{s}^{*(s-k_{1})})$ and $V_{s}^{**}:=H_{z}(\mathcal{F}_{s}^{*(s-k_{1}-i)})$ . Similarly, $\mathcal{F}_{s-k_{1}}^{*}:=\mathcal{F}_{s-k_{1}}^{*(s-k_{1})}$ and $\mathcal{F}_{s-k_{1}}^{**}:=\mathcal{F}_{s-k_{1}}^{*(s-k_{1}-i)}$ . We use the abbreviation $V^{*,**}:=(V^{*})^{**}$ . It holds that

[TABLE]

Note that $\|P_{s-k_{1}}V_{s-k_{1}+k_{2},n}^{(2)}\|_{2q}\leq\delta_{2q}(k_{2})$ and $\|(P_{s-k_{1}}V_{s,n}^{(1)})^{**}\|_{2q}\leq\delta_{2q}(k_{1})$ . Furthermore,

[TABLE]

and thus, by Jensen’s inequality,

[TABLE]

Note that sharper bounds than (94) may be available in special cases. We obtain

[TABLE]

We now prove (iii). To do so, define

[TABLE]

Here, we can bound $\sum_{k_{1},k_{2},k_{3}\geq 0}\big{\|}\sum_{r,s,t=1}^{n}D_{r,s,t,k_{1},k_{2},k_{3}}\big{\|}_{q}$ by four different types of terms. The first type (all indices are different) is of the form

[TABLE]

Note that the three sequences $\big{(}\sum_{s:s-k_{2}<r-k_{1}}\sum_{t:t-k_{3}<s-k_{2}}D_{r,s,t,k_{1},k_{2},k_{3}}\big{)}_{r}$ and $\big{(}\sum_{t:t-k_{3}<s-k_{2}}a_{r,s,t}P_{s-k_{2}}V_{s,n}^{(2)}P_{t-k_{3}}V_{t,n}^{(3)}\big{)}_{s}$ and $(a_{r,s,t}P_{t-k_{3}}V_{t,n}^{(3)})_{t}$ are martingale differences. Put $\tilde{a}_{r,s,t}:=a_{r,s,t}\mathbbm{1}_{\{t-k_{3}<s-k_{2}<r-k_{1}\}}$ . By partial summation, we have

[TABLE]

leading with Theorem 2.1 in Rio (2009) and Hoelder’s inequality to the upper bound

[TABLE]

for $A_{3}$ . Using the same partial summation argument as in the discussion of $A_{2,1}$ above, we obtain

[TABLE]

The second type (the two smaller indices are equal) is of the form

[TABLE]

Put $\hat{a}_{r,s,t}:=a_{r,s,t}\mathbbm{1}_{\{s-k_{2}<r-k_{1}\}}$ . By applying similar partial summation techniques as for $A_{2,1}$ , we obtain the upper bound

[TABLE]

for $A_{4}$ . The same technique as applied in $A_{1}$ leads to the bound

[TABLE]

Note that $(P_{r-k_{1}-i}D_{r,r-k_{1}+k_{2},r-k_{1}+k_{3},k_{1},k_{2},k_{3}})_{r}$ is a martingale difference sequence. For brevity, we will omit the additional arguments $s,t$ of $V_{r,n}^{(i)}(s,t)$ in the following part. The third type (all three indices are equal) is of the form

[TABLE]

where

[TABLE]

and

[TABLE]

Similarly to (94), we obtain

[TABLE]

which leads to

[TABLE]

The fourth type (the two bigger indices are equal) has the form

[TABLE]

$A_{6}$ is bounded by the sum of three terms $A_{6,1}+A_{6,2}+A_{6,3}$ , which will be defined in the following. Put $a_{r,s,t}^{\circ}:=a_{r,s,t}\mathbbm{1}_{\{t-k_{3}<r-k_{1}-i\}}$ . For brevity in the following argumentation, put $\tilde{r}:=r-k_{1}+k_{2}$ . By using similar techniques as for $A_{1}$ , we obtain

[TABLE]

Applying the same partial summation techniques as for the term $A_{2,1}$ , we conclude

[TABLE]

With slight changes in the argumentation, the second term has the upper bound

[TABLE]

In the following, we will again omit the arguments $s,t$ of $V_{r,n}^{(i)}(s,t)$ . The third term reads

[TABLE]

Put $W_{1}:=P_{r-k_{1}}V_{r,n}^{(1)}$ , $W_{2}:=P_{r-k_{1}}V_{r-k_{1}+k_{2},n}^{(2)}$ , $Z:=P_{r-k_{1}-i}V_{r-k_{1}+k_{3},n}^{(3)}$ and $W:=P_{r-k_{1}-i}[W_{1}\cdot W_{2}]$ . Using similar techniques as in $A_{1}$ , we obtain

[TABLE]

which shows

[TABLE]

and finishes the proof of (iii).

To prove (90) and (93), we will omit the arguments $s,t$ of $V_{r,n}^{(i)}(s,t)$ since all bounds are uniform in these arguments. For (90), we use the inequalities

[TABLE]

To prove (93), note that

[TABLE]

The above term is bounded by two types of terms. The first type (all three indices $r-i,s-j,t-k$ are equal) is of the form

[TABLE]

the second type (the two bigger indices of $r-i,s-j,t-k$ are equal) is of the form

[TABLE]

∎

Appendix D Proofs of Section 4

For linear time series, we use the model which was set up in Dahlhaus and Polonik (2009).

Proposition D.1 (Linear time series models).

Suppose that Assumption 3.3 (1) holds. Assume that

[TABLE]

with some coefficients $a_{t,n}(k)$ and $a_{\theta}(k)$ satisfying

[TABLE]

with some absolutely summable sequence $C_{B}(k)$ . Assume that $\varepsilon_{t}$ , $t\in\mathbb{Z}$ are i.i.d. and have all moments, in particular $\mathbb{E}\varepsilon_{0}=0$ and $\mathbb{E}\varepsilon_{0}^{2}=1$ .

Define $A_{\theta}(\lambda):=\sum_{k=0}^{\infty}a_{\theta}(k)e^{i\lambda k}$ , the spectral density $f_{\theta}(\lambda):=\frac{1}{2\pi}|A_{\theta}(\lambda)|^{2}$ and real numbers $\gamma_{\theta}(k):=\frac{1}{2\pi}\int_{-\pi}^{\pi}A_{\theta}(\lambda)^{-1}e^{-i\lambda k}\ \mbox{d}\lambda$ . Assume that

(a)

For $\theta,\theta^{\prime}\in\Theta$ , it holds that $f_{\theta}=f_{\theta^{\prime}}$ implies $\theta=\theta^{\prime}$ .

(b)

$|A_{\theta}(\lambda)|\geq\delta_{A}>0$ uniformly in $\theta\in\Theta,\lambda\in[0,2\pi]$ . $A_{\theta}(\lambda)$ is four times continuously differentiable in $\theta$ . Assume that there exist $\beta_{i}>3$ , $L_{i}>0$ such that $\nabla^{i}A_{\theta}(\cdot)\in\Sigma(\beta_{i},L_{i})$ ( $i=0,1,2,3,4$ ).

(c)

The minimal eigenvalue of $\frac{1}{4\pi}\int\nabla\log f_{\theta}(\lambda)\cdot\nabla\log f_{\theta}(\lambda)^{\prime}\ \mbox{d}\lambda$ is bounded away from 0 uniformly in $\theta\in\Theta$ .

Then, assuming that $\varepsilon_{t}$ has a standard Gaussian distribution, $\ell$ from (4) has the form

[TABLE]

and Assumptions 2.1, 3.1 and 3.3 are fulfilled for (24), and it holds with the fourth cumulant $\kappa_{4}(\varepsilon_{0})$ of $\varepsilon_{0}$ that

[TABLE]

If additionally Assumption 3.7 (1) holds and

(d)

$A_{\theta}(\lambda)$ is $l+1$ -times continuously differentiable in $\theta$ and fulfills component-wise $\nabla^{i}A_{\theta}(\cdot)\in\Sigma(\beta_{A},L_{A})$ ( $i=0,...,l+1$ ).

then Assumption 3.7 is fulfilled and the bias term has the form

[TABLE]

where $w(u):=V(\theta_{0}(u))^{-1}\tilde{V}(\theta_{0}(u))[\partial_{u}\theta_{0}(u),\partial_{u}\theta_{0}(u)]$ and $\tilde{V}(\theta)\in\mathbb{R}^{p\times p\times p}$ is defined via $\tilde{V}(\theta)_{ijk}:=\frac{1}{4\pi}\int_{-\pi}^{\pi}\frac{\nabla^{2}_{ij}f_{\theta}(\lambda)}{f_{\theta}(\lambda)}\cdot\frac{\nabla_{k}f_{\theta}(\lambda)}{f_{\theta}(\lambda)}\ \mbox{d}\lambda$ .

Proof of Proposition D.1.

Put $\tilde{X}_{t}(\theta)=\sum_{k=0}^{\infty}a_{\theta}(k)\varepsilon_{t-k}$ . By (95), we have for all $q\geq 1$ that

[TABLE]

It holds that $a_{\theta}(k)=\frac{1}{2\pi}\int_{-\pi}^{\pi}A_{\theta}(\lambda)e^{-i\lambda k}\ \mbox{d}\lambda$ . By condition (b) and Katznelson (2004), chapter I, section 4, we have that $\sup_{\theta\in\Theta}|\nabla^{i}a_{\theta}(k)|_{\infty}=O(k^{-(3+\eta)})$ with some $\eta>0$ for $i=0,1,2,3$ .

[TABLE]

Furthermore we obtain $\delta_{q}^{\tilde{X}(\theta)}(k)=|a_{\theta}(k)|\cdot\|\varepsilon_{0}-\varepsilon_{0}^{*}\|_{q}=O(k^{-(3+\eta)})$ . Since $|A_{\theta}(\lambda)|\geq\delta_{A}>0$ , it follows from the inverse Fourier transform that the process $\tilde{X}_{t}(\theta)$ is invertible in the sense that

[TABLE]

since

[TABLE]

This shows $\mathbb{E}[\tilde{X}_{t}(\theta)|\mathcal{F}_{t-1}]=-\frac{1}{\gamma_{\theta}(0)}\sum_{k=1}^{\infty}\gamma_{\theta}(k)\tilde{X}_{t-k}(\theta)$ and $\mbox{Var}(\tilde{X}_{t}(\theta)|\mathcal{F}_{t-1})=\frac{1}{\gamma_{\theta}(0)^{2}}$ , thus the negative logarithm of the Gaussian conditional likelihood (4) has the form (96). From the conditions on $A_{\theta}(\lambda)$ in (b) it is straightforward to see that $\ell$ satisfies Assumption 3.3 (5).
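The invertibility argument rests on the fact that the Fourier coefficients $\gamma_{\theta}(k)$ of $A_{\theta}(\lambda)^{-1}$ are the convolution inverse of the coefficients $a_{\theta}(k)$, so that in particular $\gamma_{\theta}(0)a_{\theta}(0)=1$. The following sketch checks this numerically for a hypothetical ARMA(1,1)-type transfer function; the parameter values and the helper names `transfer` and `fourier_coefficient` are assumptions made for illustration and do not appear in the paper.

```python
import cmath
import math

ALPHA, B, SIGMA = 0.5, 0.3, 2.0  # hypothetical parameters with |ALPHA|, |B| < 1

def transfer(lam):
    """A_theta(lambda) = sigma * (1 + b e^{i lambda}) / (1 - alpha e^{i lambda})."""
    z = cmath.exp(1j * lam)
    return SIGMA * (1.0 + B * z) / (1.0 - ALPHA * z)

def fourier_coefficient(func, k, num_points=2048):
    """(1 / 2 pi) * int_{-pi}^{pi} func(lam) e^{-i lam k} d lam, equispaced Riemann sum."""
    total = 0j
    for m in range(num_points):
        lam = -math.pi + 2.0 * math.pi * m / num_points
        total += func(lam) * cmath.exp(-1j * lam * k)
    return (total / num_points).real  # the coefficients are real in this example

a = [fourier_coefficient(transfer, k) for k in range(10)]
gamma = [fourier_coefficient(lambda lam: 1.0 / transfer(lam), k) for k in range(10)]

# eps_t = sum_k gamma(k) X_{t-k} and X_t = sum_k a(k) eps_{t-k} force
# sum_{j <= k} gamma(j) a(k - j) = 1 if k = 0 and 0 otherwise.
convolution = [sum(gamma[j] * a[k - j] for j in range(k + 1)) for k in range(10)]
```

The list `convolution` equals the unit sequence up to rounding error, and `gamma[0] * a[0]` equals 1.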

Furthermore, we have

[TABLE]

since $\gamma_{\theta}(0)a_{\theta}(0)=1$ with the same Fourier argument as before. From (98) it is immediate that $\nabla\ell(\tilde{Y}_{t}(\theta^{\prime}),\theta)\big{|}_{\theta^{\prime}=\theta}$ is a martingale difference sequence w.r.t. $\mathcal{F}_{t}$ . Using Bochner’s theorem, we have $c_{\theta}(k):=\mathbb{E}[\tilde{X}_{t}(\theta)\tilde{X}_{t-k}(\theta)]=\int_{-\pi}^{\pi}f_{\theta}(\lambda)e^{i\lambda k}\ \mbox{d}\lambda$ and thus

[TABLE]

Furthermore, Kolmogorov’s formula (cf. Brockwell and Davis (2009), Theorem 5.8.1) implies that $\frac{1}{2}\log(\frac{1}{2\pi\gamma_{\theta}(0)^{2}})=\frac{1}{4\pi}\int_{-\pi}^{\pi}\log f_{\theta}(\lambda)\ \mbox{d}\lambda$ , which shows that

[TABLE]
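Kolmogorov's formula in the form used above, $\frac{1}{2}\log(\frac{1}{2\pi\gamma_{\theta}(0)^{2}})=\frac{1}{4\pi}\int_{-\pi}^{\pi}\log f_{\theta}(\lambda)\ \mbox{d}\lambda$, can be verified numerically in a simple special case. The sketch below uses an AR(1) spectral density with hypothetical parameter values, for which $\gamma_{\theta}(0)=1/\sigma$; the helper name `mean_over_circle` is likewise an assumption.

```python
import math

ALPHA, SIGMA = 0.6, 1.5  # hypothetical AR(1) parameters with |ALPHA| < 1

def spectral_density(lam):
    """f_theta(lambda) = sigma^2 / (2 pi) * |1 - alpha e^{i lambda}|^{-2}."""
    return SIGMA ** 2 / (2.0 * math.pi) / (1.0 - 2.0 * ALPHA * math.cos(lam) + ALPHA ** 2)

def mean_over_circle(func, num_points=4096):
    """(1 / 2 pi) * int_{-pi}^{pi} func(lam) d lam as an equispaced Riemann sum."""
    return sum(func(-math.pi + 2.0 * math.pi * m / num_points)
               for m in range(num_points)) / num_points

gamma0 = 1.0 / SIGMA  # gamma_theta(0) = 1 / a_theta(0) = 1 / sigma in the AR(1) case
lhs = 0.5 * math.log(1.0 / (2.0 * math.pi * gamma0 ** 2))
rhs = 0.5 * mean_over_circle(lambda lam: math.log(spectral_density(lam)))
```

Both sides agree up to rounding error, reflecting that $\int_{-\pi}^{\pi}\log|1-\alpha e^{i\lambda}|^{2}\ \mbox{d}\lambda=0$ for $|\alpha|<1$.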

Since $\log(x)\leq x-1$ with equality if and only if $x=1$ and since condition (a) holds, $\theta_{0}(u)$ is the unique minimizer of $\theta\mapsto L(u,\theta)$ . Differentiating (100) twice with respect to $\theta$ (and replacing $\theta_{0}(u)$ by $\theta$ afterwards), we obtain

[TABLE]

Condition (c) implies that the minimal eigenvalue of $V(\theta)$ is bounded away from 0 uniformly in $\theta\in\Theta$ . Define $G_{t}(\theta):=\sum_{k=0}^{\infty}\nabla\gamma_{\theta}(k)\tilde{X}_{t-k}(\theta)-\nabla\gamma_{\theta}(0)a_{\theta}(0)\varepsilon_{t}$ , which is $\mathcal{F}_{t-1}$ -measurable and satisfies $\mathbb{E}G_{t}(\theta)=0$ . Recall that $\gamma_{\theta}(0)a_{\theta}(0)=1$ . From (99) it follows that

[TABLE]

where

[TABLE]

thus

[TABLE]

where $\kappa_{4}(\varepsilon_{0})=\mathbb{E}\varepsilon_{0}^{4}-3$ denotes the fourth cumulant of $\varepsilon_{0}$ .

Now let $\theta_{0}(u)$ be twice continuously differentiable. Note that

[TABLE]

which implies Assumption 3.4.

Define $\tilde{X}_{t}(\theta)=\sum_{k=0}^{\infty}a_{\theta}(k)\varepsilon_{t-k}$ . Then $\nabla\tilde{X}_{t}(\theta)=\sum_{k=0}^{\infty}\nabla a_{\theta}(k)\varepsilon_{t-k}$ , $\nabla^{2}\tilde{X}_{t}(\theta)=\sum_{k=0}^{\infty}\nabla^{2}a_{\theta}(k)\varepsilon_{t-k}$ . We have

[TABLE]

Thus we obtain the following decomposition of the bias term

[TABLE]

∎

Proposition D.2 (Recursively defined time series).

Assume that $\varepsilon_{t}$ , $t\in\mathbb{Z}$ are i.i.d. and have all moments with $\mathbb{E}\varepsilon_{0}=0$ and $\mathbb{E}\varepsilon_{0}^{2}=1$ . Suppose that Assumption 3.3 (1) is fulfilled. Assume that $X_{t,n}$ fulfills

[TABLE]

Assume that $\mu,\sigma:\mathbb{R}^{r}\times\Theta\to\mathbb{R}$ satisfy

[TABLE]

for all $q\geq 1$ with some $\chi\in\mathbb{R}^{r}_{\geq 0}$ with $|\chi|_{1}<1$ . Assume that $\sigma(\cdot)\geq\sigma_{0}$ with some constant $\sigma_{0}>0$ . Assume that $\nabla\sigma\not=0$ , and

(a)

For $g\in\{\nabla^{i}\mu,\nabla^{i}\sigma:i=0,1,2,3\}$ , it holds that $\sup_{y}\sup_{\theta\not=\theta^{\prime}}\frac{|g(y,\theta)-g(y,\theta^{\prime})|}{|\theta-\theta^{\prime}|_{1}\cdot(1+|y|_{1})}<\infty$ and $\sup_{\theta\in\Theta}\sup_{y\not=y^{\prime}}\frac{|g(y,\theta)-g(y^{\prime},\theta)|}{|y-y^{\prime}|_{1}}<\infty$ for each component.

(b)

For fixed $u\in[0,1]$ , the two conditions $\mu(\tilde{Y}_{0}(\theta_{0}(u)),\theta)=\mu(\tilde{Y}_{0}(\theta_{0}(u)),\theta_{0}(u))$ a.s. and $\sigma(\tilde{Y}_{0}(\theta_{0}(u)),\theta)=\sigma(\tilde{Y}_{0}(\theta_{0}(u)),\theta_{0}(u))$ a.s. imply $\theta=\theta_{0}(u)$ .

(c)

uniformly in $\theta\in\Theta$ , the smallest eigenvalues of the matrices $W_{\mu}(\theta):=\mathbb{E}[\frac{\nabla\mu(\tilde{Y}_{0}(\theta^{\prime}),\theta)\nabla\mu(\tilde{Y}_{0}(\theta^{\prime}),\theta)^{\prime}}{\sigma^{2}(\tilde{Y}_{0}(\theta),\theta)}]\big{|}_{\theta^{\prime}=\theta}$ and $W_{\sigma}(\theta):=\mathbb{E}[\frac{\nabla\sigma(\tilde{Y}_{0}(\theta^{\prime}),\theta)\nabla\sigma(\tilde{Y}_{0}(\theta^{\prime}),\theta)^{\prime}}{\sigma^{2}(\tilde{Y}_{0}(\theta),\theta)}]\big{|}_{\theta^{\prime}=\theta}$ are bounded from below by some $\tilde{\lambda}_{0}>0$ .

Then Assumptions 2.1, 3.1 and 3.3 are fulfilled for $\ell$ chosen as in (4) assuming that $\varepsilon_{t}$ is standard Gaussian distributed, i.e. for

[TABLE]

In this case, it holds with $W_{\mu\sigma}:=\mathbb{E}[\frac{\nabla\mu(\tilde{Y}_{0}(\theta^{\prime}),\theta)\nabla\sigma(\tilde{Y}_{0}(\theta^{\prime}),\theta)^{\prime}}{\sigma^{2}(\tilde{Y}_{0}(\theta),\theta)}]\big{|}_{\theta^{\prime}=\theta}$ that

[TABLE]

Proof of Proposition D.2.

Condition (102) implies that $X_{t,n}$ exists and is a.s. unique (cf. Proposition 4.3 in Dahlhaus, Richter and Wu (2017) or Theorem 5.1 in Shao and Wu (2007)) and that there exist $C_{\rho}>0$ , $0<\rho<1$ such that $\sup_{n\in\mathbb{N}}\sup_{t=1,...,n}\delta_{q}^{X_{\cdot,n}}(k),\sup_{\theta\in\Theta}\delta_{q}^{\tilde{X}(\theta)}(k)\leq C_{\rho}\rho^{k}$ for all $k\geq 0$ . With slight changes to the arguments of their proofs, (1) and $D_{q}:=\max\{\sup_{\theta\in\Theta}\|\tilde{X}_{0}(\theta)\|_{q},\sup_{n\in\mathbb{N}}\sup_{t=1,...,n}\|X_{t,n}\|_{q}\}<\infty$ follow similarly to Proposition 4.3 and Lemma 4.4 in Dahlhaus, Richter and Wu (2017). Thus, the conditions of Assumptions 2.1 and 3.1 are fulfilled.

Put $L(u,\theta)=\mathbb{E}\ell(\tilde{Y}_{0}(u),\theta)$ . Let us omit the argument $\tilde{Y}_{0}(u)$ of $\mu(\cdot,\theta)$ and $\sigma(\cdot,\theta)$ in the following. It holds that

[TABLE]

By a Taylor expansion of $x\mapsto x-\log(x)-1$ , we obtain that the second summand is bounded from below by $\frac{1}{4}\mathbb{E}\Big{[}\frac{(\sigma^{2}(\theta_{0}(u))-\sigma^{2}(\theta))^{2}}{(\sigma^{2}(\theta_{0}(u))-\sigma^{2}(\theta))^{2}+\sigma^{4}(\theta)}\Big{]}$ . By these inequalities, condition (b) shows that the unique minimizer of $\theta\mapsto L(u,\theta)$ is given by $\theta_{0}(u)$ .

Omitting the arguments $(z,\theta)$ (where $z=(x,y)$ ), we have

[TABLE]

Since $\frac{\tilde{X}_{t}(\theta)-\mu(\tilde{Y}_{t-1}(\theta),\theta)}{\sigma(\tilde{Y}_{t-1}(\theta),\theta)}=\varepsilon_{t}$ and $\mathbb{E}\varepsilon_{t}=0$ , $\mathbb{E}\varepsilon_{0}^{2}=1$ , we obtain that $\nabla\ell(\tilde{Y}_{t}(\theta),\theta)$ is a martingale difference sequence with respect to $\mathcal{F}_{t}$ in each component. Furthermore,

[TABLE]

whose smallest eigenvalue is bounded from below by condition (c). By condition (a) and the fact that $\sigma(\cdot)\geq\sigma_{0}>0$ , straightforward calculations show that Assumption 3.3 (5) is fulfilled with the truncated (and thus summable) sequence $\chi_{i}=\mathbbm{1}_{\{1\leq i\leq r\}}$ and some $M\geq 2$ . ∎
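The Taylor-expansion step in the proof above rests on an elementary inequality for $g(x)=x-\log(x)-1$ . Writing $x=\sigma^{2}(\theta_{0}(u))/\sigma^{2}(\theta)$ , the ratio inside the stated expectation equals $\frac{(x-1)^{2}}{(x-1)^{2}+1}$ . The following minimal sketch checks on a grid that $g(x)\geq\frac{1}{4}\frac{(x-1)^{2}}{(x-1)^{2}+1}$ for $x>0$ . The grid, the tolerance and the function names are illustrative assumptions, and the sketch does not check how the constant interacts with any prefactor in the omitted display. The identification argument only needs a bound of this shape that is strictly positive away from $x=1$ .

```python
import numpy as np


def g(x):
    """g(x) = x - log(x) - 1, written with log1p for accuracy near x = 1."""
    t = x - 1.0
    return t - np.log1p(t)


def lower_bound(x):
    """Candidate bound (1/4) * (x - 1)^2 / ((x - 1)^2 + 1)."""
    d2 = (x - 1.0) ** 2
    return 0.25 * d2 / (d2 + 1.0)


# Illustrative grid covering many orders of magnitude around x = 1.
x = np.logspace(-6, 6, 200001)
gap = g(x) - lower_bound(x)

# Smallest observed gap; it should be nonnegative up to rounding error.
min_gap = float(gap.min())
print(min_gap)
```

Both sides vanish at $x=1$ , so the smallest gap on the grid is attained near that point and is zero up to rounding.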

We do not go into detail about when Assumption 3.7 is fulfilled in the situation of Proposition D.2. In view of the results in Section 4 in Dahlhaus, Richter and Wu (2017), we additionally need that $\mu,\sigma$ are four times continuously differentiable and fulfill some moment conditions.

Appendix E The bias in the tvAR(1) model

We use the results from Proposition D.1. Assume that for $t\in\mathbb{Z}$ , $\tilde{X}_{t}(\theta)=\alpha\cdot\tilde{X}_{t-1}(\theta)+\sigma\varepsilon_{t}$ . Then $V(\theta)=\text{diag}(\frac{\sigma^{2}}{1-\alpha^{2}},\frac{2}{\sigma^{2}})$ , where $\theta=(\alpha,\sigma)^{\prime}$ . Moreover, $f_{\theta}(\lambda)=\frac{\sigma^{2}}{2\pi}|1-\alpha e^{i\lambda}|^{-2}$ and thus

[TABLE]

Moreover, the variance and the first covariance take the form $c_{\theta}(0):=\int_{-\pi}^{\pi}f_{\theta}(\lambda)\ \mbox{d}\lambda=\frac{\sigma^{2}}{1-\alpha^{2}}$ and $c_{\theta}(1):=\int_{-\pi}^{\pi}e^{i\lambda}f_{\theta}(\lambda)\ \mbox{d}\lambda=\frac{\sigma^{2}\alpha}{1-\alpha^{2}}$ . This leads to (let $\bar{\nabla}$ denote the derivative with respect to $\bar{\theta}$ )

[TABLE]

The derivatives of the covariances read

[TABLE]

and

[TABLE]

We obtain

[TABLE]

thus

[TABLE]

where

[TABLE]

By using (97), we finally obtain

[TABLE]

If $\theta=\alpha$ , i.e. $\sigma\equiv\sigma_{0}>0$ is assumed to be constant and known, we obtain

[TABLE]
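The closed forms $c_{\theta}(0)=\frac{\sigma^{2}}{1-\alpha^{2}}$ and $c_{\theta}(1)=\frac{\sigma^{2}\alpha}{1-\alpha^{2}}$ used in this appendix can be checked numerically against the spectral density $f_{\theta}$ . The sketch below is only an illustration: the parameter values, the grid size and the function names are assumptions, and the integral is approximated by an equally spaced rule on $[-\pi,\pi)$ .

```python
import numpy as np


def ar1_spectral_density(lam, alpha, sigma):
    """f_theta(lam) = sigma^2 / (2 pi) * |1 - alpha * exp(i lam)|^(-2)."""
    return sigma**2 / (2.0 * np.pi) / np.abs(1.0 - alpha * np.exp(1j * lam)) ** 2


def autocovariance(k, alpha, sigma, n_grid=4096):
    """Approximate c_theta(k) = int_{-pi}^{pi} exp(i k lam) f_theta(lam) d lam.

    The integrand is smooth and 2 pi periodic, so an equally spaced
    Riemann sum over one period is very accurate.
    """
    lam = -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid
    f = ar1_spectral_density(lam, alpha, sigma)
    total = np.sum(np.exp(1j * k * lam) * f) * (2.0 * np.pi / n_grid)
    return float(np.real(total))


# Illustrative parameter values with |alpha| < 1.
alpha, sigma = 0.6, 1.3

c0_numeric = autocovariance(0, alpha, sigma)
c1_numeric = autocovariance(1, alpha, sigma)

c0_closed = sigma**2 / (1.0 - alpha**2)
c1_closed = sigma**2 * alpha / (1.0 - alpha**2)

print(c0_numeric, c0_closed)
print(c1_numeric, c1_closed)
```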

Appendix F The bias in the tvMA(1) model

We use the results from Proposition D.1. Assume that for $t\in\mathbb{Z}$ , $\tilde{X}_{t}(\theta)=\sigma\varepsilon_{t}+\alpha\sigma\varepsilon_{t-1}$ . Then we have $f_{\theta}(\lambda)=\frac{\sigma^{2}}{2\pi}|1+\alpha e^{i\lambda}|^{2}=\frac{\sigma^{2}}{2\pi}(1+\alpha^{2}+2\alpha\cos(\lambda))$ . Note that $f_{\theta}(\lambda)^{-1}=(\frac{2\pi}{\sigma^{2}})^{2}\cdot f_{\theta}^{AR}(\lambda)$ with $f_{\theta}^{AR}(\lambda)=\frac{\sigma^{2}}{2\pi}|1+\alpha e^{i\lambda}|^{-2}$ corresponding to the spectral density of an AR(1) process with parameter $-\alpha$ instead of $\alpha$ . Recall that Kolmogorov’s formula implies $\frac{1}{2}\log(\frac{\sigma^{2}}{2\pi})=\frac{1}{4\pi}\int_{-\pi}^{\pi}\log f_{\theta}^{AR}(\lambda)\ \mbox{d}\lambda$ and thus

[TABLE]

By (101), this leads to the same $V$ as in the AR case:

[TABLE]

To calculate $\tilde{V}(\theta)$ , note that

[TABLE]

We have

[TABLE]

where $c_{\theta}(0)=\frac{\sigma^{2}}{1-\alpha^{2}}$ and $c_{\theta}(1)=-\alpha\cdot c_{\theta}(0)$ . With

[TABLE]

we obtain

[TABLE]

thus

[TABLE]

where

[TABLE]

Using (97), we finally obtain

[TABLE]
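The reduction of the tvMA(1) case to the AR case in this appendix uses two facts: the inverse of the MA(1) spectral density is, up to the factor $(\frac{2\pi}{\sigma^{2}})^{2}$ , the spectral density $\frac{\sigma^{2}}{2\pi}|1+\alpha e^{i\lambda}|^{-2}$ of an AR(1) process with parameter $-\alpha$ , and Kolmogorov's formula for that AR(1) density. The following sketch checks both numerically. Parameter values, grid size and function names are illustrative assumptions.

```python
import numpy as np


def ma1_spectral_density(lam, alpha, sigma):
    """MA(1) density: sigma^2 / (2 pi) * (1 + alpha^2 + 2 alpha cos(lam))."""
    return sigma**2 / (2.0 * np.pi) * (1.0 + alpha**2 + 2.0 * alpha * np.cos(lam))


def ar1_spectral_density_neg(lam, alpha, sigma):
    """AR(1) density with parameter -alpha: sigma^2 / (2 pi) * |1 + alpha exp(i lam)|^(-2)."""
    return sigma**2 / (2.0 * np.pi) / np.abs(1.0 + alpha * np.exp(1j * lam)) ** 2


# Illustrative parameter values with |alpha| < 1.
alpha, sigma = 0.4, 0.9
n_grid = 4096
lam = -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid

f_ma = ma1_spectral_density(lam, alpha, sigma)
f_ar = ar1_spectral_density_neg(lam, alpha, sigma)

# Identity: 1 / f_MA = (2 pi / sigma^2)^2 * f_AR, checked pointwise on the grid.
identity_error = float(np.max(np.abs(1.0 / f_ma - (2.0 * np.pi / sigma**2) ** 2 * f_ar)))

# Kolmogorov's formula: (1 / (4 pi)) * int log f_AR d lam = (1/2) * log(sigma^2 / (2 pi)).
kolmogorov_lhs = 0.5 * np.log(sigma**2 / (2.0 * np.pi))
kolmogorov_rhs = float(np.sum(np.log(f_ar)) * (2.0 * np.pi / n_grid) / (4.0 * np.pi))

print(identity_error)
print(kolmogorov_lhs, kolmogorov_rhs)
```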

F.1 Bias terms in Examples 4.3 and 4.4

Lemma F.1.

In the situation of Example 4.3, it holds that

[TABLE]

where $P(u):=\partial_{u}W(\theta_{0}(u))$ and

[TABLE]

Proof.

Note that $\theta=(\alpha^{\prime},\sigma)^{\prime}$ . It holds that

[TABLE]

The recursion implies

[TABLE]

Using the equations given by the recursion, we end up with

[TABLE]

Furthermore, we obtain

[TABLE]

and thus

[TABLE]

Note that

[TABLE]

We finally obtain the bias-term

[TABLE]

In the special case of the AR(1) process $\tilde{X}_{t}(\theta)=\alpha\cdot\tilde{X}_{t-1}(\theta)+\sigma\varepsilon_{t}$ it is known that $W(\theta)=\mathbb{E}[\tilde{X}_{t}(\theta)^{2}]=\frac{\sigma^{2}}{1-\alpha^{2}}$ and thus $P(u)=\partial_{u}W(\theta_{0}(u))=\frac{2\sigma(u)^{2}}{1-\alpha(u)^{2}}\big{[}\frac{\partial_{u}\sigma(u)}{\sigma(u)}+\frac{\alpha(u)\partial_{u}\alpha(u)}{1-\alpha(u)^{2}}\big{]}=2W(\theta_{0}(u))\cdot\big{[}\frac{\partial_{u}\sigma(u)}{\sigma(u)}+\frac{\alpha(u)\partial_{u}\alpha(u)}{1-\alpha(u)^{2}}\big{]}$ , i.e.

[TABLE]

This leads to

[TABLE]

If $\sigma\equiv\sigma_{0}$ is assumed to be known (and not a parameter, meaning that $\theta=\alpha$ ), then this simplifies to

[TABLE]

∎
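The closed form for $P(u)=\partial_{u}W(\theta_{0}(u))$ in the AR(1) special case of the proof above follows from the chain rule applied to $W(\theta)=\frac{\sigma^{2}}{1-\alpha^{2}}$ . As a sanity check, the sketch below compares it with a central finite difference. The parameter curves `alpha` and `sigma` are hypothetical choices made only for this illustration and do not come from the paper.

```python
import numpy as np


# Hypothetical smooth parameter curves on [0, 1] with |alpha(u)| < 1, sigma(u) > 0.
def alpha(u):
    return 0.5 + 0.3 * np.sin(2.0 * np.pi * u)


def d_alpha(u):
    return 0.6 * np.pi * np.cos(2.0 * np.pi * u)


def sigma(u):
    return 1.0 + 0.5 * u


def d_sigma(u):
    return np.full_like(np.asarray(u, dtype=float), 0.5)


def W(u):
    """W(theta_0(u)) = sigma(u)^2 / (1 - alpha(u)^2)."""
    return sigma(u) ** 2 / (1.0 - alpha(u) ** 2)


def P_closed_form(u):
    """2 W [ sigma'/sigma + alpha alpha' / (1 - alpha^2) ]."""
    return 2.0 * W(u) * (
        d_sigma(u) / sigma(u) + alpha(u) * d_alpha(u) / (1.0 - alpha(u) ** 2)
    )


def P_finite_difference(u, eps=1e-6):
    """Central difference approximation of the derivative of W in u."""
    return (W(u + eps) - W(u - eps)) / (2.0 * eps)


u = np.linspace(0.05, 0.95, 19)
max_err = float(np.max(np.abs(P_closed_form(u) - P_finite_difference(u))))
print(max_err)
```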

Lemma F.2.

In the situation of Example 4.4, it holds that

[TABLE]

where $P(u):=\partial_{u}V(\theta_{0}(u))$ and $w(u):=V(\theta_{0}(u))^{-1}P(u)\partial_{u}\theta_{0}(u)$ .

Proof of Lemma F.2.

Let $A(u,\theta):=\frac{\mu(\tilde{Z}_{t-1}(u))}{2\langle\theta,\mu(\tilde{Z}_{t-1}(u))\rangle^{2}}$ and $B(u,\theta):=\langle\theta,\mu(\tilde{Z}_{t-1}(u))\rangle-\tilde{X}_{t}(u)$ . It holds that $\nabla\ell(\tilde{Y}_{t}(u),\theta)=A(u,\theta)\cdot B(u,\theta)$ . Thus

[TABLE]

Since $B(u,\theta_{0}(u))=\langle\theta_{0}(u),\mu(\tilde{Z}_{t-1}(u))\rangle\cdot(1-\varepsilon_{t}^{2})$ is a martingale difference sequence, the first summand in the above formula vanishes when the expectation is taken. To keep the formulas short, we will abbreviate $\mu:=\mu(\tilde{Z}_{t-1}(u))$ in the following. It holds that

[TABLE]

Furthermore, we have

[TABLE]

The recursion $\tilde{X}_{t}(u)^{2}=\langle\theta_{0}(u),\mu\rangle\varepsilon_{t}^{2}$ implies

[TABLE]

Recall that $V(\theta)=\frac{1}{2}\mathbb{E}\big{[}\frac{\mu(\tilde{Z}_{0}(\theta))\cdot\mu(\tilde{Z}_{0}(\theta))^{\prime}}{\langle\theta,\mu(\tilde{Z}_{0}(\theta))\rangle^{2}}\big{]}$ . Using the equations above and $P(u)=\partial_{u}V(\theta_{0}(u))$ , we obtain

[TABLE]

This leads to

[TABLE]

In the special case of the ARCH(1) process $\tilde{X}_{t}(\theta)=\sqrt{\alpha_{1}+\alpha_{2}\tilde{X}_{t-1}(\theta)^{2}}\cdot\varepsilon_{t}$ with $\theta=(\alpha_{1},\alpha_{2})^{\prime}$ , we have

[TABLE]

where $\tilde{X}_{t}(u)^{2}$ , $\partial_{u}\tilde{X}_{t}(u)^{2}$ can be obtained by using the recursion formulas $\tilde{X}_{t}(u)^{2}=\big{(}\alpha_{1}(u)+\alpha_{2}(u)\tilde{X}_{t-1}(u)^{2}\big{)}\varepsilon_{t}^{2}$ and

[TABLE]

∎
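For a fixed $u$ , the ARCH(1) recursion $\tilde{X}_{t}(u)^{2}=\big{(}\alpha_{1}(u)+\alpha_{2}(u)\tilde{X}_{t-1}(u)^{2}\big{)}\varepsilon_{t}^{2}$ can be iterated directly to generate the squared stationary approximation. The sketch below does this and compares the sample mean of the squares with the stationary second moment $\frac{\alpha_{1}}{1-\alpha_{2}}$ , which holds when $\mathbb{E}\varepsilon_{t}^{2}=1$ and $\alpha_{2}<1$ . Standard Gaussian innovations, the parameter values, the burn-in length and the function name are assumptions made for illustration only.

```python
import numpy as np


def simulate_arch1_squares(alpha1, alpha2, n, burn_in=1000, seed=0):
    """Iterate X_t^2 = (alpha1 + alpha2 * X_{t-1}^2) * eps_t^2 for fixed parameters.

    Innovations are standard Gaussian (an assumption for this sketch).
    The first `burn_in` values are discarded to reduce start-up effects.
    """
    rng = np.random.default_rng(seed)
    eps2 = rng.standard_normal(n + burn_in) ** 2
    x2 = np.empty(n + burn_in)
    prev = alpha1 / (1.0 - alpha2)  # start at the stationary mean of X^2
    for t in range(n + burn_in):
        prev = (alpha1 + alpha2 * prev) * eps2[t]
        x2[t] = prev
    return x2[burn_in:]


# Illustrative values; 3 * alpha2^2 < 1 keeps the fourth moment finite for Gaussian eps.
alpha1, alpha2 = 1.0, 0.3
x2 = simulate_arch1_squares(alpha1, alpha2, n=200_000)

sample_mean = float(x2.mean())
stationary_mean = alpha1 / (1.0 - alpha2)
print(sample_mean, stationary_mean)
```

The same loop evaluated at a grid of $u$ values, with $\alpha_{1}(u)$ and $\alpha_{2}(u)$ in place of the constants, gives the stationary approximations at each rescaled time point.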

Bibliography


1. Arkoun, O., and Pergamenchtchikov, S. (2008). Nonparametric estimation for an autoregressive model. Bulletin of Tomsk State University. Mathematics and Mechanics, 2(3).
2. Arkoun, O. (2011). Sequential adaptive estimators in nonparametric autoregressive models. Sequential Analysis, 30(2), 229-247.
3. Brockwell, P. J., and Davis, R. A. (2009). Time Series: Theory and Methods, Springer Series in Statistics.
4. Chiu, S. T. (1991). Bandwidth selection for kernel density estimation. The Annals of Statistics, 19(4), 1883-1905.
5. Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. The Annals of Statistics, 25(1), 1-37.
6. Dahlhaus, R., and Giraitis, L. (1998). On the optimal segment length for parameter estimates for locally stationary time series. Journal of Time Series Analysis, 19(6), 629-655.
7. Dahlhaus, R., Neumann, M. H., and von Sachs, R. (1999). Nonlinear wavelet estimation of time-varying autoregressive processes. Bernoulli, 5(5), 873-906.
8. Dahlhaus, R. (2000). A likelihood approximation for locally stationary processes. The Annals of Statistics, 28(6), 1762-1794.
