Asymptotic properties of a componentwise ARH(1) plug-in predictor
J. Álvarez-Liébana, D. Bosq, M. Dolores Ruiz-Medina

TL;DR
This paper develops and proves the consistency of a componentwise estimator and predictor for ARH(1) processes in Hilbert spaces, supported by simulations comparing its performance to existing methods.
Contribution
It introduces a new componentwise estimation method for the autocorrelation operator in ARH(1) processes with known eigenvectors, proving its convergence and consistency.
Findings
Estimator converges in mean-square to the true autocorrelation operator.
The predictor shows mean absolute convergence to the conditional expectation.
Simulation results demonstrate the estimator's finite-sample effectiveness and compare favorably with existing methods.
Javier Álvarez-Liébana1, Denis Bosq2 and M. Dolores Ruiz–Medina1
Summary
This paper presents new results on prediction of linear processes in function spaces. The autoregressive Hilbertian process framework of order one (ARH(1) process framework) is adopted. A componentwise estimator of the autocorrelation operator is formulated, from the moment–based estimation of its diagonal coefficients, with respect to the orthogonal eigenvectors of the auto-covariance operator, which are assumed to be known. Mean-square convergence to the theoretical autocorrelation operator, in the space of Hilbert-Schmidt operators, is proved. Consistency then follows in that space. For the associated ARH(1) plug-in predictor, mean absolute convergence to the corresponding conditional expectation, in the considered Hilbert space, is obtained. Hence, consistency in that space also holds. A simulation study is undertaken to illustrate the finite-large sample behavior of the formulated componentwise estimator and predictor. The performance of the presented approach is compared with alternative approaches in the previous and current ARH(1) framework literature, including the case of unknown eigenvectors.
Journal of Multivariate Analysis, 155, pp. 12-34. DOI: doi.org/10.1016/j.jmva.2016.11.009
1 Department of Statistics and O. R., University of Granada, Spain. 2 LSTA, Université Pierre et Marie Curie–Paris 6, Paris, France.
E-mail: [email protected]
Key words: ARH(1) processes; consistency; functional prediction; mean absolute and quadratic convergence.
1 Introduction
In the last few decades, an extensive literature on statistical inference from functional random variables has emerged. This work was motivated in part by the statistical analysis of high–dimensional data, as well as data of a continuous (infinite-dimensional) nature; see, e.g., Bosq [2000, 2007], Dedecker and Merlevède [2003], Ferraty and Vieu [2006], Merlevède [1996b, 1997], Ramsay and Silverman [2005], Ruiz-Medina [2012]. New developments in functional data analysis are described, e.g., in Bongiorno et al. [2014], Cuevas [2014], Horváth and Kokoszka [2012], Hsing and Eubank [2015], and in a recent Special Issue of this journal Goia and Vieu [2016].
These references give a thorough summary of the statistical theory for functional data, covering covariance operator theory and eigenfunction expansion, perturbation theory, smoothing and regularization, probability measures on Hilbert spaces, functional principal component analysis, functional counterparts of multivariate canonical correlation analysis, the two-sample problem and the change-point problem, functional linear models, functional tests for independence, functional time series theory, spatially distributed curves, and software packages and numerical implementation of the statistical procedures discussed, among other topics.
The special case of functional regression models, in which the predictor is a random function and the response is scalar, has been particularly well studied. Various specifications of the functional regression parameter arise in fields such as biology, climatology, chemometrics, and economics. To avoid the computational (high–dimensional) limitations of the nonparametric approach, several parametric and semi–parametric methods have been proposed; see, e.g., Ferraty et al. [2012] and the references therein. In Ferraty et al. [2012], a combination of a spline approximation and the one–dimensional Nadaraya–Watson approach was proposed to avoid high dimensionality issues. Generalizations to the case of more regressors (all functional, or both functional and real) were also addressed in the nonparametric, semi–parametric, and parametric frameworks; for an overview, see Aneiros-Pérez and Vieu [2006], Febrero-Bande and González-Manteiga [2013], Ferraty and Vieu [2009].
In the nonparametric regression framework, the case where the covariates and the response are functional was considered by Ferraty et al. [2012], where a functional version of the Nadaraya–Watson estimator was proposed for the estimation of the regression operator and shown to be point–wise asymptotically normal. Resampling techniques were used to overcome the difficulties arising in the estimation of the asymptotic bias and variance. Semi–functional partial linear regression, introduced in Aneiros-Pérez and Vieu [2008], allows the prediction of a real-valued random variable from a set of real–valued explanatory variables, and a time–dependent functional explanatory variable. Motivated by genetic and environmental applications, a semi–parametric maximum likelihood method for the estimation of odds ratio association parameters was developed by Chen et al. [2012] in a high–dimensional data context.
In the autoregressive Hilbertian time series framework, several estimation and prediction procedures have been proposed and studied. Mas [1999] established, under suitable conditions, the asymptotic normal distribution of the formulated estimator of the autocorrelation operator, based on projection into the theoretical eigenvectors. In Bosq [2000], Bosq and Blanke [2007], the problem of prediction of linear processes in function spaces was addressed. In particular, sufficient conditions for the consistency of the empirical autocovariance and cross–covariance operators were obtained. The asymptotic normal distribution of the empirical autocovariance operator was also derived. Moreover, the asymptotic properties of the empirical eigenvalues and eigenvectors were analysed.
Guillas [2001] established the efficiency of a componentwise estimator of the autocorrelation operator, based on projection into the empirical eigenvector system of the autocovariance operator. Consistency, in the space of bounded linear operators, of the formulated estimator of the autocorrelation operator, and of its associated ARH(1) plug–in predictor was later proved by Mas [2004]. He derived sufficient conditions for the weak convergence of the ARH(1) plug–in predictor to a Hilbert–valued Gaussian random variable (see Mas [2007]). Simultaneously, Mas and Menneteau [2003a] obtained high deflection results or large and moderate deviations for infinite–dimensional autoregressive processes. Furthermore, the law of the iterated logarithm for the covariance operator estimator was formulated by Menneteau [2005].
The main properties of the class of autoregressive Hilbertian processes with random coefficients were investigated by Mourid [2004]. Kargin and Onatski [2008] gave interesting extensions of the autoregressive Hilbertian framework, based on the spectral decomposition of the autocorrelation operator, and not of the autocovariance operator. The first generalization to autoregressive processes of order greater than one was proposed by Mourid [1993], in order to improve prediction. ARHX(1) models, i.e., autoregressive Hilbertian processes with exogenous variables, were studied by Damon and Guillas [2002, 2005]. In Guillas [2000, 2001], a doubly stochastic formulation of the autoregressive Hilbertian process was investigated. The ARHD model was introduced by Marion and Pumo [2004], taking into account the regularity of trajectories through the derivatives. The conditional autoregressive Hilbertian process (CARH process) was considered by Cugliari [2011], developing parallel projection estimation methods to predict such processes. In the Banach–valued context, we refer to the papers by Bensmain and Mourid [2001], Dehling and Sharipov [2005], Pumo [1992, 1998], among others.
In this paper, we assume that the autocorrelation operator belongs to the Hilbert–Schmidt class, and admits a diagonal spectral decomposition in terms of the orthogonal eigenvector system of the autocovariance operator. Such is the case, e.g., of an autocorrelation operator defined as a continuous function of the autocovariance operator. A componentwise estimator of the autocorrelation operator is then constructed in terms of the eigenvectors of the autocovariance operator, which are assumed to be known. This occurs when the random initial condition is defined as the solution, in the mean–square sense, of a stochastic differential equation driven by white noise. Beyond this case, the sparse representation and whitening properties of wavelet bases can be exploited to obtain a diagonal representation of the autocovariance and cross–covariance operators, in terms of a common and known wavelet basis. Unconditional bases, like wavelet bases, also allow the diagonal spectral series representation of the distributional kernels of Calderón-Zygmund operators.
Under the assumptions stated in Sections 2–4, we establish the convergence in the $L^2$-sense of a componentwise estimator of the autocorrelation operator in the space $\mathcal{S}(H)$ of Hilbert–Schmidt operators; i.e., convergence in the space $L^2_{\mathcal{S}(H)}(\Omega, \mathcal{A}, P)$ is derived. Consistency then follows in $\mathcal{S}(H)$. Under the same conditions, consistency in $H$ of the associated ARH(1) plug–in predictor is obtained, from its convergence in the $L^1$-sense in the Hilbert space $H$; i.e., in the space $L^1_H(\Omega, \mathcal{A}, P)$. The Gaussian framework is analysed in Section 4 and illustrated in Section 5, where examples show the behaviour of the proposed componentwise autocorrelation operator estimator, and associated predictor, for large sample sizes. We also present there a comparative study with alternative ARH(1) prediction techniques, including componentwise parameter estimation of the autocorrelation operator, from known and unknown eigenvectors, as well as kernel (nonparametric) functional estimation, and penalized, spline and wavelet, estimation. Final comments on the application of the proposed approach from real data are provided in Section 6.
2 Preliminaries
This section contains the preliminary definitions and lemmas that will be used to derive the main results of this paper. In the following, $H$ denotes a real separable Hilbert space. Recall that, from Bosq [2000], a zero–mean ARH(1) process $X = \{X_n, \ n \in \mathbb{Z}\}$ satisfies, for all $n \in \mathbb{Z}$, the equation
$$X_n = \rho(X_{n-1}) + \varepsilon_n, \tag{1}$$
where $\rho$ denotes the autocorrelation operator of the process $X$, which belongs to the space $\mathcal{L}(H)$ of bounded linear operators, such that $\|\rho^k\|_{\mathcal{L}(H)} < 1$ for all integers $k$ beyond a certain $k_0$, with $\|\cdot\|_{\mathcal{L}(H)}$ denoting the norm in the space $\mathcal{L}(H)$. The Hilbert–valued innovation process $\varepsilon = \{\varepsilon_n, \ n \in \mathbb{Z}\}$ is assumed to be a strong white noise which is uncorrelated with the random initial condition. That is, $\varepsilon$ is a Hilbert–valued zero–mean stationary process, with independent and identically distributed components in time, with $\sigma^2 = E\|\varepsilon_n\|_H^2 < \infty$ for all $n \in \mathbb{Z}$. We restrict our attention here to the case where $\rho$ is such that
$$\|\rho\|_{\mathcal{L}(H)} < 1.$$
The following assumptions are made.
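Assumption A1. The autocovariance operator
$$C = E[X_n \otimes X_n], \quad n \in \mathbb{Z},$$
is a positive, self–adjoint and trace–class operator. As a result, it admits the following diagonal spectral representation
$$C = \sum_{j=1}^{\infty} C_j \, \phi_j \otimes \phi_j,$$
in terms of an orthonormal system of eigenvectors $\{\phi_j, \ j \geq 1\}$, which are known. Here,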
[TABLE]
denote the real positive eigenvalues of arranged in decreasing order of magnitude and
[TABLE]
Assumption A2. The autocorrelation operator $\rho$ is a self–adjoint and Hilbert–Schmidt operator, admitting the diagonal spectral decomposition
$$\rho = \sum_{j=1}^{\infty} \rho_j \, \phi_j \otimes \phi_j,$$
where $\{\rho_j, \ j \geq 1\}$ is the system of eigenvalues of the autocorrelation operator $\rho$ with respect to the orthonormal system of eigenvectors $\{\phi_j, \ j \geq 1\}$ of the autocovariance operator $C$.
Note that, under Assumption A2,
[TABLE]
Remark 1
Assumption A2 holds, in particular, when operator $\rho$ is defined as a continuous function of operator $C$ (see [Dautray and Lions, 1990, pp. 119–140] and Remark 4).
In the following, for any $n \in \mathbb{Z}$, let
$$D = E[X_n \otimes X_{n+1}]$$
be the cross–covariance operator of the ARH(1) process $X$.
Remark 2
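Under Assumptions A1–A2, it follows from equation (1) that
$$D = \rho C = \sum_{j=1}^{\infty} \rho_j C_j \, \phi_j \otimes \phi_j.$$
By projecting equation (1) into the orthonormal system $\{\phi_j, \ j \geq 1\}$, we also have, for each $j \geq 1$ and all $n \in \mathbb{Z}$, the AR(1) equation
$$X_{n,j} = \rho_j X_{n-1,j} + \varepsilon_{n,j}, \tag{2}$$
where $X_{n,j} = \langle X_n, \phi_j \rangle_H$ and $\varepsilon_{n,j} = \langle \varepsilon_n, \phi_j \rangle_H$ for all $n \in \mathbb{Z}$. From equation (2), we have, for each $j \geq 1$ and all $n \in \mathbb{Z}$,
$$\rho_j = \frac{E[X_{n,j} X_{n-1,j}]}{E[X_{n-1,j}^2]} = \frac{D_j}{C_j}, \tag{3}$$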
where
[TABLE]
given that, for all ,
[TABLE]
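The componentwise AR(1) structure described in Remark 2 can be illustrated with a short simulation. The sketch below is a minimal Python example, assuming a Gaussian innovation and a known eigenbasis; the function name `simulate_arh1_coordinates`, the coefficient sequences `rho` and `sigma2`, and the burn-in length are hypothetical choices made for illustration, not values from the paper. It generates the coordinates of the process and compares their empirical variances with the stationary AR(1) variances $\sigma_j^2 / (1 - \rho_j^2)$.

```python
import numpy as np

def simulate_arh1_coordinates(rho, sigma2, n, burn_in=500, seed=0):
    """Simulate coordinates X_{i,j} of a Gaussian ARH(1) process in a known eigenbasis.

    rho    : diagonal autocorrelation coefficients rho_j, with |rho_j| < 1
    sigma2 : innovation variances sigma_j^2 of the projected white noise
    Returns an array of shape (n, len(rho)); column j is a scalar AR(1) path.
    """
    rho = np.asarray(rho, dtype=float)
    sigma = np.sqrt(np.asarray(sigma2, dtype=float))
    rng = np.random.default_rng(seed)
    state = np.zeros_like(rho)
    out = np.empty((n, rho.size))
    for i in range(-burn_in, n):
        # Coordinatewise recursion X_{i,j} = rho_j X_{i-1,j} + eps_{i,j}
        state = rho * state + sigma * rng.standard_normal(rho.size)
        if i >= 0:
            out[i] = state
    return out

j = np.arange(1, 6)
rho = 0.8 / j          # hypothetical diagonal coefficients of rho
sigma2 = 1.0 / j**2    # hypothetical innovation variances
X = simulate_arh1_coordinates(rho, sigma2, n=20000)

C_theory = sigma2 / (1.0 - rho**2)   # stationary variance of each AR(1) coordinate
C_emp = X.var(axis=0)
print(np.round(C_theory, 3))
print(np.round(C_emp, 3))
```

The two printed rows should be close for a long path; the agreement degrades for coordinates whose coefficient is near one, since those paths decorrelate slowly.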
Let us now consider the Banach space $L^2_H(\Omega, \mathcal{A}, P)$ of the equivalence classes of the space of zero–mean second–order Hilbert–valued random variables ($H$–valued random variables) with finite seminorm given by
[TABLE]
That is, for and belong to the same equivalence class if and only if
[TABLE]
The convergence in the seminorm of $L^2_{\mathcal{S}(H)}(\Omega, \mathcal{A}, P)$ will be considered in Proposition 1, where $\mathcal{S}(H)$ denotes the Hilbert space of Hilbert–Schmidt operators on a Hilbert space $H$.
For each let us consider the following biorthogonal representation of the functional value of the ARH(1) process , and of the functional value of its innovation process:
[TABLE]
where
[TABLE]
Here, under Assumptions A1–A2, for
[TABLE]
where, as before, denotes the system of eigenvectors of the autocovariance operator and
[TABLE]
for all
The following lemma provides the convergence, in the seminorm of of the series expansions (5)–(6).
Lemma 1
Let be a zero–mean ARH(1) process. Under Assumptions A1–A2, for any the following limit holds
[TABLE]
where . Furthermore,
[TABLE]
Similar assertions hold for the biorthogonal series representation
[TABLE]
Proof.
Under Assumption A1, from the trace property of the sequence
[TABLE]
satisfies, for sufficiently large, and arbitrary,
[TABLE]
since, under Assumption A1, . Hence, is a Cauchy sequence. Thus,
[TABLE]
for arbitrary. From equation (7),
[TABLE]
is also a Cauchy sequence in Thus, the sequence has finite limit in , for all .
Furthermore,
[TABLE]
In the derivation of the identities in (7)–(LABEL:A3:19), we have applied that, for every
[TABLE]
Moreover, from identities in (LABEL:A3:20),
[TABLE]
In a similar way, we can derive the convergence to in of the series for every since is assumed to be strong–white noise, and hence, its covariance operator is in the trace class. We can also obtain an analogous to equation (10).
In equations (5)–(6), for every
[TABLE]
Note that, from Assumption A2 for each in equation (2) defines a stationary and invertible AR(1) process. In addition, from equations (5) and (LABEL:A3:20), for every and
[TABLE]
which implies that
[TABLE]
In particular, we obtain, for each and for every
[TABLE]
Remark 3
From equation (2) and Lemma 1, keeping in mind that
[TABLE]
the following invertible and stationary AR(1) process can be defined:
[TABLE]
where, for each and are respectively introduced in equations (5)-(6). In the following, for each we assume that
[TABLE]
to ensure ergodicity for all second–order moments, in the mean–square sense; see, e.g., [Hamilton, 1994, pp. 192–193].
Furthermore,
[TABLE]
Remark 4
In particular, Assumption A2 holds if the following orthogonality condition is satisfied, for all and ,
[TABLE]
where $\delta$ denotes the Kronecker delta function. In practice, unconditional bases, e.g., wavelet bases, lead to a sparse representation for functional data; see, e.g., Nason [2008], Ogden [1997], Vidakovic [1998] for statistically-oriented treatments. Wavelet bases are also designed for sparse representation of kernels defining integral operators, in $L^2$ spaces with respect to a suitable measure (see Mallat [2009]). The Discrete Wavelet Transform (DWT) approximately decorrelates or whitens data (see Vidakovic [1998]). In particular, operators $C$ and $D$ could admit an almost diagonal representation with respect to the self-tensorial product of a suitable wavelet basis.
3 Estimation and prediction results
A componentwise estimator of the autocorrelation operator and the associated ARH(1) plug–in predictor are formulated in this section. Their convergence to the corresponding theoretical functional values is derived in the spaces $L^2_{\mathcal{S}(H)}(\Omega, \mathcal{A}, P)$ and $L^1_H(\Omega, \mathcal{A}, P)$, respectively. Their consistency in the spaces $\mathcal{S}(H)$ and $H$ then follows.
From equation (3), for each $j \geq 1$ and for a given sample size $n$, one can consider the usual respective moment–based estimators $\widehat{D}_{n,j}$ and $\widehat{C}_{n,j}$ of $D_j$ and $C_j$ in the AR(1) framework, given by
[TABLE]
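The following truncated componentwise estimator of $\rho$ is then formulated:
$$\widehat{\rho}_{k_n} = \sum_{j=1}^{k_n} \widehat{\rho}_{n,j} \, \phi_j \otimes \phi_j, \tag{15}$$
where, for each $j \geq 1$,
$$\widehat{\rho}_{n,j} = \frac{\widehat{D}_{n,j}}{\widehat{C}_{n,j}}. \tag{16}$$
Here, the truncation parameter $k_n$ indicates that we have considered the first $k_n$ eigenvectors associated with the first $k_n$ eigenvalues, arranged in decreasing order of magnitude. Furthermore, $k_n$ is such that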
[TABLE]
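The truncated componentwise estimator can be sketched numerically as a ratio of empirical moments for each retained coordinate. The Python example below is a minimal sketch under stated assumptions: the normalizations $1/n$ and $1/(n-1)$ of the two moment estimators, the function name `componentwise_rho_hat`, and the toy coefficients `rho_true` are all assumptions introduced here for illustration.

```python
import numpy as np

def componentwise_rho_hat(coords, k_n):
    """Moment-based estimates rho_hat_{n,j}, j = 1..k_n, from coordinates X_{i,j}.

    coords : array of shape (n, J); row i holds <X_i, phi_j> for j = 1..J
    The 1/n and 1/(n-1) normalizations are an assumption of this sketch.
    """
    x = np.asarray(coords, dtype=float)[:, :k_n]
    n = x.shape[0]
    c_hat = np.sum(x**2, axis=0) / n                  # estimates C_j = E[X_{n,j}^2]
    d_hat = np.sum(x[:-1] * x[1:], axis=0) / (n - 1)  # estimates D_j = E[X_{n,j} X_{n+1,j}]
    return d_hat / c_hat

# Toy data: independent AR(1) coordinates with hypothetical coefficients.
rng = np.random.default_rng(42)
rho_true = np.array([0.7, 0.5, 0.3, 0.2])
n = 20000
coords = np.zeros((n, rho_true.size))
state = np.zeros(rho_true.size)
for i in range(-200, n):          # 200 burn-in steps before recording
    state = rho_true * state + rng.standard_normal(rho_true.size)
    if i >= 0:
        coords[i] = state

rho_hat = componentwise_rho_hat(coords, k_n=3)
print(rho_hat)   # one estimate per retained eigenvector
```

Only the first `k_n` coordinates enter the estimate, which mirrors the role of the truncation parameter: a larger `k_n` recovers more of the operator but requires more coefficients to be estimated from the same sample.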
The following additional condition will be assumed on for the derivation of the subsequent results:
Assumption A3. The truncation parameter in (15) is such that
[TABLE]
Remark 5
Assumption A3 has also been considered in [Bosq, 2000, p. 217], to ensure weak consistency of the proposed estimator of $\rho$, as well as in [Mas, 1999, Proposition 4], in the derivation of asymptotic normality.
From Remark 3, for each in equation (14) defines a stationary and invertible AR(1) process, ergodic in the mean–square sense; see, e.g., Bartlett [1946]. Therefore, in view of equations (11) and (13), for each , there exist two positive constants and such that the following identities hold:
[TABLE]
Equations (18)-(19) imply, for sufficiently large,
[TABLE]
for certain positive constants and for each Equivalently, for sufficiently large,
[TABLE]
The following assumption is now considered.
Assumption A4. We assume that
[TABLE]
Remark 6
From equation (16), applying the Cauchy–Schwarz inequality, we obtain, for each $j \geq 1$,
[TABLE]
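Convergence in $L^2_{\mathcal{S}(H)}(\Omega, \mathcal{A}, P)$
Next, the convergence of $\widehat{\rho}_{k_n}$ to $\rho$ in the space $L^2_{\mathcal{S}(H)}(\Omega, \mathcal{A}, P)$ is derived under the conditions formulated in the previous sections.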
Proposition 1
Let be a zero–mean standard ARH(1) process. Under Assumptions A1–A4, the following limit holds:
[TABLE]
Specifically,
[TABLE]
Remark 7
[Bosq, 2000, Corollary 4.3] can be applied to obtain weak convergence results, in terms of weak expectation, using the empirical eigenvectors. See the definition of weak expectation at the beginning of [Bosq, 2000, Section 1.3, p. 27].
Proof. For each $j \geq 1$, the following inequality holds almost surely:
[TABLE]
Thus, under Assumptions A1–A2, from equation (24), for each
[TABLE]
which implies, for each ,
[TABLE]
Under Assumption A2, from equations (15) and (27),
[TABLE]
Furthermore, from (5) and (16), for each ,
[TABLE]
where, considering equation (4),
[TABLE]
for each Equations (28)–(31) then lead to
[TABLE]
For each and for sufficiently large, considering equations (22)–(23), under Assumption A4,
[TABLE]
From the trace property of operator
[TABLE]
and from the Hilbert–Schmidt property of
[TABLE]
Thus, in view of equations (LABEL:A3:49)–(34),
[TABLE]
where
[TABLE]
Under Assumption A3, equations (LABEL:A3:50)–(36) imply
[TABLE]
as we wanted to prove.
Note that consistency of $\widehat{\rho}_{k_n}$ in the space $\mathcal{S}(H)$ directly follows from equation (25) in Proposition 1.
Corollary 1
Let be a zero–mean standard ARH(1) process. Under Assumptions A1–A4, as long as
[TABLE]
where, as usual, denotes the convergence in probability.
Consistency of the ARH(1) plug–in predictor.
Let us consider the space of bounded linear operators on with the norm
[TABLE]
for every In particular, for each
[TABLE]
In the following, we denote by
[TABLE]
as usual, the ARH(1) plug–in predictor of as an estimator of the conditional expectation . The following proposition provides the consistency of in .
Proposition 2
Let be a zero–mean standard ARH(1) process. Under Assumptions A1–A4,
[TABLE]
Specifically,
[TABLE]
In particular,
[TABLE]
where, as usual, denotes the convergence in probability.
Proof.
From (37) and Proposition 1, for $n$ sufficiently large, the following inequality holds almost surely:
[TABLE]
where, as given in equation (38), Thus,
[TABLE]
From the Cauchy–Schwarz inequality, keeping in mind that, for a Hilbert–Schmidt operator $\mathcal{K}$, it always holds that $\|\mathcal{K}\|_{\mathcal{L}(H)} \leq \|\mathcal{K}\|_{\mathcal{S}(H)}$, we have from equation (39),
[TABLE]
where, as before, (see equation (LABEL:A3:20)).
Since from Proposition 1 (see equation (26)),
[TABLE]
from equation (40), we obtain,
[TABLE]
where , with being given in (36). In particular, under Assumption A3,
[TABLE]
which implies that
[TABLE]
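Because the estimator is diagonal in the known eigenbasis, the plug–in predictor of Proposition 2 acts coordinate by coordinate on the last observed curve. The following Python sketch shows this under stated assumptions: the function name `plug_in_predictor`, the sine basis on $(0, 1)$, and the numerical inputs are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def plug_in_predictor(last_coords, rho_hat, basis=None):
    """Apply a diagonal truncated estimate of rho to the last observed curve.

    last_coords : coordinates <X_{n-1}, phi_j>, j = 1..J
    rho_hat     : estimates rho_hat_{n,j}, j = 1..k_n, with k_n <= J
    basis       : optional (J, m) array with phi_j evaluated on a grid
    """
    rho_hat = np.asarray(rho_hat, dtype=float)
    k_n = rho_hat.size
    # Coordinates of the predicted curve: rho_hat_{n,j} * <X_{n-1}, phi_j>
    pred_coords = rho_hat * np.asarray(last_coords, dtype=float)[:k_n]
    if basis is None:
        return pred_coords
    return pred_coords @ np.asarray(basis)[:k_n]   # curve values on the grid

# Hypothetical example on (0, 1) with an orthonormal sine basis.
grid = np.linspace(0.0, 1.0, 101)
basis = np.sqrt(2.0) * np.sin(np.pi * np.arange(1, 4)[:, None] * grid[None, :])
pred_coords = plug_in_predictor([1.0, 2.0, 3.0], [0.5, 0.25])
pred_curve = plug_in_predictor([1.0, 2.0, 3.0], [0.5, 0.25], basis)
```

Coordinates beyond `k_n` are dropped, so the prediction error has two parts: the estimation error on the retained coefficients and the truncation error from the discarded ones.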
4 The Gaussian case
In this section, we prove that, in the Gaussian ARH(1) context, Assumptions A1–A2 and A4 also hold. From equation (11), for ,
[TABLE]
Furthermore, for each and , the random vector follows a Multivariate Normal distribution with null mean vector, and covariance matrix
[TABLE]
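It is well–known (see, for example, Gurland [1956]) that the variance of a quadratic form defined from a multivariate Gaussian vector $\mathbf{Y} \sim \mathcal{N}(\boldsymbol{\mu}, \Lambda)$ and a symmetric matrix $Q$ is given by:
$$\mathrm{Var}\left[\mathbf{Y}^T Q \mathbf{Y}\right] = 2\,\mathrm{tr}\left[Q \Lambda Q \Lambda\right] + 4\,\boldsymbol{\mu}^T Q \Lambda Q \boldsymbol{\mu}. \tag{42}$$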
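For a zero–mean Gaussian vector the variance of a quadratic form reduces to twice the trace of $(Q\Lambda)^2$, which can be checked by simulation. The Python sketch below compares that closed form with a Monte Carlo estimate; the Toeplitz covariance $c\,\rho^{|i-k|}$ is the standard stationary AR(1) covariance, and the values of `rho`, `c`, the dimension, and the number of draws are hypothetical choices for illustration.

```python
import numpy as np

def quad_form_variance(Q, Sigma):
    """Variance of Y^T Q Y for Y ~ N(0, Sigma) and symmetric Q: 2 tr(Q Sigma Q Sigma)."""
    M = Q @ Sigma
    return 2.0 * np.trace(M @ M)

rng = np.random.default_rng(1)
rho, c, dim = 0.6, 1.0, 5               # hypothetical AR(1) coefficient, variance, length
idx = np.arange(dim)
Sigma = c * rho ** np.abs(idx[:, None] - idx[None, :])   # stationary AR(1) covariance matrix
Q = np.eye(dim)

samples = rng.multivariate_normal(np.zeros(dim), Sigma, size=200_000)
mc_var = np.var(np.sum(samples**2, axis=1))   # Y^T Q Y with Q equal to the identity
exact = quad_form_variance(Q, Sigma)
print(exact, mc_var)
```

With the identity matrix, the quadratic form is the sum of squared coordinates, which is the quantity entering the empirical second–order moments of each AR(1) component.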
For each applying equation (42), with in (41), and the identity matrix, keeping in mind , for every ,
[TABLE]
Furthermore, from equation (LABEL:A3:mvqf4), for each
[TABLE]
We then obtain, from equation (44),
[TABLE]
Equation (45) leads to
[TABLE]
Hence, for each in equation (18) is given by
[TABLE]
and, from equation (44),
[TABLE]
Thus, for every in equation (20) satisfies
[TABLE]
Remark 8
Note that, from Lemma 1, for each and ,
[TABLE]
Thus, the assumption considered in Remark 3 holds, and for each the AR(1) process is ergodic for all second–order moments, in the mean–square sense; see [Hamilton, 1994, pp. 192–193].
For and for each we are now going to compute in (19). The random vectors
[TABLE]
are multivariate Normal distributed, with null mean vector, and covariance matrix
[TABLE]
From equation (13), for each
[TABLE]
where
[TABLE]
with, as before, denoting the identity matrix.
However, the variance of
[TABLE]
depends greatly on the distribution of and In the Gaussian case, keeping in mind that
[TABLE]
are zero–mean multivariate Normal distributed vectors with covariance matrix given in (46), and having cross–covariance matrix in (48), we can compute the variance of from (47)–(48), as follows. First,
[TABLE]
This can be rewritten as
[TABLE]
which is equal to
[TABLE]
This then reduces to
[TABLE]
which is the same as
[TABLE]
where, from (48),
[TABLE]
From (LABEL:A3:extversionvv3),
[TABLE]
Therefore, for each
[TABLE]
Thus, for each in (19) is given by From equation (50),
[TABLE]
Hence, for every in equation (21) satisfies
[TABLE]
Therefore, the constant in Assumption A4 is such that
5 Simulation study
A simulation study is undertaken to illustrate the behaviour of the formulated componentwise estimator of the autocorrelation operator, and of its associated ARH(1) plug–in predictor for large sample sizes. The results are reported in Section 5.1. In Section 5.2, a comparative study is developed, from the implementation of the ARH(1) plug–in prediction techniques proposed in Antoniadis and Sapatinas [2003], Besse et al. [2000], Bosq [2000], Guillas [2001]. In the subsequent sections, we restrict our attention to the Gaussian case.
Behaviour of and for large sample sizes
Let be the Dirichlet negative Laplacian operator on given by
[TABLE]
The eigenvectors and eigenvalues of satisfy, for each and for each ,
[TABLE]
For each and , the solution to equation (51) is given by (see [Grebenkov and Nguyen, 2013, p. 6]):
[TABLE]
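The eigenpairs of the Dirichlet negative Laplacian on an interval have a well-known closed form, which can be checked numerically. The Python sketch below assumes the standard expressions $\phi_j(x) = \sqrt{2/(b-a)}\,\sin\left(\pi j (x-a)/(b-a)\right)$ and $\lambda_j = (\pi j/(b-a))^2$; the shift by $a$, the interval endpoints, and the function name `dirichlet_eigenpairs` are assumptions of this illustration and may differ from the notation of the paper.

```python
import numpy as np

def dirichlet_eigenpairs(a, b, n_modes, grid):
    """Eigenvalues and eigenfunctions of the Dirichlet negative Laplacian on (a, b)."""
    j = np.arange(1, n_modes + 1)
    length = b - a
    eigenvalues = (np.pi * j / length) ** 2
    eigenfunctions = np.sqrt(2.0 / length) * np.sin(
        np.pi * j[:, None] * (grid[None, :] - a) / length
    )
    return eigenvalues, eigenfunctions

a, b = 0.0, 4.0                     # hypothetical interval
grid = np.linspace(a, b, 401)
dx = grid[1] - grid[0]
lam, phi = dirichlet_eigenpairs(a, b, 5, grid)

# Gram matrix of the eigenfunctions in L2((a, b)); a Riemann sum coincides with
# the trapezoidal rule here because every eigenfunction vanishes at both endpoints.
gram = phi @ phi.T * dx
print(np.round(gram, 6))
```

The Gram matrix should be the identity up to rounding, which confirms orthonormality on the chosen grid.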
We consider here the operator defined as
[TABLE]
From [Dautray and Lions, 1990, pp. 119–140], the eigenvectors of coincide with the eigenvectors of and its eigenvalues are given by:
[TABLE]
Additionally, considering
[TABLE]
for certain positive constant close to zero, is a positive self–adjoint Hilbert–Schmidt operator, whose eigenvectors coincide with the eigenvectors of and whose eigenvalues are such that for every and
[TABLE]
where, as before, are given in equation (52).
From (LABEL:A3:25a), the eigenvalues of are then defined, for each as
[TABLE]
Note that is in the trace class, since the trace property of and the fact that for every implies
[TABLE]
For this particular example of operator we have considered truncation parameter of the form
[TABLE]
for a suitable which, in particular, allows verification of (17). From equation (53), one has, for ,
[TABLE]
From equation (55), Assumption A3 is then satisfied if
[TABLE]
since . Fix and . Then, from equation (56), In particular, the values and have been tested, in Table 1 below, for and where denotes the space of square integrable functions on
The computed empirical truncated functional mean square error of the estimator of for a sample size , is given by:
[TABLE]
where denotes the number of simulations, and for each represents the estimator of based on the –th generation of the values with for and
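An empirical truncated mean square error of this kind can be approximated by averaging, over independent replications, the summed squared errors of the first $k_n$ estimated coefficients. The Python sketch below is a minimal illustration under assumptions: the function name `efmse`, the coefficient sequence `rho_true`, the unit innovation variance, and the estimator normalizations are hypothetical and not taken from the paper.

```python
import numpy as np

def efmse(rho_true, n, n_rep, k_n, seed=0, burn_in=200):
    """Monte Carlo average of sum_{j <= k_n} (rho_hat_{n,j} - rho_j)^2 over n_rep replications."""
    rng = np.random.default_rng(seed)
    rho = np.asarray(rho_true, dtype=float)[:k_n]
    state = np.zeros((n_rep, k_n))
    paths = np.empty((n, n_rep, k_n))
    for i in range(-burn_in, n):
        # All replications and coordinates are advanced together.
        state = rho * state + rng.standard_normal((n_rep, k_n))
        if i >= 0:
            paths[i] = state
    c_hat = np.sum(paths**2, axis=0) / n
    d_hat = np.sum(paths[:-1] * paths[1:], axis=0) / (n - 1)
    rho_hat = d_hat / c_hat                      # shape (n_rep, k_n)
    return np.mean(np.sum((rho_hat - rho) ** 2, axis=1))

rho_true = 0.8 / np.arange(1, 6)     # hypothetical eigenvalues of rho
err_small = efmse(rho_true, n=200, n_rep=50, k_n=5)
err_large = efmse(rho_true, n=2000, n_rep=50, k_n=5)
print(err_small, err_large)
```

Comparing the two printed values gives a quick sanity check that the error shrinks as the sample size grows for a fixed truncation order.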
For the plug–in predictor we compute the empirical version of the derived upper bound (40), which, for each is given by
[TABLE]
From realizations, for each one of the elements of the sequence of sample sizes
[TABLE]
the and values, for and are displayed in Table 1, where the abbreviated notations for and for are used (see also Figures 1–2).
In this paper, a one–parameter model of is selected depending on parameter . In [Guillas, 2001, Example 2], in the same spirit, for an equivalent spectral class of operators , a three–parameter model is established for to ensure convergence in quadratic mean in the space of the componentwise estimator of constructed from the known eigenvectors of . The numerical results displayed in Table 1 and Figures 1–2 illustrate the fact that the proposed componentwise estimator presents a speed of convergence to in quadratic mean in faster than which corresponds to the optimal case for the componentwise estimator of proposed in Guillas [2001], in the case of known eigenvectors of ; see, in particular, [Guillas, 2001, Theorem 1, Remark 2 and Example 2]. For larger values of the parameters than and than , a faster velocity of convergence of to in quadratic mean in the space will be obtained. However, larger sample sizes are required for larger values of in order to estimate a given number of coefficients of A more detailed discussion about comparison of the rates of convergence of the ARH(1) plug–in predictors proposed in Antoniadis and Sapatinas [2003], Besse et al. [2000], Bosq [2000], Guillas [2001] can be found in the next section.
A comparative study
In this section, the performance of our approach is compared with those given in Antoniadis and Sapatinas [2003], Besse et al. [2000], Bosq [2000], Guillas [2001], including the case of unknown eigenvectors of $C$. In the latter case, our approach and the approaches presented in Bosq [2000], Guillas [2001] are implemented in terms of the empirical eigenvectors.
5.2.1 Theoretical–eigenvector–based componentwise estimators
Let us first compare the performance of our ARH(1) plug–in predictor, defined in (38), and the ones formulated in Bosq [2000], Guillas [2001], in terms of the theoretical eigenvectors of Note that, in this first part of our comparative study, we consider the previous generated Gaussian ARH(1) process, with autocovariance and autocorrelation operators defined from equations (53) and (54), for different rates of convergence to zero of parameters and with both sequences being summable sequences. Since we restrict our attention to the Gaussian case, conditions A B1 and C formulated in [Bosq, 2000, pp. 211–212] are satisfied by the generated ARH(1) process. Similarly, Conditions H1–H3 in [Guillas, 2001, p. 283] are satisfied as well.
In [Bosq, 2000, Section 8.2] the following estimator of is proposed
[TABLE]
in the finite dimensional subspace
[TABLE]
of where is the orthogonal projector over and, as before, for
A modified estimator of is studied in [Guillas, 2001, Section 2], given by
[TABLE]
where
[TABLE]
Here, is such that (see [Guillas, 2001, Theorem 1])
[TABLE]
Tables 2–3 display the truncated, for two different rules, empirical values of based on generations of each one of the functional samples considered with sizes when
[TABLE]
Specifically, is computed from equations (15)–(16) (see third column), , with being given in equations (60)–(61) (see fourth column), and , with being defined in (62)–(63) (see fifth column).
In Table 2, and for according to our Assumption A3, which is also considered in [Bosq, 2000, p. 217] to ensure weak consistency of the proposed estimator of . In Table 3, the same empirical values are displayed for and is selected according to [Guillas, 2001, Example 2]. Thus, in Table 3,
[TABLE]
In particular we have chosen and Note that, from [Guillas, 2001, Theorem 1 and Remark 1], for the choice made of in Table 3, convergence to in quadratic mean in the space holds for given in (62)–(63).
One can observe in Table 2 a similar performance of the three methods compared with the truncation order $k_n$ satisfying Assumption A3, with slightly worse results being obtained from the estimator defined in (62)–(63), especially for the sample size Furthermore, in Table 3, a better performance of our approach is observed for the smallest sample sizes (from until ). For the remaining largest sample sizes, only slight differences are observed, with, again, a better performance of our approach, very close to the other two approaches presented in Bosq [2000], Guillas [2001].
5.2.2 Empirical–eigenvector–based componentwise estimators
In this section, we address the case where are unknown, as is often the case in practice. Specifically, for a given sample size , let be the empirical counterpart of the theoretical eigenvectors , satisfying, for every ,
[TABLE]
where denotes the system of eigenvalues associated with the system of empirical eigenvectors . We then consider the following estimators for comparison purposes
[TABLE]
where, for and denotes the orthogonal projector into the space
[TABLE]
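When curves are observed on a uniform grid, the empirical eigenvalues and eigenvectors used above can be approximated by discretizing the empirical autocovariance operator. The Python sketch below is one possible implementation under assumptions: the function name `empirical_eigenpairs`, the simple Riemann quadrature, and the toy sample built from three sine modes with hypothetical variances are all introduced here for illustration.

```python
import numpy as np

def empirical_eigenpairs(curves, dx):
    """Eigen-decomposition of C_n = (1/n) sum_i X_i (x) X_i for zero-mean curves on a uniform grid.

    curves : (n, m) array; row i holds X_i evaluated at m grid points
    dx     : grid spacing, used as quadrature weight for the L2 inner product
    Returns eigenvalues in decreasing order and eigenfunctions as rows with unit L2 norm.
    """
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    kernel = curves.T @ curves / n                 # covariance kernel c_n(s, t)
    values, vectors = np.linalg.eigh(kernel * dx)  # discretized integral operator
    order = np.argsort(values)[::-1]
    return values[order], (vectors[:, order] / np.sqrt(dx)).T

# Toy zero-mean sample: random combinations of three sine modes on (0, 1).
rng = np.random.default_rng(7)
grid = np.linspace(0.0, 1.0, 201)
dx = grid[1] - grid[0]
modes = np.sqrt(2.0) * np.sin(np.pi * np.arange(1, 4)[:, None] * grid[None, :])
variances = np.array([1.0, 0.25, 0.04])            # hypothetical eigenvalues C_j
coeffs = rng.standard_normal((5000, 3)) * np.sqrt(variances)
curves = coeffs @ modes

C_n, phi_n = empirical_eigenpairs(curves, dx)
print(np.round(C_n[:3], 3))
```

Empirical eigenvectors are defined only up to sign, so any comparison with the theoretical eigenvectors should align signs first.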
The Gaussian ARH(1) process is generated under Assumptions A1–A2, as well as in [Bosq, 2000, p. 218]. Note that conditions and in Bosq [2000] already hold. Moreover, as given in [Bosq, 2000, Theorem 8.8 and Example 8.6], for
[TABLE]
with, in particular, and for
[TABLE]
with , the estimator converges almost surely to under the condition
[TABLE]
where
[TABLE]
In Table 4, has been tested; see [Bosq, 2000, Example 8.6].
A better performance of our estimator (65) in comparison with estimator (66), formulated in Bosq [2000], and estimator (67), formulated in [Guillas, 2001, Example 4 and Remark 4], is observed in Table 4. Note that, in particular, in [Guillas, 2001, Example 4 and Remark 4], smaller values of than are required for a given sample size to ensure convergence in quadratic mean, and, in particular, weak–consistency. However, considering a smaller discretization step size than in Table 4, where , and for (i.e., ), we obtain in Table 5, for the same parameter values and better results than in Table 4, since a smaller number of coefficients of (parameters) to be estimated is considered in Table 5, from a richer sample information (coming from the smaller discretization step size considered). One can also observe in Table 5 a similar performance of the three approaches studied. In Table 6, the value , with proposed in [Guillas, 2001, Example 4 and Remark 4] is considered to compute the truncated empirical values of for defined in equation (65) (third column), for given in equation (66) (fourth column), and for in equation (67) (fifth column). A similar performance of the three approaches is observed, with the exception of where the approach presented in Guillas [2001] displays a slightly better performance
5.2.3 Kernel–based nonparametric and penalized estimation
In practice, curves are observed at discrete times, and must be approximated by smooth functions. In Besse et al. [2000], the following optimization problem is considered:
[TABLE]
where is a linear differential operator of order . Our interpolation is computed with the Matlab smoothingspline method. Non-linear kernel regression is then considered, in terms of the smoothed functional data, solution to (68), as follows:
[TABLE]
where is the usual Gaussian kernel, and
[TABLE]
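As an illustration of the kernel regression idea, the sketch below predicts the next curve as a Gaussian-kernel weighted average of the successors of past curves, with weights driven by the distance between each past curve and the last observed one. This is a generic Nadaraya-Watson form written under stated assumptions: curves are given on a common regular grid, the norm is approximated by a grid average, and the bandwidth `h` is supplied by the user. The name `kernel_predict` is hypothetical, and the exact normalization used in Besse et al. [2000] may differ.

```python
import numpy as np


def kernel_predict(curves, h):
    """Predict the curve following curves[-1] as a Gaussian-kernel weighted
    average of the successors of the past curves (Nadaraya-Watson form)."""
    curves = np.asarray(curves, dtype=float)
    past, successors, last = curves[:-1], curves[1:], curves[-1]
    # Squared L2 distances to the last curve, approximated by a grid average.
    d2 = np.mean((past - last) ** 2, axis=1)
    # Subtracting the minimum leaves the normalized weights unchanged and
    # avoids a 0/0 division when the bandwidth is very small.
    w = np.exp(-0.5 * (d2 - d2.min()) / h**2)
    return w @ successors / np.sum(w)
```

Because the prediction is a convex combination of observed successor curves, it always stays within their pointwise range; a small bandwidth makes it follow the nearest past curve, while a large one averages over all of them.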
Alternatively, Besse et al. [2000] also study prediction for functional autoregressive processes (FAR(1) processes), under the linear assumption on , which is considered to be a compact operator with , from smooth data solving the optimization problem
[TABLE]
where is the smoothing parameter, is the –dimensional functional subspace spanned by the leading eigenvectors of the autocovariance operator associated with its largest eigenvalues. Thus, smoothness and rank constraint are considered in the computation of the solution to the optimization problem (69). Such a solution is obtained by means of functional PCA.
The following regularized empirical estimators of and are then considered, with inversion of in the subspace :
[TABLE]
Thus, the regularized estimator of is given by
[TABLE]
and the predictor
[TABLE]
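The regularized estimation step can be sketched as follows: the empirical covariance is inverted only on the subspace spanned by its leading eigenvectors, and the lag-one cross-covariance is projected onto the same subspace. The code below is a minimal finite-dimensional sketch that assumes zero-mean curves already discretized (or expressed in a finite basis) and omits the spline smoothing step; `far1_projected_estimate` and its arguments are hypothetical names, not the notation of Besse et al. [2000].

```python
import numpy as np


def far1_projected_estimate(x, q):
    """Estimate rho by D_n C_n^{-1} restricted to the span of the q leading
    empirical eigenvectors, and return the one-step plug-in prediction."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    c_n = x.T @ x / n                      # empirical covariance, zero mean assumed
    d_n = x[1:].T @ x[:-1] / (n - 1)       # empirical lag-one cross-covariance
    eigval, eigvec = np.linalg.eigh(c_n)   # eigenvalues in ascending order
    idx = np.argsort(eigval)[::-1][:q]     # indices of the q largest eigenvalues
    v, lam = eigvec[:, idx], eigval[idx]
    # In the eigenbasis, C_n is diagonal, so its inverse on the subspace is
    # a division of each column by the corresponding eigenvalue.
    rho_hat = v @ (v.T @ d_n @ v / lam) @ v.T
    return rho_hat, rho_hat @ x[-1]
```

Restricting the inversion to `q` components is what keeps the estimator stable: small eigenvalues of the empirical covariance are never inverted, at the price of a rank-`q` approximation of the operator.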
Due to computational cost limitations, in Table 7, the following statistics are evaluated to compare the performance of the two prediction methodologies described above:
[TABLE]
[TABLE]
The kernel–based and penalized FAR(1) predictors, computed from smooth functional data, display a similar performance, which is also comparable, considering one realization, to the performance obtained in Table 6 from the empirical eigenvectors.
5.2.4 Wavelet–based prediction for ARH(1) processes
The approach presented in Antoniadis and Sapatinas [2003] is now studied. Specifically, wavelet-based regularization is applied to obtain smooth estimates of the sample paths. The projection onto the space generated by translations of the scaling function at level associated with a multiresolution analysis of is first considered. For a given primary resolution level , with the following wavelet decomposition at resolution levels can be computed for any projected curve in the space for
[TABLE]
For the following variational problem is solved to obtain the smooth estimate of the curve
[TABLE]
where denotes the orthogonal projection operator of onto the orthogonal complement of and for
[TABLE]
Using the equivalent sequence of norms of fractional Sobolev spaces of order with on a suitable interval (in our case, ), the minimization of (72) is equivalent to the optimization problem, for
[TABLE]
The solution to (73) is given by, for
[TABLE]
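For intuition, penalized problems of this kind typically lead to a linear shrinkage rule: the scaling coefficients at the primary resolution level are kept, and each detail coefficient at level j is divided by a factor that grows with j. The sketch below assumes the factor 1 + lam * 2^(2*s*j), which is the usual form for a Sobolev-type penalty of order s; this is an assumption, not a transcription of the paper's expression. It also uses a hand-written orthonormal Haar transform only to stay self-contained, whereas smoother wavelets would be needed for the norm equivalence invoked in the text. All function names are hypothetical.

```python
import numpy as np


def haar_decompose(signal, j0):
    """Orthonormal Haar transform of a length-2**J signal down to level j0.
    Returns the scaling coefficients at level j0 and a dict {level: details}."""
    approx = np.asarray(signal, dtype=float)
    top = approx.size.bit_length() - 1      # J, assuming the size is a power of two
    details = {}
    for j in range(top - 1, j0 - 1, -1):
        even, odd = approx[0::2], approx[1::2]
        details[j] = (even - odd) / np.sqrt(2.0)
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, going from the coarsest to the finest level."""
    approx = np.asarray(approx, dtype=float)
    for j in sorted(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + details[j]) / np.sqrt(2.0)
        out[1::2] = (approx - details[j]) / np.sqrt(2.0)
        approx = out
    return approx


def sobolev_shrink(signal, j0, lam, s):
    """Keep scaling coefficients; divide level-j details by 1 + lam * 2**(2*s*j)."""
    approx, details = haar_decompose(signal, j0)
    shrunk = {j: d / (1.0 + lam * 2.0 ** (2 * s * j)) for j, d in details.items()}
    return haar_reconstruct(approx, shrunk)
```

With `lam = 0` the curve is returned unchanged; increasing `lam` or `s` damps fine-scale details more strongly while leaving the coarse approximation at level `j0` untouched.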
In particular, in the subsequent computations, we have considered the following value of the smoothing parameter (see Angelini et al. [2003]):
[TABLE]
The following smoothed data are then computed
[TABLE]
removing the trend
[TABLE]
to obtain
[TABLE]
for the computation of
[TABLE]
for and
[TABLE]
where
[TABLE]
and
[TABLE]
for every . Table 8 displays the empirical truncated approximation of the expectation and respectively obtained applying our approach, and the approach in Antoniadis and Sapatinas [2003], in the estimation of the autocorrelation operator . Here, we have tested with according to Assumption A3, and according to
[TABLE]
in [Antoniadis and Sapatinas, 2003, p. 149]. In particular, we have considered and . From the results displayed in Table 8, one can observe a similar performance of the two truncation rules implemented, and of the two approaches compared, for the small sample sizes tested. The approaches presented in Besse et al. [2000] display a similar accuracy for such small sample sizes (see Table 7).
6 Final comments
As noted before, in this paper, the eigenvectors of are considered to be known in the derivation of the results on consistency. This assumption is satisfied, e.g., when the random initial condition is given as the solution, in the mean-square sense, of a stochastic differential equation driven by white noise (e.g., the Wiener measure), since the eigenvectors of the differential operator involved in that equation coincide with the eigenvectors of the autocovariance operator of the ARH(1) process. In the case where the eigenvectors of the autocovariance operator are unknown, the numerical results displayed in Tables 4–6 illustrate the fact that our approach displays, in terms of the empirical eigenvectors, very similar prediction results to those obtained with the implementation of the componentwise estimators proposed in Bosq [2000], Guillas [2001], with a better performance of our approach in the more unfavorable case, corresponding to a large discretization step size, and truncation order (see Table 4 computed for ).
Regarding Assumption A2, Remark 1 provides an example where this assumption is satisfied. However, our approach can still be applied in a wider range of situations. Wavelet bases are well suited for sparse representation of functions; recent work has considered combining them with sparsity-inducing penalties, both for semiparametric regression (see, e.g., Wand and Ormerod [2011]) and for regression with functional or kernel predictors (see Wand and Ormerod [2011], Zhao et al. [2012, 2015], among others). The latter papers focused on ℓ1 penalization, also known as the lasso (see Tibshirani [1996]), in the wavelet domain. Alternatives to the lasso include the SCAD penalty by Fan and Li [2001] and the adaptive lasso by Zou [2006]. The ℓ1 penalty in the elastic net criterion has the effect of shrinking small coefficients to zero. This can be interpreted as imposing a prior that favors a sparse estimate. The above-mentioned smoothing techniques, based on wavelets, can be applied to obtain a smooth sparse approximation of the functional data whose empirical auto-covariance operator
[TABLE]
and cross-covariance operator
[TABLE]
admit a diagonal representation in terms of wavelets.
In the literature, shrinkage approaches for estimating a high–dimensional covariance matrix are employed to circumvent the limitations of the sample covariance matrix. In particular, a new family of nonparametric Stein–type shrinkage covariance estimators is proposed in Touloumis [2015] (see also references therein), whose members are written as a convex linear combination of the sample covariance matrix and of a predefined invertible diagonal target matrix. These results can be applied to our framework, considering the shrinkage estimators of the autocovariance and cross-covariance operators, with respect to a common suitable wavelet basis, which can lead to an empirical diagonal representation of both operators.
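The convex-combination idea can be sketched in a few lines. The code below shrinks the sample covariance matrix toward its own diagonal with a fixed, user-supplied weight; this is an illustrative simplification, since the estimators in Touloumis [2015] choose the weight from the data and consider several diagonal targets. The function name `shrink_covariance` and the parameter `weight` are hypothetical.

```python
import numpy as np


def shrink_covariance(x, weight):
    """Convex combination of the sample covariance matrix and a diagonal
    target (here, the diagonal of the sample covariance itself)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    s = np.cov(np.asarray(x, dtype=float), rowvar=False)  # rows are observations
    target = np.diag(np.diag(s))                           # invertible if all variances > 0
    return (1.0 - weight) * s + weight * target
```

When the dimension exceeds the sample size, the sample covariance is singular, whereas any strictly positive weight yields a positive definite estimate as long as all sample variances are positive; this is the property that makes shrinkage attractive before inverting an empirical covariance.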
In the Supplementary Material (see Appendix 7), a numerical example illustrates the performance of our approach in the case of a pseudo–diagonal autocorrelation operator.
7 Supplementary Material: non–diagonal autocorrelation operator
This section provides a numerical example in which the methodology proposed in this paper still works beyond Assumption A2. In particular, it illustrates the performance of the proposed estimation methodology when Assumption A2 is not satisfied, but is close to being diagonal in some sense. The numerical results obtained are compared with those derived from the ARH(1) predictors based on the componentwise estimators proposed in Bosq [2000] and Guillas [2001], where this diagonal assumption is not required. The Gaussian ARH(1) process generated has autocorrelation operator with coefficients with respect to the basis given by
[TABLE]
in the diagonal, and outside of the diagonal
[TABLE]
where when . The coefficients of the autocovariance operator of the innovation process with respect to the mentioned basis are given by
[TABLE]
in the diagonal, and outside of the diagonal by
[TABLE]
where when . Table 9 below displays the empirical truncated values of based on simulations of each one of the 20 functional samples considered, with sizes , for the corresponding values obtained, in this case, by the rule , with . We have considered parameter in the definition of the eigenvalues of ; but, in this case, as noted before, operators and are non-diagonal (see equations 75–76). The estimators of and the associated plug–in predictors are computed, for the three approaches compared, under the assumption that the eigenvectors of C are known.
As expected, Table 9 shows that the approaches in Bosq [2000] and Guillas [2001] outperform our methodology. However, for large sample sizes, the ARH(1) prediction methodology proposed here can still be applied, with an order of magnitude of for the empirical errors associated with given in equation (65). Thus, in the pseudo–diagonal autocorrelation operator case, our approach could still be considered, in some sense. As noted in the paper, an example is given by the case where the autocovariance and autocorrelation operators admit a sparse representation in terms of a suitable orthonormal wavelet basis (see, for instance, Angelini et al. [2003], Antoniadis and Sapatinas [2003]).
Acknowledgments
This work has been supported in part by project MTM2015–71839–P (co-funded by Feder funds), of the DGI, MINECO, Spain.
References
- Aneiros-Pérez and Vieu [2006]: Aneiros-Pérez, G.; Vieu, P. Semi-functional partial linear regression. Statist. Probab. Lett. 76 (2006), pp. 1102–1110. DOI: doi.org/10.1016/j.spl.2005.12.007
- Aneiros-Pérez and Vieu [2008]: Aneiros-Pérez, G.; Vieu, P. Nonparametric time series prediction: a semifunctional partial linear modeling. J. Multivariate Anal. 99 (2008), pp. 834–857. DOI: doi.org/10.1016/j.jmva.2007.04.010
- Angelini et al. [2003]: Angelini, C.; Canditiis, D. D.; Leblanc, F. Wavelet regression estimation in nonparametric mixed effect models. J. Multivariate Anal. 85 (2003), pp. 267–291. DOI: doi.org/10.1016/S0047-259X(02)00055-6
- Antoniadis and Sapatinas [2003]: Antoniadis, A.; Sapatinas, T. Wavelet methods for continuous-time prediction using Hilbert-valued autoregressive processes. J. Multivariate Anal. 87 (2003), pp. 133–158. DOI: doi.org/10.1016/S0047-259X(03)00028-9
- Bartlett [1946]: Bartlett, M. S. On the theoretical specification and sampling properties of autocorrelated time series. Supplement to J. Roy. Stat. Soc. 8 (1946), pp. 27–41. URL: http://www.jstor.org/stable/2983611
- Bensmain and Mourid [2001]: Bensmain, N.; Mourid, T. Estimateur "sieve" de l'opérateur d'un processus ARH(1). C. R. Acad. Sci. Paris Sér. I Math. 332 (2001), pp. 1015–1018. DOI: doi.org/10.1016/S0764-4442(01)01954-1
- Besse et al. [2000]: Besse, P. C.; Cardot, H.; Stephenson, D. B. Autoregressive forecasting of some functional climatic variations. Scand. J. Statist. 27 (2000), pp. 673–687. DOI: doi.org/10.1111/1467-9469.00215
- Bongiorno et al. [2014]: Bongiorno, G.; Goia, A.; Salinelli, E.; Vieu, P. Contributions in infinite–dimensional statistics and related topics. Soc. Editrice Esculapio, Bologna, 2014. ISBN 9788874887637
