Asymptotic properties of a componentwise ARH(1) plug-in predictor

J. Álvarez-Liébana; D. Bosq; M. Dolores Ruiz-Medina

arXiv:1706.06498 · math.ST · September 5, 2018 · J. Multivar. Anal.

TL;DR

This paper develops and proves the consistency of a componentwise estimator and predictor for ARH(1) processes in Hilbert spaces, supported by simulations comparing its performance to existing methods.

Contribution

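It introduces a componentwise estimator of the autocorrelation operator of an ARH(1) process, built from moment-based estimates of its diagonal coefficients with respect to the (known) eigenvectors of the autocovariance operator, and proves its mean-square convergence, and hence consistency, in the space of Hilbert-Schmidt operators.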

Findings

01

The estimator converges in mean square to the true autocorrelation operator, in the space of Hilbert-Schmidt operators.

02

The predictor shows mean absolute convergence to the conditional expectation.

03

Simulation results demonstrate the estimator's finite-sample effectiveness and compare favorably with existing methods.

Abstract

This paper presents new results on prediction of linear processes in function spaces. The autoregressive Hilbertian process framework of order one (ARH(1) process framework) is adopted. A componentwise estimator of the autocorrelation operator is formulated, from the moment-based estimation of its diagonal coefficients, with respect to the orthogonal eigenvectors of the auto-covariance operator, which are assumed to be known. Mean-square convergence to the theoretical autocorrelation operator, in the space of Hilbert-Schmidt operators, is proved. Consistency then follows in that space. For the associated ARH(1) plug-in predictor, mean absolute convergence to the corresponding conditional expectation, in the considered Hilbert space, is obtained. Hence, consistency in that space also holds. A simulation study is undertaken to illustrate the finite-large sample behavior of the formulated…

Tables (9)

Table 1. ${\rm EMSE}_{\widehat{\rho}_{k_{n}}}$ (here, ${\rm MSE}_{\widehat{\rho}_{k_{n,i}}}$) and ${\rm UB(EMAE)}_{\widehat{X}_{n}^{k_{n}}}$ (here, ${\rm UB}_{\widehat{X}_{n^{k_{n,i}}}}$) values, in (57)–(59), based on $N=700$ simulations, for $\gamma_{1}=0.4$ and $\gamma_{2}=9/20$, considering the sample sizes $\left\{n_{t}=15000+20000(t-1),\ t=1,\dots,20\right\}$ and the corresponding $k_{n,1}$ and $k_{n,2}$ values, for $\alpha_{1}=5$ and $\alpha_{2}=6$.

$n$	$k_{n,1}$	${\rm MSE}_{\widehat{\rho}_{k_{n,1}}}$	${\rm UB}_{\widehat{X}_{n^{k_{n,1}}}}$	$k_{n,2}$	${\rm MSE}_{\widehat{\rho}_{k_{n,2}}}$	${\rm UB}_{\widehat{X}_{n^{k_{n,2}}}}$
$n_{1} = 15000$	$6$	$3.74 {(10)}^{- 4}$	$2.87 {(10)}^{- 2}$	$4$	$2.45 {(10)}^{- 4}$	$2.25 {(10)}^{- 2}$
$n_{2} = 35000$	$8$	$2.15 {(10)}^{- 4}$	$2.21 {(10)}^{- 2}$	$5$	$1.35 {(10)}^{- 4}$	$1.71 {(10)}^{- 2}$
$n_{3} = 55000$	$8$	$1.34 {(10)}^{- 4}$	$1.75 {(10)}^{- 2}$	$6$	$1.03 {(10)}^{- 4}$	$1.51 {(10)}^{- 2}$
$n_{4} = 75000$	$9$	$1.09 {(10)}^{- 4}$	$1.57 {(10)}^{- 2}$	$6$	$7.55 {(10)}^{- 5}$	$1.29 {(10)}^{- 2}$
$n_{5} = 95000$	$9$	$9.48 {(10)}^{- 5}$	$1.47 {(10)}^{- 2}$	$6$	$5.86 {(10)}^{- 5}$	$1.14 {(10)}^{- 2}$
$n_{6} = 115000$	$10$	$8.31 {(10)}^{- 5}$	$1.39 {(10)}^{- 2}$	$6$	$5.16 {(10)}^{- 5}$	$1.07 {(10)}^{- 2}$
$n_{7} = 135000$	$10$	$6.81 {(10)}^{- 5}$	$1.25 {(10)}^{- 2}$	$7$	$4.86 {(10)}^{- 5}$	$1.04 {(10)}^{- 2}$
$n_{8} = 155000$	$10$	$6.37 {(10)}^{- 5}$	$1.21 {(10)}^{- 2}$	$7$	$3.88 {(10)}^{- 5}$	$9.66 {(10)}^{- 3}$
$n_{9} = 175000$	$11$	$6.14 {(10)}^{- 5}$	$1.19 {(10)}^{- 2}$	$7$	$3.87 {(10)}^{- 5}$	$9.65 {(10)}^{- 3}$
$n_{10} = 195000$	$11$	$5.34 {(10)}^{- 5}$	$1.11 {(10)}^{- 2}$	$7$	$3.42 {(10)}^{- 5}$	$8.79 {(10)}^{- 3}$
$n_{11} = 215000$	$11$	$4.67 {(10)}^{- 5}$	$1.03 {(10)}^{- 2}$	$7$	$3.40 {(10)}^{- 5}$	$8.74 {(10)}^{- 3}$
$n_{12} = 235000$	$11$	$4.66 {(10)}^{- 5}$	$1.03 {(10)}^{- 2}$	$7$	$2.92 {(10)}^{- 5}$	$8.12 {(10)}^{- 3}$
$n_{13} = 255000$	$12$	$4.53 {(10)}^{- 5}$	$1.02 {(10)}^{- 2}$	$7$	$2.77 {(10)}^{- 5}$	$7.95 {(10)}^{- 3}$
$n_{14} = 275000$	$12$	$4.24 {(10)}^{- 5}$	$9.95 {(10)}^{- 3}$	$8$	$2.77 {(10)}^{- 5}$	$7.94 {(10)}^{- 3}$
$n_{15} = 295000$	$12$	$3.72 {(10)}^{- 5}$	$9.32 {(10)}^{- 3}$	$8$	$2.67 {(10)}^{- 5}$	$7.76 {(10)}^{- 3}$
$n_{16} = 315000$	$12$	$3.62 {(10)}^{- 5}$	$9.21 {(10)}^{- 3}$	$8$	$2.55 {(10)}^{- 5}$	$7.64 {(10)}^{- 3}$
$n_{17} = 335000$	$12$	$3.39 {(10)}^{- 5}$	$8.91 {(10)}^{- 3}$	$8$	$2.28 {(10)}^{- 5}$	$7.04 {(10)}^{- 3}$
$n_{18} = 355000$	$12$	$3.34 {(10)}^{- 5}$	$8.86 {(10)}^{- 3}$	$8$	$2.20 {(10)}^{- 5}$	$7.04 {(10)}^{- 3}$
$n_{19} = 375000$	$13$	$3.34 {(10)}^{- 5}$	$8.86 {(10)}^{- 3}$	$8$	$2.04 {(10)}^{- 5}$	$6.84 {(10)}^{- 3}$
$n_{20} = 395000$	$13$	$3.12 {(10)}^{- 5}$	$8.56 {(10)}^{- 3}$	$8$	$1.92 {(10)}^{- 5}$	$6.65 {(10)}^{- 3}$

Table 2. Truncated empirical values of ${\rm E}\|\rho\left(X_{n-1}\right)-\widehat{\rho}_{k_{n}}(X_{n-1})\|_{H}$, for $\widehat{\rho}_{k_{n}}$ given in equations (15)–(16) (third column), in equations (60)–(61) (fourth column), and in equations (62)–(63) (fifth column), based on $N=700$ simulations, for $\delta_{1}=2.4$ and $\delta_{2}=1.1$, considering the sample sizes $\left\{n_{t}=15000+20000(t-1),\ t=1,\dots,20\right\}$ and the corresponding $k_{n}=\lceil n^{1/\alpha}\rceil$ values, for $\alpha=6$.

$n$	$k_{n}$	Our Approach	Bosq (2000)	Guillas (2001)
$n_{1} = 15000$	$4$	$2.25 {(10)}^{- 2}$	$2.57 {(10)}^{- 2}$	$2.36 {(10)}^{- 2}$
$n_{2} = 35000$	$5$	$1.71 {(10)}^{- 2}$	$1.72 {(10)}^{- 2}$	$1.84 {(10)}^{- 2}$
$n_{3} = 55000$	$6$	$1.51 {(10)}^{- 2}$	$1.65 {(10)}^{- 2}$	$1.53 {(10)}^{- 2}$
$n_{4} = 75000$	$6$	$1.29 {(10)}^{- 2}$	$1.46 {(10)}^{- 2}$	$1.37 {(10)}^{- 2}$
$n_{5} = 95000$	$6$	$1.14 {(10)}^{- 2}$	$1.20 {(10)}^{- 2}$	$1.16 {(10)}^{- 2}$
$n_{6} = 115000$	$6$	$1.07 {(10)}^{- 2}$	$1.10 {(10)}^{- 2}$	$1.11 {(10)}^{- 2}$
$n_{7} = 135000$	$7$	$1.04 {(10)}^{- 2}$	$1.06 {(10)}^{- 2}$	$1.07 {(10)}^{- 2}$
$n_{8} = 155000$	$7$	$9.66 {(10)}^{- 3}$	$9.91 {(10)}^{- 3}$	$1.01 {(10)}^{- 2}$
$n_{9} = 175000$	$7$	$9.65 {(10)}^{- 3}$	$9.79 {(10)}^{- 3}$	$9.68 {(10)}^{- 3}$
$n_{10} = 195000$	$7$	$8.79 {(10)}^{- 3}$	$9.12 {(10)}^{- 3}$	$8.93 {(10)}^{- 3}$
$n_{11} = 215000$	$7$	$8.74 {(10)}^{- 3}$	$8.79 {(10)}^{- 3}$	$8.83 {(10)}^{- 3}$
$n_{12} = 235000$	$7$	$8.12 {(10)}^{- 3}$	$8.69 {(10)}^{- 3}$	$8.75 {(10)}^{- 3}$
$n_{13} = 255000$	$7$	$7.95 {(10)}^{- 3}$	$8.53 {(10)}^{- 3}$	$8.73 {(10)}^{- 3}$
$n_{14} = 275000$	$8$	$7.94 {(10)}^{- 3}$	$8.52 {(10)}^{- 3}$	$8.58 {(10)}^{- 3}$
$n_{15} = 295000$	$8$	$7.76 {(10)}^{- 3}$	$8.49 {(10)}^{- 3}$	$8.36 {(10)}^{- 3}$
$n_{16} = 315000$	$8$	$7.64 {(10)}^{- 3}$	$7.88 {(10)}^{- 3}$	$8.13 {(10)}^{- 3}$
$n_{17} = 335000$	$8$	$7.04 {(10)}^{- 3}$	$7.24 {(10)}^{- 3}$	$7.59 {(10)}^{- 3}$
$n_{18} = 355000$	$8$	$7.04 {(10)}^{- 3}$	$7.23 {(10)}^{- 3}$	$6.92 {(10)}^{- 3}$
$n_{19} = 375000$	$8$	$6.84 {(10)}^{- 3}$	$6.89 {(10)}^{- 3}$	$6.90 {(10)}^{- 3}$
$n_{20} = 395000$	$8$	$6.65 {(10)}^{- 3}$	$6.67 {(10)}^{- 3}$	$6.85 {(10)}^{- 3}$
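
The sample sizes and truncation orders in the captions can be recomputed directly. The sketch below is an illustration only, not the authors' code; the helper name `integer_root_floor` is hypothetical. It builds $n_{t}=15000+20000(t-1)$ and evaluates the integer part of $n^{1/\alpha}$ in exact integer arithmetic. The $k_{n}$ columns of Tables 1 and 2 appear to agree with the integer part (floor) of $n^{1/\alpha}$, whereas a literal ceiling would give $5$ rather than $4$ at $n_{1}=15000$ for $\alpha=6$; this is an observation from the tabulated values, and the bracket in the caption may have denoted the integer part before extraction.

```python
import math


def integer_root_floor(n: int, alpha: int) -> int:
    """Largest integer k with k**alpha <= n, i.e. floor(n**(1/alpha)).

    Hypothetical helper: exact integer arithmetic avoids floating-point
    rounding near perfect powers.
    """
    k = 0
    while (k + 1) ** alpha <= n:
        k += 1
    return k


# Sample sizes n_t = 15000 + 20000 (t - 1), t = 1, ..., 20, as in the captions.
sample_sizes = [15000 + 20000 * (t - 1) for t in range(1, 21)]

# Truncation orders under the integer-part reading of k_n = n^(1/alpha).
kn_alpha6 = [integer_root_floor(n, 6) for n in sample_sizes]
kn_alpha5 = [integer_root_floor(n, 5) for n in sample_sizes]

# Literal ceiling at the smallest sample size, for comparison with the table.
ceil_first_alpha6 = math.ceil(sample_sizes[0] ** (1 / 6))

print(kn_alpha6)
print(kn_alpha5)
print(ceil_first_alpha6)
```

Under this reading, `kn_alpha6` reproduces the $k_{n}$ column of Table 2 (and $k_{n,2}$ in Table 1), and `kn_alpha5` reproduces $k_{n,1}$ in Table 1.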

Table 3. Truncated empirical values of ${\rm E}\|\rho\left(X_{n-1}\right)-\widehat{\rho}_{k_{n}}(X_{n-1})\|_{H}$, for $\widehat{\rho}_{k_{n}}$ given in equations (15)–(16) (third column), in equations (60)–(61) (fourth column), and in equations (62)–(63) (fifth column), based on $N=700$ simulations, for $\delta_{1}=\frac{61}{60}$ and $\delta_{2}=1.1$, considering the sample sizes $\left\{n_{t}=15000+20000(t-1),\ t=1,\dots,20\right\}$ and the corresponding $k_{n}$ given in (64).

$n$	$k_{n}$	Our Approach	Bosq (2000)	Guillas (2001)
$n_{1} = 15000$	$2$	$9.91 {(10)}^{- 3}$	$1.39 {(10)}^{- 2}$	$1.26 {(10)}^{- 2}$
$n_{2} = 35000$	$3$	$8.78 {(10)}^{- 3}$	$1.34 {(10)}^{- 2}$	$1.24 {(10)}^{- 2}$
$n_{3} = 55000$	$3$	$7.89 {(10)}^{- 3}$	$1.15 {(10)}^{- 2}$	$1.14 {(10)}^{- 2}$
$n_{4} = 75000$	$3$	$6.49 {(10)}^{- 3}$	$1.01 {(10)}^{- 2}$	$8.58 {(10)}^{- 3}$
$n_{5} = 95000$	$3$	$6.36 {(10)}^{- 3}$	$9.09 {(10)}^{- 3}$	$8.29 {(10)}^{- 3}$
$n_{6} = 115000$	$3$	$6.14 {(10)}^{- 3}$	$7.65 {(10)}^{- 3}$	$7.26 {(10)}^{- 3}$
$n_{7} = 135000$	$3$	$5.91 {(10)}^{- 3}$	$7.03 {(10)}^{- 3}$	$6.69 {(10)}^{- 3}$
$n_{8} = 155000$	$3$	$5.73 {(10)}^{- 3}$	$6.77 {(10)}^{- 3}$	$6.54 {(10)}^{- 3}$
$n_{9} = 175000$	$3$	$5.44 {(10)}^{- 3}$	$6.74 {(10)}^{- 3}$	$6.16 {(10)}^{- 3}$
$n_{10} = 195000$	$3$	$5.10 {(10)}^{- 3}$	$6.69 {(10)}^{- 3}$	$5.97 {(10)}^{- 3}$
$n_{11} = 215000$	$4$	$5.01 {(10)}^{- 3}$	$6.48 {(10)}^{- 3}$	$5.94 {(10)}^{- 3}$
$n_{12} = 235000$	$4$	$4.85 {(10)}^{- 3}$	$6.45 {(10)}^{- 3}$	$5.83 {(10)}^{- 3}$
$n_{13} = 255000$	$4$	$4.17 {(10)}^{- 3}$	$6.17 {(10)}^{- 3}$	$5.68 {(10)}^{- 3}$
$n_{14} = 275000$	$4$	$4.64 {(10)}^{- 3}$	$5.99 {(10)}^{- 3}$	$5.60 {(10)}^{- 3}$
$n_{15} = 295000$	$4$	$4.55 {(10)}^{- 3}$	$5.94 {(10)}^{- 3}$	$5.58 {(10)}^{- 3}$
$n_{16} = 315000$	$4$	$4.48 {(10)}^{- 3}$	$5.69 {(10)}^{- 3}$	$5.50 {(10)}^{- 3}$
$n_{17} = 335000$	$4$	$4.38 {(10)}^{- 3}$	$5.58 {(10)}^{- 3}$	$5.44 {(10)}^{- 3}$
$n_{18} = 355000$	$4$	$4.16 {(10)}^{- 3}$	$5.45 {(10)}^{- 3}$	$5.42 {(10)}^{- 3}$
$n_{19} = 375000$	$4$	$3.91 {(10)}^{- 3}$	$5.34 {(10)}^{- 3}$	$5.32 {(10)}^{- 3}$
$n_{20} = 395000$	$4$	$3.86 {(10)}^{- 3}$	$5.29 {(10)}^{- 3}$	$5.26 {(10)}^{- 3}$

Table 4. Truncated empirical values of ${\rm E}\left\{\left\|\rho\left(X_{n-1}\right)-\widetilde{\rho}_{k_{n}}\left(X_{n-1}\right)\right\|_{H}\right\}$, for $\widetilde{\rho}_{k_{n}}$ given in equation (65) (third column), $\widetilde{\rho}_{k_{n}}=\widetilde{\rho}_{n}$ defined in equation (66) (fourth column) and $\widetilde{\rho}_{k_{n}}=\widetilde{\rho}_{n,a}$ defined in equation (67) (fifth column), based on $N=700$ simulations, for $\delta_{1}=2.4$ and $\delta_{2}=1.1$, considering the sample sizes $\left\{n_{t}=15000+20000(t-1),\ t=1,\dots,20\right\}$ and $k_{n}=\lceil\ln(n)\rceil$.

$n$	$k_{n}$	Our approach	Bosq (2000)	Guillas (2001)
$n_{1} = 15000$	$9$	$8.42 {(10)}^{- 2}$	$1.061$	$1.035$
$n_{2} = 35000$	$10$	$5.51 {(10)}^{- 2}$	$1.019$	$1.005$
$n_{3} = 55000$	$10$	$4.75 {(10)}^{- 2}$	$1.017$	$0.999$
$n_{4} = 75000$	$11$	$4.43 {(10)}^{- 2}$	$1.015$	$0.995$
$n_{5} = 95000$	$11$	$3.68 {(10)}^{- 2}$	$1.013$	$0.988$
$n_{6} = 115000$	$11$	$3.51 {(10)}^{- 2}$	$1.011$	$0.963$
$n_{7} = 135000$	$11$	$3.23 {(10)}^{- 2}$	$1.008$	$0.925$
$n_{8} = 155000$	$11$	$2.95 {(10)}^{- 2}$	$1.007$	$0.912$
$n_{9} = 175000$	$12$	$2.94 {(10)}^{- 2}$	$1.006$	$0.911$
$n_{10} = 195000$	$12$	$2.80 {(10)}^{- 2}$	$0.995$	$0.891$
$n_{11} = 215000$	$12$	$2.71 {(10)}^{- 2}$	$0.902$	$0.862$
$n_{12} = 235000$	$12$	$2.59 {(10)}^{- 2}$	$0.890$	$0.820$
$n_{13} = 255000$	$12$	$2.58 {(10)}^{- 2}$	$0.878$	$0.800$
$n_{14} = 275000$	$12$	$2.35 {(10)}^{- 2}$	$0.872$	$0.783$
$n_{15} = 295000$	$12$	$2.28 {(10)}^{- 2}$	$0.860$	$0.778$
$n_{16} = 315000$	$12$	$2.27 {(10)}^{- 2}$	$0.842$	$0.747$
$n_{17} = 335000$	$12$	$2.16 {(10)}^{- 2}$	$0.822$	$0.714$
$n_{18} = 355000$	$12$	$2.14 {(10)}^{- 2}$	$0.800$	$0.707$
$n_{19} = 375000$	$12$	$2.09 {(10)}^{- 2}$	$0.778$	$0.687$
$n_{20} = 395000$	$12$	$2.06 {(10)}^{- 2}$	$0.769$	$0.662$

Table 5. Truncated empirical values of ${\rm E}\left\{\left\|\rho\left(X_{n-1}\right)-\widetilde{\rho}_{k_{n}}\left(X_{n-1}\right)\right\|_{H}\right\}$, for $\widetilde{\rho}_{k_{n}}$ defined in equation (65) (third column), for $\widetilde{\rho}_{k_{n}}=\widetilde{\rho}_{n}$ given in equation (66) (fourth column), and for $\widetilde{\rho}_{k_{n}}=\widetilde{\rho}_{n,a}$ in equation (67) (fifth column), based on $N=200$ (due to high-dimensionality) simulations, for $\delta_{1}=2.4$ and $\delta_{2}=1.1$, considering the sample sizes $\left\{n_{t}=15000+20000(t-1),\ t=1,\dots,20\right\}$ and $k_{n}=\lceil n^{1/6}\rceil$.

$n$	$k_{n}$	Our approach	Bosq (2000)	Guillas (2001)
$n_{1} = 15000$	$4$	$9.88 {(10)}^{- 2}$	$9.25 {(10)}^{- 2}$	$0.106$
$n_{2} = 35000$	$5$	$9.52 {(10)}^{- 2}$	$9.07 {(10)}^{- 2}$	$9.86 {(10)}^{- 2}$
$n_{3} = 55000$	$6$	$9.12 {(10)}^{- 2}$	$8.92 {(10)}^{- 2}$	$9.39 {(10)}^{- 2}$
$n_{4} = 75000$	$6$	$8.48 {(10)}^{- 2}$	$8.64 {(10)}^{- 2}$	$8.98 {(10)}^{- 2}$
$n_{5} = 95000$	$6$	$7.61 {(10)}^{- 2}$	$8.30 {(10)}^{- 2}$	$8.46 {(10)}^{- 2}$
$n_{6} = 115000$	$6$	$7.05 {(10)}^{- 2}$	$7.96 {(10)}^{- 2}$	$8.04 {(10)}^{- 2}$
$n_{7} = 135000$	$7$	$6.99 {(10)}^{- 2}$	$7.84 {(10)}^{- 2}$	$7.82 {(10)}^{- 2}$
$n_{8} = 155000$	$7$	$6.70 {(10)}^{- 2}$	$7.45 {(10)}^{- 2}$	$7.40 {(10)}^{- 2}$
$n_{9} = 175000$	$7$	$6.49 {(10)}^{- 2}$	$7.03 {(10)}^{- 2}$	$7.07 {(10)}^{- 2}$
$n_{10} = 195000$	$7$	$5.88 {(10)}^{- 2}$	$6.74 {(10)}^{- 2}$	$6.80 {(10)}^{- 2}$
$n_{11} = 215000$	$7$	$5.63 {(10)}^{- 2}$	$6.46 {(10)}^{- 2}$	$6.57 {(10)}^{- 2}$
$n_{12} = 235000$	$7$	$5.30 {(10)}^{- 2}$	$6.28 {(10)}^{- 2}$	$6.37 {(10)}^{- 2}$
$n_{13} = 255000$	$7$	$5.05 {(10)}^{- 2}$	$6.19 {(10)}^{- 2}$	$6.24 {(10)}^{- 2}$
$n_{14} = 275000$	$8$	$4.88 {(10)}^{- 2}$	$5.99 {(10)}^{- 2}$	$6.15 {(10)}^{- 2}$
$n_{15} = 295000$	$8$	$4.58 {(10)}^{- 2}$	$5.74 {(10)}^{- 2}$	$6.04 {(10)}^{- 2}$
$n_{16} = 315000$	$8$	$4.24 {(10)}^{- 2}$	$5.52 {(10)}^{- 2}$	$5.93 {(10)}^{- 2}$
$n_{17} = 335000$	$8$	$3.86 {(10)}^{- 2}$	$5.24 {(10)}^{- 2}$	$5.70 {(10)}^{- 2}$
$n_{18} = 355000$	$8$	$3.70 {(10)}^{- 2}$	$5.02 {(10)}^{- 2}$	$5.53 {(10)}^{- 2}$
$n_{19} = 375000$	$8$	$3.55 {(10)}^{- 2}$	$4.88 {(10)}^{- 2}$	$5.36 {(10)}^{- 2}$
$n_{20} = 395000$	$8$	$3.46 {(10)}^{- 2}$	$4.70 {(10)}^{- 2}$	$5.23 {(10)}^{- 2}$

Table 6. Truncated empirical values of ${\rm E}\left\{\left\|\rho\left(X_{n-1}\right)-\widetilde{\rho}_{k_{n}}\left(X_{n-1}\right)\right\|_{H}\right\}$, for $\widetilde{\rho}_{k_{n}}$ defined in equation (65) (third column), for $\widetilde{\rho}_{k_{n}}=\widetilde{\rho}_{n}$ given in equation (66) (fourth column), and for $\widetilde{\rho}_{k_{n}}=\widetilde{\rho}_{n,a}$ in equation (67) (fifth column), based on $N=200$ (due to high-dimensionality) simulations, for $\delta_{1}=2.4$ and $\delta_{2}=1.1$, considering the sample sizes $\left\{n_{t}=15000+20000(t-1),\ t=1,\dots,20\right\}$ and $k_{n}=\lceil e^{\prime}n^{1/\left(8\delta_{1}+2\right)}\rceil$, $e^{\prime}=\frac{17}{10}$.

$n$	$k_{n}$	Our approach	Bosq (2000)	Guillas (2001)
$n_{1} = 15000$	$2$	$6.78 {(10)}^{- 2}$	$8.77 {(10)}^{- 2}$	$6.64 {(10)}^{- 2}$
$n_{2} = 35000$	$2$	$6.72 {(10)}^{- 2}$	$8.61 {(10)}^{- 2}$	$6.30 {(10)}^{- 2}$
$n_{3} = 55000$	$2$	$6.46 {(10)}^{- 2}$	$8.48 {(10)}^{- 2}$	$6.17 {(10)}^{- 2}$
$n_{4} = 75000$	$2$	$6.24 {(10)}^{- 2}$	$8.20 {(10)}^{- 2}$	$5.76 {(10)}^{- 2}$
$n_{5} = 95000$	$2$	$5.42 {(10)}^{- 2}$	$7.84 {(10)}^{- 2}$	$5.03 {(10)}^{- 2}$
$n_{6} = 115000$	$2$	$4.84 {(10)}^{- 2}$	$7.34 {(10)}^{- 2}$	$4.56 {(10)}^{- 2}$
$n_{7} = 135000$	$2$	$4.27 {(10)}^{- 2}$	$6.95 {(10)}^{- 2}$	$3.94 {(10)}^{- 2}$
$n_{8} = 155000$	$2$	$3.64 {(10)}^{- 2}$	$6.60 {(10)}^{- 2}$	$3.65 {(10)}^{- 2}$
$n_{9} = 175000$	$3$	$3.51 {(10)}^{- 2}$	$6.52 {(10)}^{- 2}$	$3.42 {(10)}^{- 2}$
$n_{10} = 195000$	$3$	$3.38 {(10)}^{- 2}$	$6.16 {(10)}^{- 2}$	$3.24 {(10)}^{- 2}$
$n_{11} = 215000$	$3$	$3.16 {(10)}^{- 2}$	$5.78 {(10)}^{- 2}$	$2.85 {(10)}^{- 2}$
$n_{12} = 235000$	$3$	$2.98 {(10)}^{- 2}$	$5.53 {(10)}^{- 2}$	$2.60 {(10)}^{- 2}$
$n_{13} = 255000$	$3$	$2.83 {(10)}^{- 2}$	$5.15 {(10)}^{- 2}$	$2.34 {(10)}^{- 2}$
$n_{14} = 275000$	$3$	$2.50 {(10)}^{- 2}$	$4.85 {(10)}^{- 2}$	$2.05 {(10)}^{- 2}$
$n_{15} = 295000$	$3$	$2.23 {(10)}^{- 2}$	$4.46 {(10)}^{- 2}$	$1.83 {(10)}^{- 2}$
$n_{16} = 315000$	$3$	$2.15 {(10)}^{- 2}$	$4.30 {(10)}^{- 2}$	$1.58 {(10)}^{- 2}$
$n_{17} = 335000$	$3$	$2.06 {(10)}^{- 2}$	$4.14 {(10)}^{- 2}$	$1.40 {(10)}^{- 2}$
$n_{18} = 355000$	$3$	$1.98 {(10)}^{- 2}$	$3.95 {(10)}^{- 2}$	$1.24 {(10)}^{- 2}$
$n_{19} = 375000$	$3$	$1.89 {(10)}^{- 2}$	$3.77 {(10)}^{- 2}$	$1.05 {(10)}^{- 2}$
$n_{20} = 395000$	$3$	$1.82 {(10)}^{- 2}$	$3.70 {(10)}^{- 2}$	$9.93 {(10)}^{- 3}$

Table 7. $EMAE_{\widehat{X}_{n}}^{h_{n,i}}$, $i=1,2$, and $EMAE_{\widehat{X}_{n}}^{q,l}$ values (see (70) and (71), respectively), with $q=7$, based on $N=200$ simulations, for $\delta_{1}=2.4$ and $\delta_{2}=1.1$, considering now the sample sizes $\left\{n_{t}=750+500(t-1),\ t=1,\dots,13\right\}$, $h_{n,1}=0.1$ and $h_{n,2}=0.3$.

$n$	$EMAE_{\widehat{X}_{n}}^{h_{n,1}}$	$EMAE_{\widehat{X}_{n}}^{h_{n,2}}$	$EMAE_{\widehat{X}_{n}}^{q,l}$
$n_{1} = 750$	$8.57 {(10)}^{- 2}$	$8.85 {(10)}^{- 2}$	$8.99 {(10)}^{- 2}$
$n_{2} = 1250$	$7.67 {(10)}^{- 2}$	$8.43 {(10)}^{- 2}$	$8.69 {(10)}^{- 2}$
$n_{3} = 1750$	$7.15 {(10)}^{- 2}$	$7.12 {(10)}^{- 2}$	$8.05 {(10)}^{- 2}$
$n_{4} = 2250$	$7.09 {(10)}^{- 2}$	$6.87 {(10)}^{- 2}$	$7.59 {(10)}^{- 2}$
$n_{5} = 2750$	$6.87 {(10)}^{- 2}$	$6.67 {(10)}^{- 2}$	$7.31 {(10)}^{- 2}$
$n_{6} = 3250$	$6.52 {(10)}^{- 2}$	$5.92 {(10)}^{- 2}$	$7.28 {(10)}^{- 2}$
$n_{7} = 3750$	$6.20 {(10)}^{- 2}$	$5.56 {(10)}^{- 2}$	$7.13 {(10)}^{- 2}$
$n_{8} = 4250$	$6.06 {(10)}^{- 2}$	$5.32 {(10)}^{- 2}$	$7.06 {(10)}^{- 2}$
$n_{9} = 4750$	$5.67 {(10)}^{- 2}$	$5.25 {(10)}^{- 2}$	$6.47 {(10)}^{- 2}$
$n_{10} = 5250$	$5.24 {(10)}^{- 2}$	$5.12 {(10)}^{- 2}$	$6.08 {(10)}^{- 2}$
$n_{11} = 5750$	$5.01 {(10)}^{- 2}$	$4.82 {(10)}^{- 2}$	$5.75 {(10)}^{- 2}$
$n_{12} = 6250$	$4.90 {(10)}^{- 2}$	$4.49 {(10)}^{- 2}$	$5.33 {(10)}^{- 2}$
$n_{13} = 6750$	$4.87 {(10)}^{- 2}$	$3.87 {(10)}^{- 2}$	$4.97 {(10)}^{- 2}$

Table 8. Truncated empirical values of ${\rm E}\left\{\|\rho(X_{n-1})-\widetilde{\rho}_{k_{n}}(X_{n-1})\|_{H}\right\}$, with $\widetilde{\rho}_{k_{n}}$ defined in equation (65), and of ${\rm E}\left\{\|\widetilde{\rho}_{n,\widehat{\lambda}^{M}}(X_{n-1})-\rho(X_{n-1})\|_{H}\right\}$, based on $N=200$ simulations, for $\delta_{1}=2.4$ and $\delta_{2}=1.1$, considering the sample sizes $\left\{n_{t}=750+500(t-1),\ t=1,\dots,13\right\}$, using $\widehat{\lambda}_{M}$, $M=50$, and the corresponding $k_{n,i}=\lceil n^{1/\alpha_{i}}\rceil$, for $\alpha_{1}=6$ and $\alpha_{2}=10$. Here, O.A. means Our Approach.

$n$	$k_{n, 1}$	O.A.	Antoniadis and Sapatinas [2003]	$k_{n, 2}$	O.A.	Antoniadis and Sapatinas [2003]
$750$	3	$0.070$	$0.091$	1	$0.064$	$0.059$
$1250$	3	$0.055$	$0.087$	2	$0.051$	$0.043$
$1750$	3	$0.047$	$0.080$	2	$0.045$	$0.039$
$2250$	3	$0.041$	$0.079$	2	$0.041$	$0.038$
$2750$	3	$0.037$	$0.073$	2	$0.036$	$0.035$
$3250$	3	$0.034$	$0.072$	2	$0.033$	$0.031$
$3750$	3	$0.033$	$0.068$	2	$0.033$	$0.029$
$4250$	4	$0.033$	$0.067$	2	$0.031$	$0.029$
$4750$	4	$0.032$	$0.066$	2	$0.031$	$0.026$
$5250$	4	$0.031$	$0.064$	2	$0.028$	$0.023$
$5750$	4	$0.030$	$0.060$	2	$0.020$	$0.019$
$6250$	4	$0.028$	$0.058$	2	$0.017$	$0.015$
$6750$	4	$0.028$	$0.056$	2	$0.019$	$0.014$

Table 9. Truncated empirical values of ${\rm E}\left\{\left\|\rho(X_{n-1})-\widehat{\rho}_{k_{n}}^{ND}(X_{n-1})\right\|_{H}\right\}$, for $\widehat{\rho}_{k_{n}}^{ND}$ given in equations (15)–(16) (third column), in equations (60)–(61) (fourth column), and in equations (62)–(63) (fifth column), from the non–diagonal data generated by equations (74)–(76), based on $N=200$ (due to high–dimensionality) simulations, for $\delta_{1}=2.4$ and $\delta_{2}=1.1$, considering the sample sizes $\left\{n_{t}=15000+20000(t-1),\ t=1,\dots,20\right\}$ and the corresponding $k_{n}=\lceil n^{1/\alpha}\rceil$, $\alpha=6$, values. The eigenvectors $\left\{\phi_{j},\ j\geq 1\right\}$ are assumed to be known.

$n$	$k_{n}$	Our approach	Bosq (2000)	Guillas (2001)
$n_{1} = 15000$	$4$	$0.581$	$8.94 {(10)}^{- 2}$	$0.1055$
$n_{2} = 35000$	$5$	$0.560$	$7.05 {(10)}^{- 2}$	$9.49 {(10)}^{- 2}$
$n_{3} = 55000$	$6$	$0.548$	$6.67 {(10)}^{- 2}$	$9.14 {(10)}^{- 2}$
$n_{4} = 75000$	$6$	$0.532$	$6.24 {(10)}^{- 2}$	$8.85 {(10)}^{- 2}$
$n_{5} = 95000$	$6$	$0.512$	$5.89 {(10)}^{- 2}$	$8.47 {(10)}^{- 2}$
$n_{6} = 115000$	$6$	$0.498$	$5.62 {(10)}^{- 2}$	$8.04 {(10)}^{- 2}$
$n_{7} = 135000$	$7$	$0.495$	$5.57 {(10)}^{- 2}$	$7.66 {(10)}^{- 2}$
$n_{8} = 155000$	$7$	$0.481$	$5.28 {(10)}^{- 2}$	$7.24 {(10)}^{- 2}$
$n_{9} = 175000$	$7$	$0.474$	$5.01 {(10)}^{- 2}$	$6.78 {(10)}^{- 2}$
$n_{10} = 195000$	$7$	$0.461$	$4.90 {(10)}^{- 2}$	$6.30 {(10)}^{- 2}$
$n_{11} = 215000$	$7$	$0.442$	$4.69 {(10)}^{- 2}$	$6.07 {(10)}^{- 2}$
$n_{12} = 235000$	$7$	$0.425$	$4.45 {(10)}^{- 2}$	$5.82 {(10)}^{- 2}$
$n_{13} = 255000$	$7$	$0.411$	$4.25 {(10)}^{- 2}$	$5.54 {(10)}^{- 2}$
$n_{14} = 275000$	$8$	$0.408$	$4.14 {(10)}^{- 2}$	$5.16 {(10)}^{- 2}$
$n_{15} = 295000$	$8$	$0.381$	$4.09 {(10)}^{- 2}$	$4.81 {(10)}^{- 2}$
$n_{16} = 315000$	$8$	$0.360$	$3.85 {(10)}^{- 2}$	$4.53 {(10)}^{- 2}$
$n_{17} = 335000$	$8$	$0.349$	$3.56 {(10)}^{- 2}$	$4.29 {(10)}^{- 2}$
$n_{18} = 355000$	$8$	$0.330$	$3.29 {(10)}^{- 2}$	$3.98 {(10)}^{- 2}$
$n_{19} = 375000$	$8$	$0.320$	$2.90 {(10)}^{- 2}$	$3.75 {(10)}^{- 2}$
$n_{20} = 395000$	$8$	$0.318$	$2.62 {(10)}^{- 2}$	$3.44 {(10)}^{- 2}$

Equations (347)

X_{n} = \rho(X_{n-1}) + \varepsilon_{n},

\|\rho\|_{\mathcal{L}(H)} < 1.

C = E\{X_{n} \otimes X_{n}\} = E\{X_{0} \otimes X_{0}\}, \quad n \in \mathbb{Z},

C = \sum_{j=1}^{\infty} C_{j} \phi_{j} \otimes \phi_{j},

C_{1} \geq C_{2} \geq \dots \geq C_{j} \geq \dots > 0

\sum_{j=1}^{\infty} C_{j} < \infty.

\rho = \sum_{j=1}^{\infty} \rho_{j} \phi_{j} \otimes \phi_{j}, \quad \sum_{j=1}^{\infty} \rho_{j}^{2} < \infty,

\|\rho\|_{\mathcal{L}(H)} = \sup_{j \geq 1} |\rho_{j}| < 1.

D = E\{X_{n} \otimes X_{n+1}\} = E\{X_{0} \otimes X_{1}\}

C_{\varepsilon} = C - \rho C \rho = \sum_{j=1}^{\infty} C_{j} (1 - \rho_{j}^{2}) \phi_{j} \otimes \phi_{j} = \sum_{j=1}^{\infty} \sigma_{j}^{2} \phi_{j} \otimes \phi_{j}.

X_{n,j} = \rho_{j} X_{n-1,j} + \varepsilon_{n,j}, \quad n \in \mathbb{Z},

\rho_{j}

D_{j} = \langle D(\phi_{j}), \phi_{j} \rangle_{H} = E\{X_{n,j} X_{n-1,j}\}, \quad C_{j}^{-1} = [E\{X_{n-1,j}^{2}\}]^{-1}, \quad X_{n,j} = \langle X_{n}, \phi_{j} \rangle_{H},

D = \sum_{j=1}^{\infty} D_{j} \phi_{j} \otimes \phi_{j}, \quad D_{j} = \rho_{j} C_{j}, \quad j \geq 1.

\|Z\|_{L_{H}^{2}(\Omega, \mathcal{A}, P)}^{2} = E\{\|Z\|_{H}^{2}\}, \quad \forall Z \in L_{H}^{2}(\Omega, \mathcal{A}, P).

E\{\|Z - Y\|_{H}\} = 0.

X_{n}

\varepsilon_{n}

\eta_{j}(n) = \frac{\langle X_{n}, \phi_{j} \rangle_{H}}{C_{j}} = \frac{X_{n,j}}{C_{j}}, \quad \eta_{j}(n) = \frac{\langle \varepsilon_{n}, \phi_{j} \rangle_{H}}{\sigma_{j}} = \frac{\varepsilon_{n,j}}{\sigma_{j}}, \quad n \in \mathbb{Z}, \ j \geq 1.

C_{\varepsilon}(\phi_{j}) = \sigma_{j}^{2} \phi_{j}, \quad j \geq 1,

\sum_{j=1}^{\infty} \sigma_{j}^{2} = \sigma_{\varepsilon}^{2} = E\{\|\varepsilon_{n}\|_{H}^{2}\},

\lim_{M \to \infty} E\{\|X_{n} - X_{n,M}\|_{H}^{2}\} = 0,

\lim_{M \to \infty} \|E\{(X_{n} - X_{n,M}) \otimes (X_{n} - X_{n,M})\}\|_{\mathcal{S}(H)}^{2} = 0.

\varepsilon_{n} = \sum_{j=1}^{\infty} \sigma_{j} \frac{\langle \varepsilon_{n}, \phi_{j} \rangle_{H}}{\sigma_{j}} \phi_{j} = \sum_{j=1}^{\infty} \sigma_{j} \eta_{j}(n) \phi_{j}.

\{X_{n,M} = \sum_{j=1}^{M} C_{j} \eta_{j}(n) \phi_{j}, \ M \geq 1\}

\|X_{n,M+L} - X_{n,M}\|_{L_{H}^{2}(\Omega, \mathcal{A}, P)}^{2}

\lim_{M \to \infty} \sum_{j=M+1}^{M+L} C_{j} = 0,

\lim_{M \to \infty} E\{\|X_{n} - X_{n,M}\|_{H}^{2}\}

C(\phi_{j})

E\{\eta_{j}(n) \eta_{h}(n)\}
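
The componentwise relations listed above (each coordinate follows a scalar AR(1) recursion with coefficient ρ_j, the cross-covariance coefficient is D_j = ρ_j C_j, and the innovation variance is σ_j² = C_j (1 − ρ_j²)) suggest a moment-based estimate of ρ_j as the ratio of an empirical D_j to an empirical C_j. The following is a minimal numerical sketch of that idea, not the authors' implementation: the Gaussian innovations, the values chosen for ρ_j and C_j, the burn-in length, the function names, and the exact normalisation of the empirical moments are all assumptions made for illustration.

```python
import numpy as np


def simulate_diagonal_arh1(n, rho, sigma, rng, burn_in=500):
    """Simulate n observations of the first len(rho) coordinates X_{i,j}.

    Each coordinate follows X_{i,j} = rho_j X_{i-1,j} + eps_{i,j}, with
    independent Gaussian innovations of standard deviation sigma_j
    (Gaussianity is an assumption of this sketch).
    """
    k = len(rho)
    x = np.zeros(k)
    out = np.empty((n, k))
    for i in range(n + burn_in):
        x = rho * x + sigma * rng.standard_normal(k)
        if i >= burn_in:  # discard the transient started from zero
            out[i - burn_in] = x
    return out


def componentwise_rho(X):
    """Moment-based estimate of rho_j as (empirical D_j) / (empirical C_j)."""
    D_hat = np.mean(X[1:] * X[:-1], axis=0)  # estimates E{X_{n,j} X_{n-1,j}}
    C_hat = np.mean(X[:-1] ** 2, axis=0)     # estimates E{X_{n-1,j}^2}
    return D_hat / C_hat


rng = np.random.default_rng(0)
j = np.arange(1, 6)
rho = 0.8 / j                         # hypothetical coefficients, sup |rho_j| < 1
C = 1.0 / j**2                        # hypothetical eigenvalues C_j
sigma = np.sqrt(C * (1.0 - rho**2))   # sigma_j^2 = C_j (1 - rho_j^2)

X = simulate_diagonal_arh1(20000, rho, sigma, rng)
rho_hat = componentwise_rho(X)
print(np.round(rho, 3))
print(np.round(rho_hat, 3))
```

Working in coordinates with respect to known eigenvectors reduces the operator estimation problem to a set of independent scalar ratios, which is why only the first few coordinates need to be simulated here.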

Full text

Asymptotic properties of a componentwise ARH(1) plug-in predictor

Javier Álvarez-Liébana¹, Denis Bosq² and M. Dolores Ruiz–Medina¹

Summary

This paper presents new results on prediction of linear processes in function spaces. The autoregressive Hilbertian process framework of order one (ARH(1) process framework) is adopted. A componentwise estimator of the autocorrelation operator is formulated, from the moment–based estimation of its diagonal coefficients, with respect to the orthogonal eigenvectors of the auto-covariance operator, which are assumed to be known. Mean-square convergence to the theoretical autocorrelation operator, in the space of Hilbert-Schmidt operators, is proved. Consistency then follows in that space. For the associated ARH(1) plug-in predictor, mean absolute convergence to the corresponding conditional expectation, in the considered Hilbert space, is obtained. Hence, consistency in that space also holds. A simulation study is undertaken to illustrate the finite-large sample behavior of the formulated componentwise estimator and predictor. The performance of the presented approach is compared with alternative approaches in the previous and current ARH(1) framework literature, including the case of unknown eigenvectors.

Journal of Multivariate Analysis, 155, pp. 12-34. DOI: doi.org/10.1016/j.jmva.2016.11.009

¹ Department of Statistics and O. R., University of Granada, Spain. ² LSTA, Université Pierre et Marie Curie–Paris 6, Paris, France.

Key words: ARH(1) processes; consistency; functional prediction; mean absolute and quadratic convergence.

1 Introduction

In the last few decades, an extensive literature on statistical inference from functional random variables has emerged. This work was motivated in part by the statistical analysis of high–dimensional data, as well as data of a continuous (infinite-dimensional) nature; see, e.g., Bosq [2000, 2007], Dedecker and Merlevède [2003], Ferraty and Vieu [2006], Merlevède [1996b, 1997], Ramsay and Silverman [2005], Ruiz-Medina [2012]. New developments in functional data analysis are described, e.g., in Bongiorno et al. [2014], Cuevas [2014], Horváth and Kokoszka [2012], Hsing and Eubank [2015], and in a recent Special Issue of this journal; see Goia and Vieu [2016].

These references include a nice summary of the statistical theory for functional data, covering covariance operator theory and eigenfunction expansions, perturbation theory, smoothing and regularization, probability measures on Hilbert spaces, functional principal component analysis, functional counterparts of multivariate canonical correlation analysis, the two-sample and change-point problems, functional linear models, functional tests for independence, functional time series theory, spatially distributed curves, and software packages and numerical implementations of the statistical procedures discussed, among other topics.

The special case of functional regression models, in which the predictor is a random function and the response is scalar, has been particularly well studied. Various specifications of the functional regression parameter arise in fields such as biology, climatology, chemometrics, and economics. To avoid the computational (high–dimensional) limitations of the nonparametric approach, several parametric and semi–parametric methods have been proposed; see, e.g., Ferraty et al. [2012] and the references therein. In Ferraty et al. [2012], a combination of a spline approximation and the one–dimensional Nadaraya–Watson approach was proposed to avoid high dimensionality issues. Generalizations to the case of more regressors (all functional, or both functional and real) were also addressed in the nonparametric, semi–parametric, and parametric frameworks; for an overview, see Aneiros-Pérez and Vieu [2006], Febrero-Bande and González-Manteiga [2013], Ferraty and Vieu [2009].

In the nonparametric regression framework, the case where the covariates and the response are functional was considered by Ferraty et al. [2012], where a functional version of the Nadaraya–Watson estimator was proposed for the estimation of the regression operator and shown to be point–wise asymptotically normal. Resampling techniques were used to overcome the difficulties arising in the estimation of the asymptotic bias and variance. Semi–functional partial linear regression, introduced in Aneiros-Pérez and Vieu [2008], allows the prediction of a real-valued random variable from a set of real–valued explanatory variables, and a time–dependent functional explanatory variable. Motivated by genetic and environmental applications, a semi–parametric maximum likelihood method for the estimation of odds ratio association parameters was developed by Chen et al. [2012] in a high–dimensional data context.

In the autoregressive Hilbertian time series framework, several estimation and prediction procedures have been proposed and studied. Mas [1999] established, under suitable conditions, the asymptotic normal distribution of the formulated estimator of the autocorrelation operator, based on projection into the theoretical eigenvectors. In Bosq [2000], Bosq and Blanke [2007], the problem of prediction of linear processes in function spaces was addressed. In particular, sufficient conditions for the consistency of the empirical autocovariance and cross–covariance operators were obtained. The asymptotic normal distribution of the empirical autocovariance operator was also derived. Moreover, the asymptotic properties of the empirical eigenvalues and eigenvectors were analysed.

Guillas [2001] established the efficiency of a componentwise estimator of the autocorrelation operator, based on projection into the empirical eigenvector system of the autocovariance operator. Consistency, in the space of bounded linear operators, of the formulated estimator of the autocorrelation operator, and of its associated ARH(1) plug–in predictor was later proved by Mas [2004]. He derived sufficient conditions for the weak convergence of the ARH(1) plug–in predictor to a Hilbert–valued Gaussian random variable (see Mas [2007]). Simultaneously, Mas and Menneteau [2003a] obtained high deflection results or large and moderate deviations for infinite–dimensional autoregressive processes. Furthermore, the law of the iterated logarithm for the covariance operator estimator was formulated by Menneteau [2005].

The main properties of the class of autoregressive Hilbertian processes with random coefficients were investigated by Mourid [2004]. Kargin and Onatski [2008] gave interesting extensions of the autoregressive Hilbertian framework, based on the spectral decomposition of the autocorrelation operator, and not of the autocovariance operator. The first generalization to autoregressive processes of order greater than one was proposed by Mourid [1993], in order to improve prediction. ARHX(1) models, i.e., autoregressive Hilbertian processes with exogenous variables, were studied by Damon and Guillas [2002, 2005]. In Guillas [2000, 2001], a doubly stochastic formulation of the autoregressive Hilbertian process was investigated. The ARHD model was introduced by Marion and Pumo [2004], taking into account the regularity of trajectories through the derivatives. The conditional autoregressive Hilbertian process (CARH process) was considered by Cugliari [2011], who developed parallel projection estimation methods to predict such processes. In the Banach–valued context, we refer to the papers by Bensmain and Mourid [2001], Dehling and Sharipov [2005], Pumo [1992, 1998], among others.

In this paper, we assume that the autocorrelation operator belongs to the Hilbert–Schmidt class, and admits a diagonal spectral decomposition in terms of the orthogonal eigenvector system of the autocovariance operator. Such is the case, e.g., of an autocorrelation operator defined as a continuous function of the autocovariance operator. A componentwise estimator of the autocorrelation operator is then constructed in terms of the eigenvectors of the autocovariance operator, which are assumed to be known. This occurs when the random initial condition is defined as the solution, in the mean–square sense, of a stochastic differential equation driven by white noise. Beyond this case, the sparse representation and whitening properties of wavelet bases can be exploited to obtain a diagonal representation of the autocovariance and cross–covariance operators, in terms of a common and known wavelet basis. Unconditional bases, like wavelet bases, also allow the diagonal spectral series representation of the distributional kernels of Calderón-Zygmund operators.

Under the assumptions stated in Sections 2–4, we establish the convergence, in the $\mathcal{L}^{2}$-sense, of a componentwise estimator of the autocorrelation operator in the space of Hilbert–Schmidt operators $\mathcal{S}\left(H\right),$ i.e., convergence in $\mathcal{L}^{2}_{\mathcal{S}\left(H\right)}\left(\Omega,\mathcal{A},\mathcal{P}\right).$ Consistency then follows in $\mathcal{S}\left(H\right).$ Under the same conditions, consistency in $H$ of the associated ARH(1) plug–in predictor is obtained from its convergence in the $\mathcal{L}^{1}$-sense in the Hilbert space $H,$ i.e., in the space $\mathcal{L}^{1}_{H}\left(\Omega,\mathcal{A},\mathcal{P}\right).$ The Gaussian framework is analysed in Section 4 and illustrated in Section 5, where examples show the behaviour of the proposed componentwise autocorrelation operator estimator, and of the associated predictor, for large sample sizes. We also present there a comparative study with alternative ARH(1) prediction techniques, including componentwise parameter estimation of the autocorrelation operator, from known and unknown eigenvectors, as well as kernel (nonparametric) functional estimation, and penalized, spline and wavelet estimation. Final comments on the application of the proposed approach to real data are provided in Section 6.

2 Preliminaries

This section contains the preliminary definitions and lemmas that will be used to derive the main results of this paper. In the following, $H$ denotes a real separable Hilbert space. Recall that, from Bosq [2000], a zero–mean ARH(1) process $X=\left\{X_{n},\ n\in\mathbb{Z}\right\}$ satisfies, for all $n\in\mathbb{Z}$ , the equation

[TABLE]

where $\rho$ denotes the autocorrelation operator of the process $X,$ which belongs to the space $\mathcal{L}(H)$ of bounded linear operators, such that $\|\rho^{k}\|_{\mathcal{L}\left(H\right)}<1,$ for all integers $k\geq k_{0}$ beyond a certain $k_{0}\geq 1$ , with $\|\cdot\|_{\mathcal{L}(H)}$ denoting the norm in the space $\mathcal{L}(H).$ The Hilbert–valued innovation process $\varepsilon=\left\{\varepsilon_{n},\ n\in\mathbb{Z}\right\}$ is assumed to be a strong–white noise which is uncorrelated with the random initial condition. That is, $\varepsilon$ is a Hilbert–valued zero–mean stationary process, with independent and identically distributed components in time, with $\sigma^{2}_{\varepsilon}={\rm E}\left\{\|\varepsilon_{n}\|_{H}^{2}\right\}<\infty,$ for all $n\in\mathbb{Z}.$ We restrict our attention here to the case where $\rho$ is such that

[TABLE]

The following assumptions are made.

Assumption A1. The autocovariance operator

[TABLE]

is a positive, self–adjoint and trace operator. As a result, it admits the following diagonal spectral representation

[TABLE]

in terms of an orthonormal system $\left\{\phi_{j},\ j\geq 1\right\}$ of eigenvectors which are known. Here,

[TABLE]

denote the real positive eigenvalues of $C$ arranged in decreasing order of magnitude and

[TABLE]

Assumption A2. The autocorrelation operator $\rho$ is a self–adjoint and Hilbert–Schmidt operator, admitting the diagonal spectral decomposition

[TABLE]

where $\left\{\rho_{j},\ j\geq 1\right\}$ is the system of eigenvalues of the autocorrelation operator $\rho,$ with respect to the orthonormal system of eigenvectors $\left\{\phi_{j},\ j\geq 1\right\}$ of the autocovariance operator $C$ .

Note that, under Assumption A2,

[TABLE]

Remark 1

Assumption A2 holds, in particular, when operator $\rho$ is defined as a continuous function of operator $C$ (see [Dautray and Lions, 1990, pp. 119–140] and Remark 4).

In the following, for any $n\in\mathbb{Z},$ let

[TABLE]

be the cross–covariance operator of the ARH(1) process $X$ .

Remark 2

Under Assumptions A1–A2, it follows from equation (1) that

[TABLE]

By projecting equation (1) into the orthonormal system $\left\{\phi_{j},\ j\geq 1\right\}$ , we also have, for each $j\geq 1$ and all $n\in\mathbb{Z}$ , the AR(1) equation

[TABLE]

where $X_{n,j}=\left\langle X_{n},\phi_{j}\right\rangle_{H}$ and $\varepsilon_{n,j}=\left\langle\varepsilon_{n},\phi_{j}\right\rangle_{H},$ for all $n\in\mathbb{Z}$ . From equation (2), we have, for each $j\geq 1$ and all $n\in\mathbb{Z}$ ,

[TABLE]

where

[TABLE]

given that, for all $j\geq 1$ ,

[TABLE]
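As a quick numerical check of the componentwise AR(1) structure used above, the following sketch computes the lag-0 and lag-1 second-order moments of a stationary scalar AR(1) process from its moving-average representation, and confirms that their ratio recovers the autoregressive coefficient. It assumes the standard stationary AR(1) identities $C_{j}=\sigma_{j}^{2}/(1-\rho_{j}^{2})$ and $D_{j}=\rho_{j}C_{j}$; the function name `ar1_autocovariances`, the truncation length and the numerical values are illustrative choices, not taken from the paper.

```python
def ar1_autocovariances(rho, sigma2, terms=2000):
    """Lag-0 and lag-1 autocovariances of X_n = rho * X_{n-1} + eps_n.

    Uses the truncated MA(infinity) expansion X_n = sum_k rho**k * eps_{n-k},
    valid for |rho| < 1, with Var(eps_n) = sigma2.
    """
    if not abs(rho) < 1:
        raise ValueError("stationarity requires |rho| < 1")
    lag0 = sigma2 * sum(rho ** (2 * k) for k in range(terms))
    lag1 = sigma2 * sum(rho ** (2 * k + 1) for k in range(terms))
    return lag0, lag1


# Illustrative (hypothetical) component: rho_j = 0.6, sigma_j^2 = 0.5.
C_j, D_j = ar1_autocovariances(0.6, 0.5)
print(C_j, 0.5 / (1 - 0.6 ** 2))  # lag-0 moment vs. sigma^2 / (1 - rho^2)
print(D_j / C_j)                  # ratio of lag-1 to lag-0 moments
```

The ratio $D_{j}/C_{j}$ is the quantity that the moment-based estimators of Section 3 approximate from data.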

Let us now consider the Banach space $L_{\mathcal{H}}^{2}\left(\Omega,\mathcal{A},\mathcal{P}\right)$ of the equivalence classes of $\mathcal{L}_{\mathcal{H}}^{2}\left(\Omega,\mathcal{A},\mathcal{P}\right),$ the space of zero–mean second–order Hilbert–valued random variables ( $\mathcal{H}$ –valued random variables) with finite seminorm given by

[TABLE]

That is, for $Z,Y\in\mathcal{L}_{\mathcal{H}}^{2}\left(\Omega,\mathcal{A},\mathcal{P}\right),$ $Z$ and $Y$ belong to the same equivalence class if and only if

[TABLE]

The convergence in the seminorm of $\mathcal{L}_{\mathcal{S}(H)}^{2}\left(\Omega,\mathcal{A},\mathcal{P}\right)$ will be considered in Proposition 1, where $\mathcal{H}=\mathcal{S}(H)$ denotes the Hilbert space of Hilbert–Schmidt operators on a Hilbert space $H$ .

For each $n\in\mathbb{Z},$ let us consider the following biorthogonal representation of the functional value $X_{n}$ of the ARH(1) process $X=\left\{X_{n},\ n\in\mathbb{Z}\right\}$ , and of the functional value $\varepsilon_{n}$ of its innovation process:

[TABLE]

where

[TABLE]

Here, under Assumptions A1–A2, for $C_{\varepsilon}={\rm E}\left\{\varepsilon_{n}\otimes\varepsilon_{n}\right\}={\rm E}\left\{\varepsilon_{0}\otimes\varepsilon_{0}\right\},\leavevmode\nobreak\ n\in\mathbb{Z},$

[TABLE]

where, as before, $\left\{\phi_{j},\ j\geq 1\right\}$ denotes the system of eigenvectors of the autocovariance operator $C,$ and

[TABLE]

for all $n\in\mathbb{Z}.$

The following lemma provides the convergence, in the seminorm of $\mathcal{L}_{H}^{2}(\Omega,\mathcal{A},\mathcal{P}),$ of the series expansions (5)–(6).

Lemma 1

Let $X=\left\{X_{n},\ n\in\mathbb{Z}\right\}$ be a zero–mean ARH(1) process. Under Assumptions A1–A2, for any $n\in\mathbb{Z},$ the following limit holds

[TABLE]

where $\widehat{X}_{n,M}=\displaystyle\sum_{j=1}^{M}\sqrt{C_{j}}\eta_{j}(n)\phi_{j}$ . Furthermore,

[TABLE]

Similar assertions hold for the biorthogonal series representation

[TABLE]

Proof.

Under Assumption A1, from the trace property of $C,$ the sequence

[TABLE]

satisfies, for $M$ sufficiently large, and $L>0,$ arbitrary,

[TABLE]

since, under Assumption A1, $\displaystyle\sum_{j=1}^{\infty}C_{j}<\infty$ . Hence, $\left\{\displaystyle\sum_{j=1}^{M}C_{j},\ M\geq 1\right\}$ is a Cauchy sequence. Thus,

[TABLE]

for $L>0$ arbitrary. From equation (7),

[TABLE]

is also a Cauchy sequence in $\mathcal{L}_{H}^{2}(\Omega,\mathcal{A},\mathcal{P}).$ Thus, the sequence $\left\{\widehat{X}_{n,M},\ M\geq 1\right\}$ has finite limit in $\mathcal{L}_{H}^{2}(\Omega,\mathcal{A},\mathcal{P})$ , for all $n\in\mathbb{Z}$ .

Furthermore,

[TABLE]

In the derivation of the identities in (7)–(LABEL:A3:19), we have applied that, for every $j,\leavevmode\nobreak\ h\geq 1,$

[TABLE]

Moreover, from identities in (LABEL:A3:20),

[TABLE]

In a similar way, we can derive the convergence to $\varepsilon_{n},$ in $\mathcal{L}_{H}^{2}(\Omega,\mathcal{A},\mathcal{P}),$ of the series $\displaystyle\sum_{j=1}^{\infty}\sigma_{j}\widetilde{\eta}_{j}(n)\phi_{j},$ for every $n\in\mathbb{Z},$ since $\varepsilon$ is assumed to be strong–white noise and, hence, its covariance operator $C_{\varepsilon}$ is in the trace class. An analogue of equation (10) can also be obtained.

$\blacksquare$

In equations (5)–(6), for every $n\in\mathbb{Z},$

[TABLE]

Note that, from Assumption A2, for each $j\geq 1,$ $\left\{X_{n,j},\ n\in\mathbb{Z}\right\}$ in equation (2) defines a stationary and invertible AR(1) process. In addition, from equations (5) and (LABEL:A3:20), for every $n\in\mathbb{Z},$ and $j,p\geq 1,$

[TABLE]

which implies that

[TABLE]

In particular, we obtain, for each $j\geq 1,$ and for every $n\in\mathbb{Z},$

[TABLE]

Remark 3

From equation (2) and Lemma 1, keeping in mind that

[TABLE]

the following invertible and stationary AR(1) process can be defined:

[TABLE]

where, for each $j\geq 1,$ $\left\{\eta_{j}(n),\ n\in\mathbb{Z}\right\}$ and $\left\{\widetilde{\eta}_{j}(n),\ n\in\mathbb{Z}\right\}$ are respectively introduced in equations (5)-(6). In the following, for each $j\geq 1,$ we assume that

[TABLE]

to ensure ergodicity for all second–order moments, in the mean–square sense; see, e.g., [Hamilton, 1994, pp. 192–193].

Furthermore,

[TABLE]

Remark 4

In particular, Assumption A2 holds if the following orthogonality condition is satisfied, for all $n\in\mathbb{Z}$ and $j,p\geq 1$ ,

[TABLE]

where $\delta_{j,p}$ denotes the Kronecker delta. In practice, unconditional bases, e.g., wavelet bases, lead to a sparse representation for functional data; see, e.g., Nason [2008], Ogden [1997], Vidakovic [1998] for statistically-oriented treatments. Wavelet bases are also designed for sparse representation of kernels defining integral operators, in $L^{2}$ spaces with respect to a suitable measure (see Mallat [2009]). The Discrete Wavelet Transform (DWT) approximately decorrelates or whitens data (see Vidakovic [1998]). In particular, operators $C$ and $D$ could admit an almost diagonal representation with respect to the self-tensorial product of a suitable wavelet basis.

3 Estimation and prediction results

A componentwise estimator of the autocorrelation operator and the associated ARH(1) plug–in predictor are formulated in this section. Their convergence to the corresponding theoretical functional values is derived in the spaces $\mathcal{L}^{2}_{\mathcal{S}(H)}(\Omega,\mathcal{A},\mathcal{P})$ and $\mathcal{L}^{1}_{H}(\Omega,\mathcal{A},\mathcal{P}),$ respectively. Their consistency in the spaces $\mathcal{S}(H)$ and $H$ then follows.

From equation (3), for each $j\geq 1,$ and for a given sample size $n$ , one can consider the usual respective moment–based estimators $\widehat{D}_{n,j}$ and $\widehat{C}_{n,j}$ of $D_{j}$ and $C_{j},$ in the AR(1) framework, given by

[TABLE]

The following truncated componentwise estimator of $\rho$ is then formulated:

[TABLE]

where, for each $j\geq 1,$

[TABLE]

Here, the truncation parameter $k_{n}$ indicates that we have considered the first $k_{n}$ eigenvectors, associated with the first $k_{n}$ eigenvalues arranged in decreasing order of magnitude. Furthermore, $k_{n}$ is such that

[TABLE]

The following additional condition will be assumed on $k_{n}$ for the derivation of the subsequent results:

Assumption A3. The truncation parameter $k_{n}$ in (15) is such that

[TABLE]

Remark 5

Assumption A3 has also been considered in [Bosq, 2000, p. 217], to ensure weak consistency of the proposed estimator of $\rho,$ as well as in [Mas, 1999, Proposition 4], in the derivation of asymptotic normality.
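The truncated componentwise estimator formulated above can be sketched numerically as follows. This is a minimal illustration, assuming the usual normalisations $\widehat{C}_{n,j}=\frac{1}{n}\sum_{i=0}^{n-1}X_{i,j}^{2}$ and $\widehat{D}_{n,j}=\frac{1}{n-1}\sum_{i=0}^{n-2}X_{i,j}X_{i+1,j}$, with $\widehat{\rho}_{n,j}=\widehat{D}_{n,j}/\widehat{C}_{n,j}$. The helper names, the burn-in length, and the values chosen for $\rho_{j}$ and $\sigma_{j}$ are hypothetical.

```python
import random


def simulate_ar1(rho, sigma, n, rng, burn_in=500):
    """Simulate n values of a zero-mean Gaussian AR(1) path after a burn-in."""
    x, path = 0.0, []
    for t in range(burn_in + n):
        x = rho * x + rng.gauss(0.0, sigma)
        if t >= burn_in:
            path.append(x)
    return path


def rho_hat_component(x):
    """Moment-based estimate D_hat / C_hat of one diagonal coefficient.

    Assumed normalisation: C_hat averages the n squares, D_hat averages
    the n - 1 lag-one products.
    """
    n = len(x)
    c_hat = sum(v * v for v in x) / n
    d_hat = sum(x[i] * x[i + 1] for i in range(n - 1)) / (n - 1)
    return d_hat / c_hat


def componentwise_estimator(scores, k_n):
    """Estimate the first k_n diagonal coefficients of rho.

    scores[j] holds the projections X_{0,j+1}, ..., X_{n-1,j+1} of the
    sample onto the (known) eigenvector phi_{j+1}.
    """
    return [rho_hat_component(scores[j]) for j in range(k_n)]


rng = random.Random(0)
true_rho = [0.5 / (j + 1) for j in range(4)]          # hypothetical rho_j
scores = [simulate_ar1(r, 1.0 / (j + 1), 5000, rng)   # hypothetical sigma_j
          for j, r in enumerate(true_rho)]
print(componentwise_estimator(scores, k_n=3))
```

Only the first $k_{n}$ score series are used, so components beyond the truncation order are implicitly estimated as zero.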

From Remark 3, for each $j\geq 1,$ $\eta_{j}=\left\{\eta_{j}(n),\ n\in\mathbb{Z}\right\}$ in equation (14) defines a stationary and invertible AR(1) process, ergodic in the mean–square sense; see, e.g., Bartlett [1946]. Therefore, in view of equations (11) and (13), for each $j\geq 1$ , there exist two positive constants $K_{j,1}$ and $K_{j,2}$ such that the following identities hold:

[TABLE]

Equations (18)-(19) imply, for $n$ sufficiently large,

[TABLE]

for certain positive constants $\widetilde{K}_{j,1}$ and $\widetilde{K}_{j,2},$ for each $j\geq 1.$ Equivalently, for $n$ sufficiently large,

[TABLE]

The following assumption is now considered.

Assumption A4. We assume that

[TABLE]

Remark 6

From equation (16), applying the Cauchy–Schwarz inequality, we obtain, for each $j\geq 1$ ,

[TABLE]

Convergence in $\mathcal{L}_{\mathcal{S}(H)}^{2}\left(\Omega,\mathcal{A},\mathcal{P}\right)$

Next, the convergence of $\widehat{\rho}_{k_{n}}$ to $\rho,$ in the space $\mathcal{L}_{\mathcal{S}\left(H\right)}^{2}\left(\Omega,\mathcal{A},\mathcal{P}\right),$ is derived under the setting of conditions formulated in the previous sections.

Proposition 1

Let $X=\left\{X_{n},\ n\in\mathbb{Z}\right\}$ be a zero–mean standard ARH(1) process. Under Assumptions A1–A4, the following limit holds:

[TABLE]

Specifically,

[TABLE]

Remark 7

[Bosq, 2000, Corollary 4.3] can be applied to obtain weak convergence results, in terms of weak expectation, using the empirical eigenvectors (see the definition of weak expectation at the beginning of [Bosq, 2000, Section 1.3, p. 27]).

Proof. For each $j\geq 1,$ the following inequality holds almost surely:

[TABLE]

Thus, under Assumptions A1–A2, from equation (24), for each $j\geq 1,$

[TABLE]

which implies, for each $j\geq 1$ ,

[TABLE]

Under Assumption A2, from equations (15) and (27),

[TABLE]

Furthermore, from (5) and (16), for each $j\geq 1$ ,

[TABLE]

where, considering equation (4),

[TABLE]

for each $j\geq 1.$ Equations (28)–(31) then lead to

[TABLE]

For each $j\geq 1,$ and for $n$ sufficiently large, considering equations (22)–(23), under Assumption A4,

[TABLE]

From the trace property of operator $C,$

[TABLE]

and from the Hilbert–Schmidt property of $\rho,$

[TABLE]

Thus, in view of equations (LABEL:A3:49)–(34),

[TABLE]

where

[TABLE]

Under Assumption A3, equations (LABEL:A3:50)–(36) imply

[TABLE]

as we wanted to prove.

$\blacksquare$

Note that consistency of $\widehat{\rho}_{k_{n}}$ in the space $\mathcal{S}\left(H\right)$ directly follows from equation (25) in Proposition 1.

Corollary 1

Let $X=\left\{X_{n},\ n\in\mathbb{Z}\right\}$ be a zero–mean standard ARH(1) process. Under Assumptions A1–A4, as $n\rightarrow\infty,$

[TABLE]

where, as usual, $\to^{p}$ denotes the convergence in probability.

Consistency of the ARH(1) plug–in predictor.

Let us consider $\mathcal{L}\left(H\right)$ the space of bounded linear operators on $H,$ with the norm

[TABLE]

for every $\mathcal{A}\in\mathcal{L}\left(H\right).$ In particular, for each $x\in H,$

[TABLE]

In the following, we denote by

[TABLE]

as usual, the ARH(1) plug–in predictor of $X_{n},$ as an estimator of the conditional expectation ${\rm E}\left\{X_{n}|X_{n-1}\right\}=\rho\left(X_{n-1}\right)$ . The following proposition provides the consistency of $\widehat{X}_{n}=\widehat{\rho}_{k_{n}}\left(X_{n-1}\right)$ in $H$ .

Proposition 2

Let $X=\left\{X_{n},\ n\in\mathbb{Z}\right\}$ be a zero–mean standard ARH(1) process. Under Assumptions A1–A4,

[TABLE]

Specifically,

[TABLE]

In particular,

[TABLE]

where, as usual, $\to^{p}$ denotes the convergence in probability.

Proof.

From (37) and Proposition 1, for $n$ sufficiently large, the following inequality holds almost surely:

[TABLE]

where, as given in equation (38), $\widehat{X}_{n}=\widehat{\rho}_{k_{n}}\left(X_{n-1}\right).$ Thus,

[TABLE]

From the Cauchy–Schwarz inequality, keeping in mind that, for a Hilbert–Schmidt operator $\mathcal{K},$ it always holds that $\|\mathcal{K}\|_{\mathcal{L}\left(H\right)}\leq\|\mathcal{K}\|_{\mathcal{S}\left(H\right)},$ we have from equation (39),

[TABLE]

where, as before, $\sigma_{X}^{2}={\rm E}\left\{\left\|X_{n-1}\right\|_{H}^{2}\right\}=\displaystyle\sum_{j=1}^{\infty}C_{j}<\infty,\quad n\in\mathbb{Z}$ (see equation (LABEL:A3:20)).

Since from Proposition 1 (see equation (26)),

[TABLE]

from equation (40), we obtain,

[TABLE]

where $h\left(n\right)=\sigma_{X}\sqrt{g\left(n\right)}$ , with $g\left(n\right)$ being given in (36). In particular, under Assumption A3,

[TABLE]

which implies that

[TABLE]

$\blacksquare$
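The key step of the proof, namely bounding the prediction error by the Hilbert–Schmidt norm of $\rho-\widehat{\rho}_{k_{n}}$ times $\|X_{n-1}\|_{H}$, can be illustrated for diagonal operators as follows. This is a sketch under stated assumptions: all functions are represented by a finite vector of coefficients with respect to $\left\{\phi_{j}\right\}$, and the numerical values and the helper names `plug_in_predictor` and `l2_norm` are hypothetical.

```python
import math


def plug_in_predictor(rho_hat, x_prev):
    """Coefficients of rho_hat_{k_n}(X_{n-1}) in the basis {phi_j}.

    rho_hat: estimated diagonal coefficients (length k_n).
    x_prev:  coefficients <X_{n-1}, phi_j> (length >= k_n).
    Components beyond k_n are predicted as zero.
    """
    k_n = len(rho_hat)
    head = [rho_hat[j] * x_prev[j] for j in range(k_n)]
    return head + [0.0] * (len(x_prev) - k_n)


def l2_norm(v):
    """Euclidean norm of a coefficient vector."""
    return math.sqrt(sum(c * c for c in v))


# Hypothetical values, truncated to 5 basis functions for illustration.
rho = [0.5, 0.25, 0.125, 0.0625, 0.03125]
rho_hat = [0.48, 0.27, 0.10]                  # k_n = 3
x_prev = [1.2, -0.7, 0.3, 0.1, -0.05]

target = [r * x for r, x in zip(rho, x_prev)]  # rho(X_{n-1})
pred = plug_in_predictor(rho_hat, x_prev)
err = l2_norm([t - p for t, p in zip(target, pred)])

# Hilbert-Schmidt norm of rho - rho_hat_{k_n} for diagonal operators:
# estimation error on the first k_n terms plus the truncated tail of rho.
diff = [r - (rho_hat[j] if j < len(rho_hat) else 0.0)
        for j, r in enumerate(rho)]
bound = l2_norm(diff) * l2_norm(x_prev)
print(err, bound)
```

In this diagonal setting the inequality `err <= bound` is an instance of the Cauchy–Schwarz inequality applied coefficientwise.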

4 The Gaussian case

In this section, we prove that, in the Gaussian ARH(1) context, Assumptions A1–A2 and A4 also hold. From equation (11), for $n\geq 1$ ,

[TABLE]

Furthermore, for each $j\geq 1$ and $n\geq 2$ , the $n\times 1$ random vector $\boldsymbol{\eta}_{j}^{T}=\left(\eta_{j}(0),\dots,\eta_{j}(n-1)\right)$ follows a Multivariate Normal distribution with null mean vector, and covariance matrix

[TABLE]

It is well–known (see, for example, Gurland [1956]) that the variance of a quadratic form defined from a multivariate Gaussian vector $\mathbf{y}\sim N(\boldsymbol{\mu},\boldsymbol{\Lambda}),$ and a symmetric matrix $\mathbf{Q}$ is given by:

[TABLE]

For each $j\geq 1,$ applying equation (42), with $\mathbf{y}=\boldsymbol{\eta}_{j},$ $\boldsymbol{\Lambda=\Sigma}$ in (41), and $\boldsymbol{Q=Id}_{n},$ the $n\times n$ identity matrix, keeping in mind ${\rm E}\left\{\eta_{j}(i)\eta_{j}(i+1)\right\}=\rho_{j}$ , for every $i\in\mathbb{Z}$ ,

[TABLE]

Furthermore, from equation (LABEL:A3:mvqf4), for each $j\geq 1,$

[TABLE]

We then obtain, from equation (44),

[TABLE]

Equation (45) leads to

[TABLE]

Hence, for each $j\geq 1,$ $K_{j,1}$ in equation (18) is given by

[TABLE]

and, from equation (44),

[TABLE]

Thus, for every $j\geq 1,$ $\widetilde{K}_{j,1}$ in equation (20) satisfies

[TABLE]
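The variance computation above can be checked numerically in a simple case. For a zero-mean Gaussian vector $\mathbf{y}\sim N(\mathbf{0},\boldsymbol{\Lambda})$ and a symmetric matrix $\mathbf{Q}$, the variance of $\mathbf{y}^{T}\mathbf{Q}\mathbf{y}$ is $2\,{\rm tr}\left(\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}\boldsymbol{\Lambda}\right)$. The sketch below evaluates this for $\mathbf{Q}$ equal to the identity, assuming that the covariance matrix has the unit-variance AR(1) form with entries $\rho_{j}^{|i-k|}$. The function names and the values of $\rho$ and $n$ are illustrative, and the large-$n$ limit printed for comparison is a direct computation for this matrix rather than a constant quoted from the paper.

```python
def ar1_correlation_matrix(rho, n):
    """n x n matrix with entries rho**|i - k| (unit-variance AR(1))."""
    return [[rho ** abs(i - k) for k in range(n)] for i in range(n)]


def var_sum_of_squares(rho, n):
    """Var(sum_i eta(i)^2) = 2 * tr(Sigma^2) for eta ~ N(0, Sigma), Q = Id."""
    sigma = ar1_correlation_matrix(rho, n)
    # tr(Sigma^2) equals the sum of squared entries because Sigma is symmetric.
    return 2.0 * sum(entry * entry for row in sigma for entry in row)


rho, n = 0.5, 400
print(var_sum_of_squares(rho, n) / n)         # finite-n value of Var / n
print(2.0 * (1 + rho ** 2) / (1 - rho ** 2))  # large-n limit of Var / n
```

The variance of the sum of squares grows linearly in $n$, so the variance of the corresponding sample mean decays at the rate $1/n$.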

Remark 8

Note that, from Lemma 1, for each $j\geq 1$ and $i\in\mathbb{Z}$ ,

[TABLE]

Thus, the assumption considered in Remark 3 holds, and for each $j\geq 1,$ the AR(1) process $\eta_{j}=\left\{\eta_{j}(n),\ n\in\mathbb{Z}\right\}$ is ergodic for all second–order moments, in the mean–square sense; see [Hamilton, 1994, pp. 192–193].

For $n\geq 2,$ and for each $j\geq 1,$ we are now going to compute $K_{j,2}$ in (19). The $(n-1)\times 1$ random vectors

[TABLE]

are multivariate Normal distributed, with null mean vector, and covariance matrix

[TABLE]

From equation (13), for each $j\geq 1,$

[TABLE]

where

[TABLE]

with, as before, $\boldsymbol{Id}_{n-1}$ denoting the $(n-1)\times(n-1)$ identity matrix.

However, the variance of

[TABLE]

depends greatly on the distribution of $\boldsymbol{\eta}_{j}^{\star}$ and $\boldsymbol{\eta}_{j}^{\star\star}.$ In the Gaussian case, keeping in mind that

[TABLE]

are zero–mean multivariate Normal distributed vectors with covariance matrix $\boldsymbol{\widetilde{\Sigma}}$ given in (46), and having cross–covariance matrix in (48), we can compute the variance of $\displaystyle\sum_{i=0}^{n-2}\eta_{j}(i)\eta_{j}(i+1),$ from (47)–(48), as follows. First,

[TABLE]

This can be rewritten as

[TABLE]

which is equal to

[TABLE]

This then reduces to

[TABLE]

which is the same as

[TABLE]

where, from (48),

[TABLE]

From (LABEL:A3:extversionvv3),

[TABLE]

Therefore, for each $j\geq 1,$

[TABLE]

Thus, for each $j\geq 1,$ $K_{j,2}$ in (19) is given by $K_{j,2}=1+3\rho_{j}^{2}.$ From equation (50),

[TABLE]

Hence, for every $j\geq 1,$ $\widetilde{K}_{j,2}$ in equation (21) satisfies

[TABLE]

Therefore, the constant $S$ in Assumption A4 is such that $S\leq 6+4=10.$

5 Simulation study

A simulation study is undertaken to illustrate the behaviour of the formulated componentwise estimator of the autocorrelation operator, and of its associated ARH(1) plug–in predictor, for large sample sizes. The results are reported in Section 5.1. In Section 5.2, a comparative study is developed, based on the implementation of the ARH(1) plug–in prediction techniques proposed in Antoniadis and Sapatinas [2003], Besse et al. [2000], Bosq [2000], Guillas [2001]. In the subsequent sections, we restrict our attention to the Gaussian case.

5.1 Behaviour of $\widehat{\rho}$ and $\widehat{X}_{n}$ for large sample sizes

Let $(-\Delta)_{(a,b)}$ be the Dirichlet negative Laplacian operator on $(a,b)$ given by

[TABLE]

The eigenvectors $\left\{\phi_{j},\ j\geq 1\right\}$ and eigenvalues $\left\{\lambda_{j}\left((-\Delta)_{(a,b)}\right),\ j\geq 1\right\}$ of $(-\Delta)_{(a,b)}$ satisfy, for each $j\geq 1$ and for each $x\in(a,b)$ ,

[TABLE]

For each $j\geq 1$ and $x\in\left[a,b\right]$ , the solution to equation (51) is given by (see [Grebenkov and Nguyen, 2013, p. 6]):

[TABLE]
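For concreteness, the eigenpairs of the Dirichlet negative Laplacian on an interval can be evaluated and checked numerically. The sketch below uses the classical closed forms $\phi_{j}(x)=\sqrt{2/(b-a)}\,\sin\left(\pi j(x-a)/(b-a)\right)$ and $\lambda_{j}=\pi^{2}j^{2}/(b-a)^{2}$, and verifies orthonormality in $L^{2}((a,b))$ with a midpoint quadrature rule. The function names and the number of quadrature nodes are illustrative choices.

```python
import math


def dirichlet_eigenvalue(j, a, b):
    """j-th eigenvalue of the Dirichlet negative Laplacian on (a, b)."""
    return (math.pi * j / (b - a)) ** 2


def dirichlet_eigenfunction(j, a, b):
    """j-th L2-normalised eigenfunction, vanishing at x = a and x = b."""
    scale = math.sqrt(2.0 / (b - a))
    return lambda x: scale * math.sin(math.pi * j * (x - a) / (b - a))


def inner_product(f, g, a, b, m=2000):
    """Midpoint-rule approximation of the L2((a, b)) inner product."""
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h)
                   for i in range(m))


a, b = 0.0, 4.0  # the interval used in the simulation study
phi1 = dirichlet_eigenfunction(1, a, b)
phi2 = dirichlet_eigenfunction(2, a, b)
print(inner_product(phi1, phi1, a, b), inner_product(phi1, phi2, a, b))
print([dirichlet_eigenvalue(j, a, b) for j in (1, 2, 3)])
```

Because the eigenvectors are known in closed form, the projections $X_{i,j}=\left\langle X_{i},\phi_{j}\right\rangle_{H}$ can be computed by quadrature without any eigen-decomposition of an empirical covariance operator.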

We consider here the operator $C$ defined as

[TABLE]

From [Dautray and Lions, 1990, pp. 119–140], the eigenvectors of $C$ coincide with the eigenvectors of $(-\Delta)_{(a,b)},$ and its eigenvalues $\left\{C_{j},\ j\geq 1\right\}$ are given by:

[TABLE]

Additionally, considering

[TABLE]

for a certain positive constant $\epsilon<\lambda_{1}\left((-\Delta)_{(a,b)}\right)$ close to zero, $\rho$ is a positive self–adjoint Hilbert–Schmidt operator, whose eigenvectors coincide with the eigenvectors of $\left(-\Delta\right)_{(a,b)},$ and whose eigenvalues $\left\{\rho_{j},\leavevmode\nobreak\ j\geq 1\right\}$ are such that $\rho_{j}<1,$ for every $j\geq 1,$ and

[TABLE]

where, as before, $\left\{\lambda_{j}\left((-\Delta)_{(a,b)}\right),\ j\geq 1\right\}$ are given in equation (52).

From (LABEL:A3:25a), the eigenvalues $\left\{\sigma_{j}^{2},\ j\geq 1\right\}$ of $C_{\varepsilon}$ are then defined, for each $j\geq 1,$ as

[TABLE]

Note that $C_{\varepsilon}$ is in the trace class, since the trace property of $C$ and the fact that $\rho_{j}^{2}<1,$ for every $j\geq 1,$ imply

[TABLE]
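The trace-class property of $C_{\varepsilon}$ can be illustrated numerically. The sketch below assumes the componentwise stationarity relation $\sigma_{j}^{2}=C_{j}\left(1-\rho_{j}^{2}\right)$ of an AR(1) model, together with hypothetical power-law sequences for $C_{j}$ and $\rho_{j}$ chosen only for illustration; they are not the sequences defined in equations (53)–(54).

```python
def innovation_variances(C, rho):
    """sigma_j^2 = C_j * (1 - rho_j^2), componentwise stationarity relation."""
    return [c * (1.0 - r * r) for c, r in zip(C, rho)]


J = 10000
# Hypothetical summable eigenvalue sequence C_j ~ j^{-2.4}.
C = [j ** -2.4 for j in range(1, J + 1)]
# Hypothetical coefficients with rho_j^2 ~ j^{-1.1}, all below one.
rho = [0.9 * j ** -0.55 for j in range(1, J + 1)]

sigma2 = innovation_variances(C, rho)
print(sum(sigma2), sum(C))  # partial traces of C_eps and C
```

Since each $\sigma_{j}^{2}$ is bounded by $C_{j}$, every partial trace of $C_{\varepsilon}$ is bounded by the corresponding partial trace of $C$.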

For this particular example of operator $C,$ we have considered truncation parameter $k_{n}$ of the form

[TABLE]

for a suitable $\alpha>0,$ which, in particular, allows verification of (17). From equation (53), one has, for $\gamma_{1}\in(0,1/2)$ ,

[TABLE]

From equation (55), Assumption A3 is then satisfied if

[TABLE]

since $\gamma_{1}\in(0,1/2)$ . Fix $\gamma_{1}=0.4$ and $\gamma_{2}=9/20$ . Then, from equation (56), $\alpha>48/10.$ In particular, the values $\alpha_{1}=5$ and $\alpha_{2}=6$ have been tested, in Table 1 below, for $H=L^{2}((a,b)),$ and $(a,b)=(0,4),$ where $L^{2}((a,b))$ denotes the space of square integrable functions on $(a,b).$
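To give a sense of the truncation orders involved, the following sketch evaluates a one-parameter rule for the two tested values of $\alpha$ over the sample sizes $n_{t}=15000+20000(t-1),$ $t=1,\dots,20,$ used in the tables. It assumes the rule $k_{n}=\lceil n^{1/\alpha}\rceil$ quoted for Table 2; the function name `truncation_parameter` is hypothetical.

```python
import math


def truncation_parameter(n, alpha):
    """k_n = ceil(n ** (1 / alpha)), the one-parameter rule assumed here."""
    return math.ceil(n ** (1.0 / alpha))


sample_sizes = [15000 + 20000 * (t - 1) for t in range(1, 21)]
for alpha in (5, 6):
    print(alpha, [truncation_parameter(n, alpha) for n in sample_sizes])
```

Under this rule the number of estimated coefficients grows very slowly with $n$, which is why large sample sizes are needed to estimate even a moderate number of diagonal coefficients.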

The computed empirical truncated functional mean square error ${\rm EMSE}_{\widehat{\rho}_{k_{n}}}$ of the estimator $\widehat{\rho}_{k_{n}}$ of $\rho,$ for a sample size $n$ , is given by:

[TABLE]

where $N$ denotes the number of simulations, and for each $j=1,\dots,k_{n},$ $\widehat{\rho}_{n,j}^{w}$ represents the estimator of $\rho_{j},$ based on the $w$ –th generation of the values $X_{0,j}^{w},\dots,X_{n-1,j}^{w},$ with $X_{i,j}^{w}=\left\langle X_{i}^{w},\phi_{j}\right\rangle_{H},$ for $w=1,\dots,700,$ and $i=0,\dots,n-1.$

For the plug–in predictor $\widehat{X}_{n}=\widehat{\rho}_{k_{n}}\left(X_{n-1}\right),$ we compute the empirical version ${\rm UB(EMAE)}_{\widehat{X}_{n}^{k_{n}}}$ of the derived upper bound (40), which, for each $n\in\mathbb{Z},$ is given by

[TABLE]

From $N=700$ realizations, for each one of the elements of the sequence of sample sizes

[TABLE]

the ${\rm EMSE}_{\widehat{\rho}_{k_{n}}}$ and ${\rm UB(EMAE)}_{\widehat{X}_{n}^{k_{n}}}$ values, for $\alpha=5$ and $\alpha=6,$ are displayed in Table 1, where the abbreviated notations ${\rm MSE}_{\widehat{\rho}_{k_{n,1}}},$ for ${\rm EMSE}_{\widehat{\rho}_{k_{n}}},$ and ${\rm UB}_{\widehat{X}_{n^{k_{n,1}}}},$ for ${\rm UB(EMAE)}_{\widehat{X}_{n}^{k_{n}}},$ are used (see also Figures 1–2).
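A small-scale version of this Monte Carlo computation can be sketched as follows. It assumes that the empirical truncated mean-square error has the form $\frac{1}{N}\sum_{w=1}^{N}\sum_{j=1}^{k_{n}}\left(\widehat{\rho}_{n,j}^{w}-\rho_{j}\right)^{2}$, and it uses hypothetical coefficients, innovation scales, sample sizes and number of replicates that are much smaller than those of the study. The function names are illustrative.

```python
import random


def ar1_path(rho, sigma, n, rng, burn_in=200):
    """Simulate n values of a zero-mean Gaussian AR(1) path after a burn-in."""
    x, out = 0.0, []
    for t in range(burn_in + n):
        x = rho * x + rng.gauss(0.0, sigma)
        if t >= burn_in:
            out.append(x)
    return out


def rho_hat(x):
    """Moment-based estimate D_hat / C_hat (assumed normalisations)."""
    n = len(x)
    c_hat = sum(v * v for v in x) / n
    d_hat = sum(x[i] * x[i + 1] for i in range(n - 1)) / (n - 1)
    return d_hat / c_hat


def emse(true_rho, sigmas, n, n_sim, seed=0):
    """Average over n_sim replicates of sum_j (rho_hat_j - rho_j)^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        for r, s in zip(true_rho, sigmas):
            total += (rho_hat(ar1_path(r, s, n, rng)) - r) ** 2
    return total / n_sim


true_rho = [0.6, 0.3, 0.15]  # hypothetical first k_n = 3 coefficients
sigmas = [1.0, 0.5, 0.25]    # hypothetical innovation standard deviations
for n in (500, 4000):
    print(n, emse(true_rho, sigmas, n, n_sim=30))
```

Here $k_{n}$ is held fixed at 3 for simplicity, whereas in the study it increases with the sample size $n$.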

In this paper, a one–parameter model for $k_{n}$ is selected, depending on the parameter $\alpha$ . In [Guillas, 2001, Example 2], in the same spirit, for an equivalent spectral class of operators $C$ , a three–parameter model is established for $k_{n}$ to ensure convergence in quadratic mean, in the space $\mathcal{L}(H),$ of the componentwise estimator of $\rho$ constructed from the known eigenvectors of $C$ . The numerical results displayed in Table 1 and Figures 1–2 illustrate that the proposed componentwise estimator $\widehat{\rho}_{k_{n}}$ converges to $\rho,$ in quadratic mean in $\mathcal{S}(H),$ at a rate faster than $n^{-1/3},$ which corresponds to the optimal case for the componentwise estimator of $\rho$ proposed in Guillas [2001], in the case of known eigenvectors of $C$ ; see, in particular, [Guillas, 2001, Theorem 1, Remark 2 and Example 2]. For values of the parameters $\gamma_{1}$ larger than $2.4,$ and $\alpha$ larger than $6$ , a faster rate of convergence of $\widehat{\rho}_{k_{n}}$ to $\rho,$ in quadratic mean in the space $\mathcal{S}(H),$ will be obtained. However, larger sample sizes are required for larger values of $\alpha$ in order to estimate a given number of coefficients of $\rho.$ A more detailed comparison of the rates of convergence of the ARH(1) plug–in predictors proposed in Antoniadis and Sapatinas [2003], Besse et al. [2000], Bosq [2000], Guillas [2001] can be found in the next section.

5.2 A comparative study

In this section, the performance of our approach is compared with those given in Antoniadis and Sapatinas [2003], Besse et al. [2000], Bosq [2000], Guillas [2001], including the case of unknown eigenvectors of $C.$ In the latter case, our approach and the approaches presented in Bosq [2000], Guillas [2001] are implemented in terms of the empirical eigenvectors.

5.2.1 Theoretical–eigenvector–based componentwise estimators

Let us first compare the performance of our ARH(1) plug–in predictor, defined in (38), and the ones formulated in Bosq [2000], Guillas [2001], in terms of the theoretical eigenvectors $\left\{\phi_{j},\ j\geq 1\right\}$ of $C.$ Note that, in this first part of our comparative study, we consider the previously generated Gaussian ARH(1) process, with autocovariance and autocorrelation operators defined from equations (53) and (54), for different rates of convergence to zero of parameters $C_{j}$ and $\rho_{j}^{2},$ $j\geq 1,$ with both sequences being summable. Since we restrict our attention to the Gaussian case, Conditions $A_{1},$ $B_{1}$ and $C_{1},$ formulated in [Bosq, 2000, pp. 211–212], are satisfied by the generated ARH(1) process. Similarly, Conditions H1–H3 in [Guillas, 2001, p. 283] are satisfied as well.

In [Bosq, 2000, Section 8.2] the following estimator of $\rho$ is proposed

[TABLE]

in the finite dimensional subspace

[TABLE]

of $H,$ where $\Pi^{k_{n}}$ is the orthogonal projector over $H_{k_{n}},$ and, as before, $X_{i,j}=\left\langle X_{i},\phi_{j}\right\rangle_{H},$ for $j\geq 1.$

A modified estimator of $\rho$ is studied in [Guillas, 2001, Section 2], given by

[TABLE]

where

[TABLE]

Here, $\left\{a_{n},\ n\in\mathbb{N}\right\}$ is such that (see [Guillas, 2001, Theorem 1])

[TABLE]

Tables 2–3 display, for two different rules for $k_{n},$ the truncated empirical values of ${\rm E}\left\{\|\rho\left(X_{n-1}\right)-\widehat{\rho}_{k_{n}}(X_{n-1})\|_{H}\right\},$ based on $N=700$ generations of each of the functional samples considered, with sizes $n_{t}=15000+20000(t-1),$ $t=1,\dots,20,$ when

[TABLE]

Specifically, $\widehat{\rho}_{k_{n}}$ is computed from equations (15)–(16) (see third column), $\widehat{\rho}_{k_{n}}=\widehat{\rho}_{n}$ , with $\widehat{\rho}_{n}$ being given in equations (60)–(61) (see fourth column), and $\widehat{\rho}_{k_{n}}=\widehat{\rho}_{n,a}$ , with $\widehat{\rho}_{n,a}$ being defined in (62)–(63) (see fifth column).

In Table 2, $\delta_{1}=2.4,$ $\delta_{2}=1.1,$ and $k_{n}=\lceil n^{1/\alpha}\rceil,$ for $\alpha=6,$ according to our Assumption A3, which is also considered in [Bosq, 2000, p. 217] to ensure weak consistency of the proposed estimator of $\rho$ . In Table 3, the same empirical values are displayed for $\delta_{1}=\frac{61}{60},$ $\delta_{2}=1.1,$ and $k_{n}$ is selected according to [Guillas, 2001, Example 2]. Thus, in Table 3,

[TABLE]

In particular we have chosen $\gamma=2,$ and $\epsilon=0.04\delta_{1}.$ Note that, from [Guillas, 2001, Theorem 1 and Remark 1], for the choice made of $k_{n}$ in Table 3, convergence to $\rho,$ in quadratic mean in the space $\mathcal{L}(H),$ holds for $\widehat{\rho}_{n,a}$ given in (62)–(63).

One can observe in Table 2 a similar performance of the three methods compared, with the truncation order $k_{n}$ satisfying Assumption A3; slightly worse results are obtained from the estimator defined in (62)–(63), especially for the sample size $n_{8}=155000.$ Furthermore, in Table 3, a better performance of our approach is observed for the smallest sample sizes (from $n_{1}=15000$ to $n_{4}=75000$ ). For the remaining, larger sample sizes, only slight differences are observed, with our approach again performing slightly better than, but very close to, the other two approaches presented in Bosq [2000], Guillas [2001].

5.2.2 Empirical–eigenvector–based componentwise estimators

In this section, we address the case where $\left\{\phi_{j},\ j\geq 1\right\}$ are unknown, as is often the case in practice. Specifically, for a given sample size $n$ , let $\left\{\phi_{n,j},\ j\geq 1\right\}$ be the empirical counterpart of the theoretical eigenvectors $\left\{\phi_{j},\ j\geq 1\right\}$ , satisfying, for every $j\geq 1$ ,

[TABLE]

where $\left\{C_{n,j},\ j\geq 1\right\}$ denotes the system of eigenvalues associated with the system of empirical eigenvectors $\left\{\phi_{n,j},\ j\geq 1\right\}$ . We then consider the following estimators for comparison purposes

[TABLE]

where, for $i\in\mathbb{Z},$ and $j\geq 1,$ $\widetilde{X}_{i,j}=\left\langle X_{i},\phi_{n,j}\right\rangle_{H},$ $\widetilde{\Pi}^{k_{n}}$ denotes the orthogonal projector into the space

[TABLE]

The Gaussian ARH(1) process is generated under Assumptions A1–A2, as well as $C_{1}^{\prime}$ in [Bosq, 2000, p. 218]. Note that conditions $A_{1}$ and $B_{1}^{\prime}$ in Bosq [2000] already hold. Moreover, as given in [Bosq, 2000, Theorem 8.8 and Example 8.6], for

[TABLE]

with, in particular, $\delta_{1}=2.4,$ and for

[TABLE]

with $\delta_{2}=1.1,$ the estimator $\widetilde{\rho}_{n}$ converges almost surely to $\rho$ under the condition

[TABLE]

where

[TABLE]

In Table 4, $k_{n}=\lceil\ln(n)\rceil$ has been tested; see [Bosq, 2000, Example 8.6].

A better performance of our estimator (65), in comparison with estimator (66), formulated in Bosq [2000], and estimator (67), formulated in [Guillas, 2001, Example 4 and Remark 4], is observed in Table 4. Note that, in [Guillas, 2001, Example 4 and Remark 4], smaller values of $k_{n}$ than $\ln(n)$ are required, for a given sample size $n,$ to ensure convergence in quadratic mean and, in particular, weak consistency. In Table 5, we consider a smaller discretization step size, $\Delta t=0.015,$ than in Table 4, where $\Delta t=0.08,$ together with $k_{n}=\lceil n^{1/6}\rceil$ (i.e., $\alpha=6$ ). For the same parameter values $\delta_{1}=2.4$ and $\delta_{2}=1.1,$ better results than in Table 4 are obtained, since fewer coefficients of $\rho$ (parameters) have to be estimated, from richer sample information (coming from the smaller discretization step size). One can also observe in Table 5 a similar performance of the three approaches studied. In Table 6, the value $k_{n}=\lceil e^{\prime}n^{1/\left(8\delta_{1}+2\right)}\rceil$ , with $e^{\prime}=\frac{17}{10},$ proposed in [Guillas, 2001, Example 4 and Remark 4], is considered to compute the truncated empirical values of ${\rm E}\left\{\|\rho(X_{n-1})-\widetilde{\rho}_{k_{n}}(X_{n-1})\|_{H}\right\},$ for $\widetilde{\rho}_{k_{n}}$ defined in equation (65) (third column), for $\widetilde{\rho}_{k_{n}}=\widetilde{\rho}_{n}$ given in equation (66) (fourth column), and for $\widetilde{\rho}_{k_{n}}=\widetilde{\rho}_{n,a}$ in equation (67) (fifth column). A similar performance of the three approaches is observed, with the exception of $n_{20}=395000,$ where the approach presented in Guillas [2001] displays a slightly better performance.
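The three truncation rules used in Tables 4–6 lead to quite different numbers of estimated coefficients. The short Python sketch below tabulates them over the sample sizes $n_{t}=15000+20000(t-1),$ $t=1,\dots,20,$ using the values $\alpha=6,$ $\delta_{1}=2.4$ and $e^{\prime}=17/10$ quoted in the text; the function names are illustrative only.

```python
import math


def sample_sizes(count=20, first=15000, step=20000):
    """Sample sizes n_t = first + step * (t - 1), t = 1, ..., count."""
    return [first + step * (t - 1) for t in range(1, count + 1)]


def k_log(n):
    """Rule of Table 4: k_n = ceil(ln n)."""
    return math.ceil(math.log(n))


def k_power(n, alpha=6.0):
    """Rule of Table 5: k_n = ceil(n^(1/alpha))."""
    return math.ceil(n ** (1.0 / alpha))


def k_guillas(n, delta1=2.4, e_prime=1.7):
    """Rule of Table 6: k_n = ceil(e' * n^(1/(8*delta1 + 2)))."""
    return math.ceil(e_prime * n ** (1.0 / (8.0 * delta1 + 2.0)))


for n in sample_sizes():
    print(n, k_log(n), k_power(n), k_guillas(n))
```

For the smallest and largest sample sizes, the logarithmic rule gives 10 and 13 coefficients, the power rule 5 and 9, and the rule of Table 6 only 3 and 4, which is consistent with the remark that the latter requires smaller truncation orders than $\ln(n).$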

5.2.3 Kernel–based nonparametric and penalized estimation

In practice, curves are observed in discrete times, and should be approximated by smooth functions. In Besse et al. [2000], the following optimization problem is considered:

[TABLE]

where $L$ is a linear differential operator of order $d.$ Our interpolation is computed with the Matlab smoothingspline method. Non-linear kernel regression is then considered, in terms of the smoothed functional data, solution to (68), as follows:

[TABLE]

where $K$ is the usual Gaussian kernel, and

[TABLE]
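A kernel predictor of this type can be sketched in a generic Nadaraya–Watson form, as in the Python example below. This is an illustration under stated assumptions, not a transcription of the displayed equations: curves are represented by their values on a regular grid with step `dt`, the Gaussian kernel is applied to the $L^{2}$ distance divided by a hypothetical bandwidth `bandwidth`, and the names `l2_sq_distance` and `kernel_predict` are invented for this sketch.

```python
import math


def l2_sq_distance(f, g, dt):
    """Riemann-sum approximation of the squared L2 distance between curves."""
    return dt * sum((a - b) ** 2 for a, b in zip(f, g))


def kernel_predict(curves, bandwidth, dt):
    """Predict the curve following curves[-1].

    Each observed successor X_{i+1} is weighted by a Gaussian kernel of
    the distance between X_i and the last observed curve. The kernel's
    normalizing constant cancels in the ratio and is therefore omitted.
    """
    target = curves[-1]
    pairs = list(zip(curves[:-1], curves[1:]))
    weights = [
        math.exp(-l2_sq_distance(prev, target, dt) / (2.0 * bandwidth ** 2))
        for prev, _ in pairs
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    return [
        sum(w * nxt[s] for w, (_, nxt) in zip(weights, pairs)) / total
        for s in range(len(target))
    ]


# Toy data: three curves observed at two grid points.
curves = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
print(kernel_predict(curves, bandwidth=0.5, dt=1.0))
```

A very small bandwidth reproduces the successor of the closest past curve, whereas a very large one returns the plain average of all observed successors.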

Alternatively, Besse et al. [2000] also study prediction in the context of functional autoregressive processes (FAR(1) processes), under the assumption that $\rho$ is a linear compact operator, with $\|\rho\|<1,$ from smooth data $\widehat{X}_{1},\dots,\widehat{X}_{n}$ solving the optimization problem

[TABLE]

where $l$ is the smoothing parameter, and $H_{q}$ is the $q$ –dimensional functional subspace spanned by the eigenvectors of the autocovariance operator $C$ associated with its largest eigenvalues. Thus, smoothness and rank constraints are considered in the computation of the solution to the optimization problem (69). Such a solution is obtained by means of functional PCA.

The following regularized empirical estimators of $C$ and $D$ are then considered, with inversion of $C$ in the subspace $H_{q}$ :

[TABLE]

Thus, the regularized estimator of $\rho$ is given by

[TABLE]

and the predictor

[TABLE]

Due to computational cost limitations, in Table 7, the following statistics are evaluated to compare the performance of the two above-referred prediction methodologies:

[TABLE]

A similar performance of the kernel–based and penalized FAR(1) predictors, from smooth functional data, can be observed. This performance is also comparable, considering one realization, to that obtained in Table 6 from the empirical eigenvectors.

5.2.4 Wavelet–based prediction for ARH(1) processes

The approach presented in Antoniadis and Sapatinas [2003] is now studied. Specifically, wavelet-based regularization is applied to obtain smooth estimates of the sample paths. The projection onto the space $V_{J},$ generated by translations of the scaling function $\phi_{Jk},\ k=0,\dots,2^{J}-1,$ at level $J,$ associated with a multiresolution analysis of $H,$ is first considered. For a given primary resolution level $j_{0}$ , with $j_{0}<J,$ the following wavelet decomposition at $J-j_{0}$ resolution levels can be computed for any projected curve $\Phi_{V_{J}}X_{i},$ in the space $V_{J},$ for $i=0,\dots,n-1:$

[TABLE]

For $i=0,\dots,n-1,$ the following variational problem is solved to obtain the smooth estimate of the curve $X_{i}:$

[TABLE]

where $\Phi_{V_{j_{0}}^{\bot}}$ denotes the orthogonal projection operator of $H$ onto the orthogonal complement of $V_{j_{0}},$ and for $i=0,1,\dots,n-1,$

[TABLE]

Using the equivalent sequence of norms of fractional Sobolev spaces of order $s$ with $s>1/2,$ on a suitable interval (in our case, $s=\delta_{1}$ ), the minimization of (72) is equivalent to the optimization problem, for $i=0,\dots,n-1,$

[TABLE]

The solution to (73) is given by, for $i=0,\dots,n-1,$

[TABLE]

In particular, in the subsequent computations, we have considered the following value of the smoothing parameter $\lambda$ (see Angelini et al. [2003]):

[TABLE]

The following smoothed data are then computed

[TABLE]

removing the trend

[TABLE]

to obtain

[TABLE]

for the computation of

[TABLE]

for $x\in H$ and

[TABLE]

where

[TABLE]

and

[TABLE]

for every $j\geq 1$ . Table 8 displays the empirical truncated approximation of the expectation ${\rm E}\left\{\|\widetilde{\rho}_{k_{n}}(X_{n-1})-\rho(X_{n-1})\|_{H}\right\}$ and ${\rm E}\left\{\|\widetilde{\rho}_{n,\widehat{\lambda}^{M}}(X_{n-1})-\rho(X_{n-1})\|_{H}\right\},$ respectively obtained applying our approach, and the approach in Antoniadis and Sapatinas [2003], in the estimation of the autocorrelation operator $\rho$ . Here, we have tested $k_{n_{i}}=\lceil n^{1/\alpha_{i}}\rceil,$ $i=1,2,$ with $\alpha_{1}=6,$ according to Assumption A3, and $\alpha_{2}>4\delta_{1},$ according to

[TABLE]

in [Antoniadis and Sapatinas, 2003, p. 149]. In particular, we have considered $\delta_{1}=2.4,$ and $\alpha_{2}=10.$ From the results displayed in Table 8, one can observe a similar performance for the two truncation rules implemented, and approaches compared, for the small sample sizes tested. A similar accuracy is also displayed by the approaches presented in Besse et al. [2000], for such small sample sizes (see Table 7).

6 Final comments

As noted before, in this paper, the eigenvectors of $C$ are considered to be known in the derivation of the results on consistency. This assumption is satisfied, e.g., when the random initial condition is given as the solution, in the mean-square sense, of a stochastic differential equation driven by white noise (e.g., the Wiener measure), since the eigenvectors of the differential operator involved in that equation coincide with the eigenvectors of the autocovariance operator of the ARH(1) process. In the case where the eigenvectors of the autocovariance operator are unknown, the numerical results displayed in Tables 4–6 illustrate the fact that our approach displays, in terms of the empirical eigenvectors, very similar prediction results to those obtained with the implementation of the componentwise estimators proposed in Bosq [2000], Guillas [2001], with a better performance of our approach in the more unfavorable case, corresponding to a large discretization step size, and truncation order (see Table 4 computed for $k_{n}=\lceil\ln(n)\rceil$ ).

Regarding Assumption A2, Remark 1 provides an example where this assumption is satisfied. However, our approach can still be applied in a wider range of situations. Wavelet bases are well suited for sparse representation of functions; recent work has considered combining them with sparsity-inducing penalties, both for semiparametric regression (see, e.g., Wand and Ormerod [2011]), and for regression with functional or kernel predictors (see Wand and Ormerod [2011], Zhao et al. [2012, 2015], among others). The latter papers focused on $\ell_{1}$ penalization, also known as the lasso (see Tibshirani [1996]), in the wavelet domain. Alternatives to the lasso include the SCAD penalty by Fan and Li [2001], and the adaptive lasso by Zou [2006]. The $\ell_{1}$ penalty in the elastic net criterion has the effect of shrinking small coefficients to zero. This can be interpreted as imposing a prior that favors a sparse estimate. The above-mentioned smoothing techniques, based on wavelets, can be applied to obtain a smooth sparse approximation $\widehat{X}_{1},\dots,\widehat{X}_{n}$ of the functional data $X_{1},\dots,X_{n},$ whose empirical auto-covariance operator

[TABLE]

and cross-covariance operator

[TABLE]

admit a diagonal representation in terms of wavelets.
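The shrinkage effect of the $\ell_{1}$ penalty can be made concrete with a small example. For coefficients taken with respect to an orthonormal basis, the $\ell_{1}$-penalized least-squares problem decouples coordinate by coordinate, and its minimizer is the soft-thresholding rule. The Python sketch below applies that rule to a hypothetical vector of wavelet coefficients; the function name, the input values and the threshold `lam` are illustrative choices, not quantities taken from the paper.

```python
def soft_threshold(coeffs, lam):
    """Soft-thresholding: minimizer of 0.5 * (c - b)^2 + lam * |b| in b.

    Coefficients with |c| <= lam are set to zero; the others are moved
    towards zero by lam, keeping their sign.
    """
    shrunk = []
    for c in coeffs:
        magnitude = max(abs(c) - lam, 0.0)
        shrunk.append(magnitude if c >= 0 else -magnitude)
    return shrunk


# Hypothetical wavelet coefficients of one curve.
coefficients = [3.0, -0.5, 0.2, -2.0]
print(soft_threshold(coefficients, lam=1.0))
```

Small coefficients are removed entirely, which is what produces a sparse approximation of the curve.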

In the literature, shrinkage approaches for estimating a high–dimensional covariance matrix are employed to circumvent the limitations of the sample covariance matrix. In particular, a new family of nonparametric Stein–type shrinkage covariance estimators is proposed in Touloumis [2015] (see also references therein), whose members are written as a convex linear combination of the sample covariance matrix and of a predefined invertible diagonal target matrix. These results can be applied to our framework, considering the shrinkage estimators of the autocovariance and cross-covariance operators, with respect to a common suitable wavelet basis, which can lead to an empirical diagonal representation of both operators.
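The convex-combination structure of such shrinkage estimators can be sketched as follows. In this Python example the target is taken to be the diagonal of the sample covariance matrix, and the shrinkage weight is fixed by hand; both are assumptions made for illustration, whereas in Touloumis [2015] the weight is estimated from the data. The function name `shrink_covariance` is hypothetical.

```python
def shrink_covariance(sample_cov, weight):
    """Return (1 - weight) * S + weight * T, with T = diag(S).

    With this target the diagonal of S is unchanged and every
    off-diagonal entry is multiplied by (1 - weight).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    p = len(sample_cov)
    return [
        [
            (1.0 - weight) * sample_cov[i][j]
            + (weight * sample_cov[i][i] if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]


# Hypothetical 2 x 2 sample covariance matrix.
S = [[2.0, 0.8], [0.8, 1.0]]
print(shrink_covariance(S, weight=0.5))
```

Taking the weight equal to one yields a purely diagonal estimate, which corresponds to the empirical diagonal representation mentioned above.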

In the Supplementary Material (see Appendix 7), a numerical example illustrates the performance of our approach in the case of a pseudo–diagonal autocorrelation operator.

7 Supplementary Material: non–diagonal autocorrelation operator

This section provides a numerical example where the methodology proposed in the paper still works beyond the considered Assumption A2. In particular, it illustrates the performance of the proposed estimation methodology when Assumption A2 is not satisfied, but $\rho$ is close to being diagonal in some sense. The numerical results obtained are compared to those derived from the computation of the ARH(1) predictors based on the componentwise estimators proposed in Bosq [2000], Guillas [2001], where this diagonal assumption is not required. The Gaussian ARH(1) process generated has autocorrelation operator $\rho$ with coefficients $\rho_{j,h}$ with respect to the basis $\left\{\phi_{j}\otimes\phi_{h},\ j,h\geq 1\right\},$ given by

[TABLE]

in the diagonal, and outside of the diagonal

[TABLE]

where $\rho_{j,j+a}^{2}=\rho_{j+a,j}^{2}=0$ when $a\geq 6$ . The coefficients of the autocovariance operator $C_{\varepsilon}$ of the innovation process $\varepsilon,$ with respect to the mentioned basis $\left\{\phi_{j}\otimes\phi_{h},\ j,h\geq 1\right\},$ are given by

[TABLE]

in the diagonal, and outside of the diagonal by

[TABLE]

where $\sigma_{j,j+a}^{2}=\sigma_{j+a,j}^{2}=0$ when $a\geq 6$ . Table 9 below displays the empirical truncated values of ${\rm E}\left\{\left\|\rho(X_{n-1})-\widehat{\rho}_{k_{n}}^{ND}(X_{n-1})\right\|_{H}\right\}$ based on $N=200$ simulations of each one of the 20 functional samples considered, with sizes $\left\{n_{t}=15000+20000(t-1),\ t=1,\ldots,20\right\}$ , for the corresponding $k_{n}$ values obtained, in this case, by the rule $k_{n}=\lceil n^{1/\alpha}\rceil$ , with $\alpha=6$ . We have considered the parameter $\delta_{1}=2.4$ in the definition of the eigenvalues of $C$ ; but, in this case, as noted before, the operators $\rho$ and $C_{\varepsilon}$ are non-diagonal (see equations (75)–(76)). The estimators of $\rho$ and the associated plug–in predictors are computed, for the three approaches compared, under the assumption that the eigenvectors of $C$ are known.

As expected, Table 9 shows that the approaches in Bosq [2000], Guillas [2001] outperform our methodology. However, for large sample sizes, the ARH(1) prediction methodology proposed here can still be applied, with an order of magnitude of $10^{-2}$ for the empirical errors associated with $\widehat{\rho}_{k_{n}}$ given in equation (65). Thus, in the pseudo–diagonal autocorrelation operator case, our approach could still be considered. As noted in the paper, an example is given by the case where the autocovariance and autocorrelation operators admit a sparse representation in terms of a suitable orthonormal wavelet basis (see, for instance, Angelini et al. [2003], Antoniadis and Sapatinas [2003]).

Acknowledgments

This work has been supported in part by project MTM2015–71839–P (co-funded by Feder funds), of the DGI, MINECO, Spain.

Bibliography


1. Aneiros-Pérez, G.; Vieu, P.: Semi-functional partial linear regression. Statist. Probab. Lett. 76 (2006), pp. 1102–1110. DOI: doi.org/10.1016/j.spl.2005.12.007
2. Aneiros-Pérez, G.; Vieu, P.: Nonparametric time series prediction: a semifunctional partial linear modeling. J. Multivariate Anal. 99 (2008), pp. 834–857. DOI: doi.org/10.1016/j.jmva.2007.04.010
3. Angelini, C.; Canditiis, D. D.; Leblanc, F.: Wavelet regression estimation in nonparametric mixed effect models. J. Multivariate Anal. 85 (2003), pp. 267–291. DOI: doi.org/10.1016/S0047-259X(02)00055-6
4. Antoniadis, A.; Sapatinas, T.: Wavelet methods for continuous-time prediction using Hilbert-valued autoregressive processes. J. Multivariate Anal. 87 (2003), pp. 133–158. DOI: doi.org/10.1016/S0047-259X(03)00028-9
5. Bartlett, M. S.: On the theoretical specification and sampling properties of autocorrelated time series. Supplement to J. Roy. Stat. Soc. 8 (1946), pp. 27–41. URL: http://www.jstor.org/stable/2983611
6. Bensmain, N.; Mourid, T.: Estimateur "sieve" de l'opérateur d'un processus ARH(1). C. R. Acad. Sci. Paris Sér. I Math. 332 (2001), pp. 1015–1018. DOI: doi.org/10.1016/S0764-4442(01)01954-1
7. Besse, P. C.; Cardot, H.; Stephenson, D. B.: Autoregressive forecasting of some functional climatic variations. Scand. J. Statist. 27 (2000), pp. 673–687. DOI: doi.org/10.1111/1467-9469.00215
8. Bongiorno, G.; Goia, A.; Salinelli, E.; Vieu, P.: Contributions in infinite–dimensional statistics and related topics. Soc. Editrice Esculapio, Bologna, 2014. ISBN 9788874887637
