Efficient Estimation by Fully Modified GLS with an Application to the Environmental Kuznets Curve

Yicong Lin; Hanno Reuvers

arXiv:1908.02552·econ.EM·August 10, 2020

TL;DR

This paper introduces a Fully Modified GLS estimator for multivariate cointegrating regressions, improving inference and bias correction, with applications to environmental economics and tests for cointegration.

Contribution

It develops the asymptotic theory for a new Fully Modified GLS estimator that handles complex cointegrating relations with deterministic and stochastic trends.

Findings

01

The FM-GLS estimator shows good performance in simulations.

02

The tests for cointegration are effective and reliable.

03

Application to EKC hypothesis provides new insights.

Tables

Table 1: Empirical MSE for the coefficient $\beta_{i,4}$ of $x_{it}^{2}$ with $i=1$ under error Setting A. The column labeled FGLS contains the numerical value of the MSE of feasible FM-GLS. Other MSEs are expressed relative to this benchmark. Values above 1 indicate a better performance of feasible FM-GLS.

	$n = 3$						$n = 5$
$ρ$	SOLS	SUR	FGLS	infSOLS	infSUR	infGLS	SOLS	SUR	FGLS	infSOLS	infSUR	infGLS
$T = 100$
0	0.999	1.048	3.56E-05	0.937	0.937	0.937	0.988	1.047	3.81E-05	0.894	0.894	0.894
0.3	1.170	1.077	5.50E-05	1.098	0.959	0.907	1.186	1.064	5.43E-05	1.090	0.899	0.851
0.6	2.166	1.474	7.09E-05	2.017	1.180	0.864	2.260	1.426	6.85E-05	2.025	1.046	0.785
0.8	5.247	2.607	7.51E-05	6.547	2.474	0.878	5.720	2.589	7.40E-05	6.591	1.984	0.676
$T = 200$
0	1.012	1.042	4.09E-06	0.961	0.961	0.961	1.019	1.069	4.25E-06	0.939	0.939	0.939
0.3	1.146	1.043	6.64E-06	1.101	0.960	0.942	1.216	1.069	6.62E-06	1.124	0.937	0.907
0.6	2.045	1.295	1.01E-05	1.920	1.101	0.906	2.222	1.345	9.37E-06	2.032	1.056	0.851
0.8	5.206	2.434	1.29E-05	5.502	1.971	0.916	6.197	2.701	1.11E-05	6.106	1.778	0.813
$T = 500$
0	1.013	1.028	2.41E-07	0.988	0.988	0.988	1.016	1.039	2.55E-07	0.973	0.973	0.973
0.3	1.195	1.054	4.03E-07	1.162	0.998	0.975	1.224	1.052	4.02E-07	1.176	0.978	0.961
0.6	1.877	1.154	7.45E-07	1.789	1.054	0.952	2.142	1.217	6.49E-07	1.993	1.023	0.914
0.8	4.158	1.771	1.26E-06	4.011	1.456	0.935	5.341	2.031	1.03E-06	5.038	1.394	0.880

Table 2: Empirical MSE for the coefficient $\beta_{i,4}$ of $x_{it}^{2}$ with $i=1$ under error Setting B. The column labeled FGLS contains the numerical value of the MSE of feasible FM-GLS. Other MSEs are expressed relative to this benchmark. Values above 1 indicate a better performance of feasible FM-GLS.

		$n = 3$						$n = 5$
$(\underline{λ}, \bar{λ})$	$T$	SOLS	SUR	FGLS	infSOLS	infSUR	infGLS	SOLS	SUR	FGLS	infSOLS	infSUR	infGLS
Panel A: Low endogeneity $θ = 0.3$
$(0.1, 0.5)$	100	1.290	1.270	9.46E-05	1.270	1.131	0.865	1.256	1.275	9.51E-05	1.215	1.052	0.822
	200	1.216	1.163	1.27E-05	1.191	1.077	0.916	1.220	1.164	1.22E-05	1.201	1.038	0.896
	500	1.137	1.088	8.58E-07	1.133	1.033	0.961	1.159	1.106	7.65E-07	1.151	0.997	0.936
$(0.5, 0.8)$	100	1.790	1.825	3.99E-05	1.792	1.522	0.638	1.812	1.896	3.81E-05	1.802	1.431	0.594
	200	1.677	1.681	4.91E-06	1.597	1.408	0.763	1.682	1.645	4.73E-06	1.581	1.271	0.733
	500	1.467	1.404	3.34E-07	1.392	1.248	0.897	1.522	1.415	3.01E-07	1.432	1.185	0.864
$(0.8, 0.95)$	100	0.123	0.136	1.48E-04	0.212	0.145	0.022	0.001	0.001	2.67E-02	0.002	0.001	0.000
	200	2.257	2.322	8.59E-07	2.537	1.986	0.517	1.622	1.577	1.04E-06	2.218	1.343	0.366
	500	2.098	2.050	5.47E-08	2.021	1.686	0.703	2.031	2.014	4.90E-08	1.964	1.394	0.632
Panel B: High endogeneity $θ = 0.5$
$(0.1, 0.5)$	100	1.327	1.269	7.48E-05	1.579	1.230	0.899	1.335	1.299	7.26E-05	1.825	1.211	0.817
	200	1.232	1.161	9.91E-06	1.394	1.131	0.949	1.269	1.225	9.26E-06	1.653	1.166	0.886
	500	1.179	1.091	6.66E-07	1.297	1.079	0.980	1.162	1.093	5.99E-07	1.464	1.071	0.942
$(0.5, 0.8)$	100	1.880	1.892	3.08E-05	2.732	1.861	0.661	1.850	1.833	3.05E-05	3.170	1.718	0.575
	200	1.793	1.727	3.72E-06	2.132	1.610	0.798	1.713	1.639	3.76E-06	2.143	1.381	0.697
	500	1.480	1.370	2.62E-07	1.678	1.299	0.941	1.577	1.461	2.44E-07	1.739	1.256	0.863
$(0.8, 0.95)$	100	0.030	0.031	5.14E-04	0.172	0.077	0.005	0.000	0.000	3.27E+03	0.000	0.000	0.000
	200	2.381	2.514	6.85E-07	5.002	2.825	0.541	1.880	1.771	7.83E-07	5.026	2.030	0.389
	500	2.186	2.090	4.23E-08	2.756	1.850	0.712	2.184	2.044	4.03E-08	2.786	1.560	0.619

Table 3: Empirical size (%) of the single-equation Wald tests for $H_{0}:\beta_{1,4}=-0.3$ and the joint Wald tests for $H_{0}:\beta_{1,4}=\beta_{2,4}=\cdots=\beta_{n,4}=-0.3$, where $\beta_{i,4}$ denote the coefficients in front of $x_{it}^{2}$. The columns labeled Wald-SOLS and Wald-SUR display the results of the Wald-type test statistics given in Proposition 2 of Wagner et al. (2020). The Wald-FGLS test is constructed as in Theorem 4.

	$n = 3$			$n = 5$
$ρ$	Wald-SOLS	Wald-SUR	Wald-FGLS	Wald-SOLS	Wald-SUR	Wald-FGLS
Panel A: Single-equation test
$T = 100$
0	11.75	13.44	9.89	14.30	17.84	13.44
0.3	13.25	15.09	9.92	15.32	19.59	13.55
0.6	16.13	19.03	7.54	18.80	27.65	11.86
0.8	20.56	26.60	4.70	26.72	43.11	10.13
$T = 200$
0	9.02	10.00	7.48	9.90	12.00	8.47
0.3	9.96	10.93	7.24	11.32	13.62	8.81
0.6	12.68	14.10	5.93	14.62	19.40	7.71
0.8	15.72	19.50	2.95	19.58	30.95	4.90
$T = 500$
0	7.02	7.44	5.89	7.32	8.47	6.19
0.3	8.07	8.41	5.76	8.54	9.50	6.04
0.6	9.41	9.68	4.78	10.34	12.54	5.94
0.8	10.92	12.48	3.09	13.47	18.96	3.73
Panel B: Joint test
$T = 100$
0	17.85	21.68	14.60	29.13	39.40	26.65
0.3	20.67	24.62	14.44	32.60	44.91	28.14
0.6	27.13	33.59	11.40	45.65	65.44	27.72
0.8	36.99	48.49	8.30	63.29	86.38	26.57
$T = 200$
0	11.66	13.74	9.00	17.49	22.90	13.26
0.3	14.11	16.25	8.77	21.35	28.23	14.55
0.6	19.08	22.71	7.10	30.58	44.07	12.88
0.8	26.21	33.92	4.19	44.82	69.44	10.60
$T = 500$
0	8.70	9.64	6.60	10.83	13.38	7.91
0.3	9.84	10.74	6.23	13.29	16.19	8.19
0.6	12.79	14.09	5.13	18.69	25.15	7.67
0.8	16.02	19.56	3.35	26.86	41.72	5.28

Table 4: Empirical size (%) and power (%) of Bonferroni-type (multivariate) subsampling KPSS tests. The integer $J_{1}$ in Power DGP1 indicates the number of unit roots contained in the errors $\{\bm{u}_{t}\}$, $J_{2}$ is related to the number of equations that exclude the cubic power terms $x_{it}^{3}$, and $J_{3}$ specifies the number of spurious relations.

			$n = 3$			$n = 5$
		$T$	$K^{SOLS}$	$K^{SUR}$	$K^{BIAM}$	$K^{SOLS}$	$K^{SUR}$	$K^{BIAM}$
Panel A: Size
$(\underline{λ}, \bar{λ})$, serial correlation	$(0.1, 0.5)$	100	0.37	0.34	0.47	0.33	0.25	0.26
		200	0.42	0.37	0.53	0.42	0.28	0.18
		500	0.71	0.60	1.24	0.64	0.58	0.80
	$(0.5, 0.8)$	100	8.10	7.28	1.86	22.35	19.65	1.29
		200	3.54	3.22	2.91	8.70	7.57	1.09
		500	1.92	1.69	4.54	4.02	3.40	4.78
	$(0.8, 0.95)$	100	50.63	48.67	16.17	90.99	89.69	18.41
		200	40.48	38.56	14.26	83.71	81.78	12.16
		500	21.87	20.36	16.25	60.09	56.87	15.49
Panel B: Power DGP1
$J_{1}$, # unit roots	1	100	44.03	39.92	21.14	57.65	52.05	14.00
		200	57.97	50.90	33.85	74.09	66.64	27.24
		500	67.23	57.37	56.63	88.41	80.23	49.93
	2	100	66.64	63.83	38.07	84.77	80.36	29.12
		200	77.77	73.00	52.88	94.38	91.16	49.87
		500	84.19	78.83	78.40	98.57	96.89	72.02
	$n$	100	78.89	77.41	54.26	99.39	99.24	64.89
		200	88.40	86.53	70.07	99.97	99.95	88.75
		500	91.28	90.19	89.55	100.00	100.00	96.53
Panel C: Power DGP2
$J_{2}$, # misspecified equations	1	100	10.92	10.51	3.47	18.67	18.40	1.91
		200	26.41	25.58	7.72	41.90	41.41	4.95
		500	55.72	55.28	24.10	77.76	77.22	18.36
	2	100	17.83	16.97	6.15	30.77	29.77	3.70
		200	40.29	39.04	15.57	62.01	61.15	10.51
		500	69.64	68.50	38.48	91.66	91.27	30.87
	$n$	100	23.16	22.53	9.88	53.66	51.21	10.45
		200	47.25	45.60	22.83	83.78	82.73	29.05
		500	75.60	73.98	56.79	98.23	98.15	69.35
Panel D: Power DGP3
$J_{3}$, # spurious relations	1	100	77.03	77.06	28.34	94.26	94.12	21.04
		200	89.55	89.14	35.64	98.97	98.84	31.79
		500	97.26	97.05	55.61	99.91	99.91	48.48
	2	100	87.40	86.75	52.85	98.95	98.81	43.05
		200	94.95	94.33	57.95	99.89	99.89	58.15
		500	98.69	98.49	77.59	100.00	100.00	69.59
	$n$	100	91.55	90.60	71.95	99.97	99.92	87.87
		200	96.71	96.33	74.85	99.99	100.00	93.86
		500	98.04	97.62	91.66	100.00	100.00	98.02

Table 5: Outcomes for the joint tests of cointegration in the empirical study. At a significance level of 5%, the null hypothesis of cointegration is rejected when the 'rejection rule' is less than 5%.

	$K^{SOLS}$	$K^{SUR}$	$K^{BIAM}$
Statistic	16.54	12.66	8.19
Rejection Rule (in %)	0.00	0.03	2.73
Block Size	22	20	22
# blocks	6	7	6

Table 6: Estimates for $\beta_{i,3}$ and $\beta_{i,4}$ from the quadratic EKC model in (5.1). The numbers in parentheses are 95% asymptotic confidence intervals. Turning points are computed as $\exp\left(-\widehat{\beta}_{i,3}/(2\,\widehat{\beta}_{i,4})\right)$.

Country	Estimator	$x_{t}$	$x_{t}^{2}$	Turning point
Austria	SOLS	9.173	-0.411	68,900
		(6.797,11.548)	(-0.535,-0.288)
	SUR	8.494	-0.370	76,211
		(6.764,10.223)	(-0.464,-0.276)
	FGLS	6.553	-0.276	708,712
		(2.138,10.967)	(-0.513,-0.040)
Belgium	SOLS	12.927	-0.645	22,420
		(11.795,15.059)	(-0.702,-0.589)
	SUR	9.973	-0.503	20,195
		(9.158,10.787)	(-0.545,-0.461)
	FGLS	8.762	-0.443	19,795
		(7.236,10.287)	(-0.521,-0.365)
Finland	SOLS	15.676	-0.716	56,967
		(14.162,17.289)	(-0.788,-0.643)
	SUR	15.136	-0.684	63,400
		(14.030,16.242)	(-0.742,-0.627)
	FGLS	14.392	-0.646	68,775
		(12.075,16.708)	(-0.769,-0.523)
Netherlands	SOLS	11.382	-0.540	38,120
		(10.140,12.624)	(-0.602,-0.477)
	SUR	10.063	-0.475	39,637
		(9.183,10.944)	(-0.522,-0.429)
	FGLS	9.102	-0.430	39,908
		(7.606,10.597)	(-0.506,-0.353)
Switzerland	SOLS	7.070	-0.232	$4.287 \times 10^{6}$
		(5.516,8.624)	(-0.310,-0.153)
	SUR	6.962	-0.232	$3.316 \times 10^{6}$
		(5.481,8.443)	(-0.308,-0.156)
	FGLS	7.052	-0.254	$1.074 \times 10^{6}$
		(5.173,8.932)	(-0.350,-0.158)
United Kingdom	SOLS	8.450	-0.429	20,242
		(6.976,10.020)	(-0.502,-0.355)
	SUR	9.523	-0.475	22,596
		(8.486,10.560)	(-0.527,-0.423)
	FGLS	9.056	-0.453	21,887
		(7.244,10.868)	(-0.542,-0.364)

Table 7: The numerical values for the BIC criterion.

	$p = 1$	$p = 2$	$p = 3$	$p = 4$	$p = 5$	$p = 6$	$p = 7$	$p = 8$
$\widehat{\bm{u}}_{t,FGLS}$	-23.88	-23.49	-22.75	-21.87	-21.25	-20.66	-20.11	-19.27
$\Delta\bm{x}_{t}$	-37.94	-37.78	-37.15	-36.44	-35.68	-35.03	-34.34	-33.64

Equations

$$\bm{y}_{t}=\bm{Z}_{t}^{\prime}\bm{\beta}+\bm{u}_{t},\qquad\text{for } t=1,2,\ldots,T,$$

$$\bm{A}(L)\bm{u}_{t}=\Big(\bm{I}_{n}-\sum_{j=1}^{\infty}\bm{A}_{j}L^{j}\Big)\bm{u}_{t}=\bm{\eta}_{t},$$

$\bm{A}(\ell)$

$\bm{S}(\ell)$

$$\bm{\varSigma}_{\bm{u}}^{-1}=\bm{\mathcal{M}}_{\bm{u}}^{\prime}\bm{\mathcal{S}}_{\bm{u}}^{-1}\bm{\mathcal{M}}_{\bm{u}},$$

$\bm{\mathcal{M}}_{\bm{u}}$

$$\bm{\varSigma}_{\bm{u}}^{-1}(q)=\bm{\mathcal{M}}_{\bm{u}}^{\prime}(q)\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)\bm{\mathcal{M}}_{\bm{u}}(q),$$

$$\bm{m}_{\bm{u}}^{ij}(q)=\begin{cases}\bm{O}_{n\times n}, & \text{if } i<j \text{ or } \{q+1<i\leq T,\ 1\leq j\leq i-q-1\},\\ \bm{I}_{n}, & \text{if } i=j,\\ -\bm{A}_{i-j}(i-1), & \text{if } 2\leq i\leq q,\ 1\leq j\leq i-1,\\ -\bm{A}_{i-j}(q), & \text{if } q+1\leq i\leq T,\ i-q\leq j\leq i-1.\end{cases}$$

$$\bm{\mathcal{M}}_{\bm{u}}=\begin{bmatrix}\bm{I}_{n}&\bm{O}&\bm{O}&\bm{O}\\ -\bm{A}_{1}(1)&\bm{I}_{n}&\bm{O}&\bm{O}\\ -\bm{A}_{2}(2)&-\bm{A}_{1}(2)&\bm{I}_{n}&\bm{O}\\ -\bm{A}_{3}&-\bm{A}_{2}&-\bm{A}_{1}&\bm{I}_{n}\end{bmatrix},\qquad\bm{\mathcal{S}}_{\bm{u}}=\begin{bmatrix}\bm{S}(0)&&&\\ &\bm{S}(1)&&\\ &&\bm{S}(2)&\\ &&&\bm{\varSigma}_{\eta\eta}\end{bmatrix}.$$

$$\bm{\mathcal{M}}_{\bm{u}}(2)=\begin{bmatrix}\bm{I}_{n}&\bm{O}&\bm{O}&\bm{O}\\ -\bm{A}_{1}(1)&\bm{I}_{n}&\bm{O}&\bm{O}\\ -\bm{A}_{2}(2)&-\bm{A}_{1}(2)&\bm{I}_{n}&\bm{O}\\ \bm{O}&-\bm{A}_{2}(2)&-\bm{A}_{1}(2)&\bm{I}_{n}\end{bmatrix},\qquad\bm{\mathcal{S}}_{\bm{u}}(2)=\begin{bmatrix}\bm{S}(0)&&&\\ &\bm{S}(1)&&\\ &&\bm{S}(2)&\\ &&&\bm{S}(2)\end{bmatrix}.$$

$$\widehat{\bm{\beta}}_{GLS}:=\Big(\bm{Z}^{\prime}\bm{\varSigma}_{\bm{u}}^{-1}(q)\bm{Z}\Big)^{-1}\bm{Z}^{\prime}\bm{\varSigma}_{\bm{u}}^{-1}(q)\bm{y}.$$

$$\bm{\mathcal{S}}_{\bm{u}}=\operatorname{diag}\Big(\frac{\sigma^{2}}{1-\rho^{2}},\sigma^{2},\ldots,\sigma^{2}\Big),\qquad\bm{\mathcal{M}}_{\bm{u}}\bm{y}=\begin{bmatrix}1&&&\\ -\rho&1&&\\ \vdots&\ddots&\ddots&\\ 0&\cdots&-\rho&1\end{bmatrix}\begin{bmatrix}y_{1}\\ y_{2}\\ \vdots\\ y_{T}\end{bmatrix}=\begin{bmatrix}y_{1}\\ y_{2}-\rho y_{1}\\ \vdots\\ y_{T}-\rho y_{T-1}\end{bmatrix},$$

$$\frac{1}{T^{1/2}}\sum_{t=1}^{[rT]}\bm{\xi}_{t}\Rightarrow\bm{B}_{\xi}(r)\equiv\begin{bmatrix}\bm{B}_{u}(r)\\ \bm{B}_{v}(r)\end{bmatrix}\equiv\begin{bmatrix}\bm{A}(1)&\bm{O}\\ \bm{O}&\bm{D}(1)\end{bmatrix}^{-1}\begin{bmatrix}\bm{B}_{\eta}(r)\\ \bm{B}_{\epsilon}(r)\end{bmatrix},$$

$$\bm{\varOmega}=\begin{bmatrix}\bm{\varOmega}_{uu}&\bm{\varOmega}_{uv}\\ \bm{\varOmega}_{vu}&\bm{\varOmega}_{vv}\end{bmatrix}=\begin{bmatrix}\bm{A}(1)&\bm{O}\\ \bm{O}&\bm{D}(1)\end{bmatrix}^{-1}\begin{bmatrix}\bm{\varSigma}_{\eta\eta}&\bm{\varSigma}_{\eta\epsilon}\\ \bm{\varSigma}_{\epsilon\eta}&\bm{\varSigma}_{\epsilon\epsilon}\end{bmatrix}\begin{bmatrix}\bm{A}(1)^{\prime}&\bm{O}\\ \bm{O}&\bm{D}(1)^{\prime}\end{bmatrix}^{-1}.$$

$$\bm{G}_{T}^{-1}\big(\widehat{\bm{\beta}}_{GLS}-\bm{\beta}\big)$$

$$\times\Big(\int_{0}^{1}\bm{J}(r)\bm{\varOmega}_{uu}^{-1}\,d\bm{B}_{u.v}(r)+\int_{0}^{1}\bm{J}(r)\bm{\varOmega}_{uu}^{-1}\bm{\varOmega}_{uv}\bm{\varOmega}_{vv}^{-1}\,d\bm{B}_{v}(r)+\bm{\mathcal{B}}\Big),$$

$$\bm{\varSigma}_{\bm{u}}^{-1}(q_{T})-\bm{\varSigma}_{\bm{u}}^{-1}=O_{p}\big(q_{T}^{3}/T\big)+O\Big(\frac{1}{q_{T}}\sum_{s=q_{T}+1}^{\infty}s\,\|\bm{A}_{s}\|_{\mathcal{F}}\Big)\longrightarrow_{p}0\quad\text{as } T\to\infty.$$

$$\bm{\varDelta}_{q_{T},r_{T}}=\bm{Q}_{r_{T}}^{\prime}\bm{\varSigma}_{\bm{\xi}}(q_{T})\bm{Q}_{1}$$

$$\widehat{\bm{\beta}}_{SOLS}^{+}=\big(\bm{Z}^{\prime}\bm{Z}\big)^{-1}\big(\bm{Z}^{\prime}\bm{y}^{+}-\bm{A}\big),$$

$$\bm{G}_{T}^{-1}\big(\widehat{\bm{\beta}}_{SOLS}^{+}-\bm{\beta}\big)\Rightarrow\Big(\int_{0}^{1}\bm{J}(r)\bm{J}(r)^{\prime}\,dr\Big)^{-1}\int_{0}^{1}\bm{J}(r)\,d\bm{B}_{u.v}(r).$$

$$\widehat{\bm{\beta}}_{SUR}^{+}=\Big(\bm{Z}^{\prime}\big(\bm{I}_{T}\otimes\bm{\varOmega}_{u.v}^{-1}\big)\bm{Z}\Big)^{-1}\Big(\bm{Z}^{\prime}\big(\bm{I}_{T}\otimes\bm{\varOmega}_{u.v}^{-1}\big)\bm{y}^{+}-\bm{A}^{*}\Big),$$

$$\bm{G}_{T}^{-1}\big(\widehat{\bm{\beta}}_{SUR}^{+}-\bm{\beta}\big)\Rightarrow\Big(\int_{0}^{1}\bm{J}(r)\bm{\varOmega}_{u.v}^{-1}\bm{J}(r)^{\prime}\,dr\Big)^{-1}\int_{0}^{1}\bm{J}(r)\bm{\varOmega}_{u.v}^{-1}\,d\bm{B}_{u.v}(r).$$

$$\widehat{\bm{\beta}}_{FGLS}^{+}=\Big(\bm{Z}^{\prime}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q)\bm{Z}\Big)^{-1}\left[\bm{Z}^{\prime}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q)\bm{y}-\bm{Z}^{\prime}\Big(\bm{I}_{T}\otimes\widehat{\bm{\varOmega}}_{uu}^{-1}\widehat{\bm{\varOmega}}_{uv}\widehat{\bm{\varOmega}}_{vv}^{-1}\Big)\bm{v}-\widehat{\bm{\mathcal{B}}}^{+}\right],$$

$$\widehat{\bm{\mathcal{B}}}_{i}^{+}=\left[\operatorname{row}_{i}\Big(\widehat{\bm{\varSigma}}_{\epsilon\eta}\Big)\operatorname{col}_{i}\Big(\widehat{\bm{\varSigma}}_{\eta\eta}^{-1}\Big)-\operatorname{row}_{i}\Big(\widehat{\bm{\varDelta}}_{vv}\Big)\operatorname{col}_{i}\Big(\widehat{\bm{\varOmega}}_{vv}^{-1}\widehat{\bm{\varOmega}}_{vu}\widehat{\bm{\varOmega}}_{uu}^{-1}\Big)\right]\,\widehat{\bm{b}}_{i}.$$

$$\bm{G}_{T}^{-1}\big(\widehat{\bm{\beta}}_{FGLS}^{+}-\bm{\beta}\big)\Rightarrow\Big(\int_{0}^{1}\bm{J}(r)\bm{\varOmega}_{uu}^{-1}\bm{J}(r)^{\prime}\,dr\Big)^{-1}\int_{0}^{1}\bm{J}(r)\bm{\varOmega}_{uu}^{-1}\,d\bm{B}_{u.v}(r).$$

$$\mathcal{W}=\Big(\bm{R}\widehat{\bm{\beta}}_{FGLS}^{+}-\bm{r}\Big)^{\prime}\widehat{\bm{\varPhi}}^{-1}\Big(\bm{R}\widehat{\bm{\beta}}_{FGLS}^{+}-\bm{r}\Big)\Rightarrow\chi_{k}^{2},$$

$$\bm{\varphi}_{j,b}(\{\bm{x}\})=\Big[\bm{x}_{j}^{\prime},\sum_{s=j}^{j+1}\bm{x}_{s}^{\prime},\ldots,\sum_{s=j}^{j+b-1}\bm{x}_{s}^{\prime}\Big]^{\prime},$$

$$K_{j,b_{T}}^{i}=\frac{1}{b_{T}^{2}}\bm{\varphi}_{j,b_{T}}(\{\widehat{\bm{u}}_{i}^{+}\})^{\prime}\big(\bm{I}_{b_{T}}\otimes\bm{\varOmega}_{u.v}^{-1}\big)\bm{\varphi}_{j,b_{T}}(\{\widehat{\bm{u}}_{i}^{+}\}),\qquad\text{for } i\in\{SOLS,SUR\},$$

Full text

1 Department of Econometrics and Data Science, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
2 Department of Econometrics, Erasmus University Rotterdam, 3062 PA Rotterdam, The Netherlands

Efficient Estimation by Fully Modified GLS with an Application to the Environmental Kuznets Curve

Yicong Lin

Hanno Reuvers Corresponding author: Department of Econometrics, Erasmus University Rotterdam, 3062 PA Rotterdam, The Netherlands. E-mail address: [email protected].

Abstract

This paper develops the asymptotic theory of a Fully Modified Generalized Least Squares estimator for multivariate cointegrating polynomial regressions. Such regressions allow for deterministic trends, stochastic trends and integer powers of stochastic trends to enter the cointegrating relations. Our fully modified estimator incorporates: (1) the direct estimation of the inverse autocovariance matrix of the multidimensional errors, and (2) second order bias corrections. The resulting estimator has the intuitive interpretation of applying a weighted least squares objective function to filtered data series. Moreover, the required second order bias corrections are convenient byproducts of our approach and lead to standard asymptotic inference. We also study several multivariate KPSS-type tests for the null of cointegration. A comprehensive simulation study shows good performance of the FM-GLS estimator and the related tests. As a practical illustration, we reinvestigate the Environmental Kuznets Curve (EKC) hypothesis for six early industrialized countries as in Wagner et al. (2020).

JEL Classification: C12, C13, C32, Q20

Keywords: Cointegrating Polynomial Regression, Cointegration Testing, Environmental Kuznets Curve, Fully Modified Estimation, Generalized Least Squares

1 Introduction

In recent years, there has been increasing interest in the theoretical properties and theoretical justifications of nonlinear cointegrating relations. For theoretical properties we refer to the textbook treatment by Wang (2015), the recent review article by Tjøstheim (2020), and the extensive references found in either of them. Theoretical justifications are in some cases refinements of existing economic theory, e.g. nonlinear cointegration among bond yields with different times to maturity due to yield-dependent risk premia as discussed in Breitung (2001), or nonlinear purchasing power parity due to transaction/transportation costs and trade barriers (e.g. Hong and Phillips (2010)). In other cases, economic theory postulates a nonlinear cointegrating relation from the outset. A popular example of the latter is the Environmental Kuznets Curve described in Grossman and Krueger (1995) (footnote 1).

Footnote 1: There is no direct reference to Kuznets in the original paper by Grossman and Krueger (1995). However, their nonlinear relations between environmental indicators and per capita GDP are strongly reminiscent of the inverted U-shaped relation between income inequality and economic growth proposed by Kuznets (1901-1985). The term 'environmental Kuznets curve' came into use later.

There are three branches of literature on the estimation of such nonlinear cointegrating relations. First, the papers by Park and Phillips (1999) and Park and Phillips (2001) are concerned with nonlinear cointegration analysis of a parametric form. Second, there is a literature on nonparametric kernel estimation of nonlinear cointegrating equations, see for example Wang and Phillips (2009) or Li et al. (2020). The third approach is reminiscent of nonparametric sieve estimation with a power polynomial basis. That is, one estimates a cointegrating relation containing integer powers of integrated regressors. Wagner and Hong (2016) named this a cointegrating polynomial regression (CPR). The multivariate seemingly unrelated regressions extension is available in Wagner et al. (2020). Our model specification builds on this Seemingly Unrelated Cointegrating Polynomial Regression (SUCPR) setup.

We make two theoretical contributions to the literature on cointegrating polynomial regressions. First, we propose the Fully Modified Generalized Least Squares (FM-GLS) estimator. This estimator requires two main steps. (1) It employs the inverse covariance matrix of the $2nT$-dimensional innovation vector, that is, the covariance matrix of the vector which stacks the $n$ disturbances in the cointegrating equations and the $n$ disturbances driving the $I(1)$ regressors over the time span $T$. The estimation of this inverse covariance matrix is based on the Modified Cholesky Decomposition (MCD) originating from Pourahmadi (1999). The approach is computationally simple because the required quantities are obtained from the coefficients and prediction error variances of best linear least squares predictors. In our setting this translates into estimating multiple VAR models up to some maximum lag order $q$. Sufficient conditions for consistency are provided. (2) We exploit the previous results to correct the second-order biases, resulting in improved efficiency and standard chi-square inference. Note that the approach differs from the linear cointegration results in Mark et al. (2005) and Moon and Perron (2005) since our bias corrections do not rely on leads and lags augmentation.

Second, a multi-equation cointegration specification asks for a multivariate cointegration test. Building upon the work by Choi and Saikkonen (2010), we propose three such tests. The first test uses prefiltered residuals to account for serial correlation, whereas the other two are direct multivariate generalizations of the KPSS-type test in Wagner and Hong (2016).

The estimator and cointegration tests are subsequently studied by Monte Carlo simulation. In our simulations, the FM-GLS estimator has a higher estimation accuracy, and its implied Wald test has better size control and higher size-adjusted power. We find by simulation that prefiltering improves the size control of the cointegration tests but has an adverse effect on power. In the empirical application there is a surprisingly large spread in the widths of the confidence intervals. It turns out that FM-SUR, and to a lesser degree FM-SOLS, underestimates the parameter uncertainty compared to FM-GLS.

The plan of this paper is as follows. Section 2 introduces the model and the modified Cholesky block decomposition. This decomposition is the main ingredient for the fully modified GLS estimator. The related asymptotic theory and stationarity tests are discussed in Section 3, whereas a finite sample simulation study is presented in Section 4. The empirical application can be found in Section 5, where we look at the environmental Kuznets curve. Section 6 concludes. All proofs are collected in the Appendices (footnote 2).

Footnote 2: The Appendices contain the proofs of all the results that are related to the generalized least squares estimator. Supplementary material is available on the websites of the authors.

Some words on notation. $C$ denotes a generic positive constant. The integer part of the number $a\in\mathbb{R}^{+}$ is denoted by $[a]$. For a vector $\bm{x}\in\mathbb{R}^{n}$, its dimension is abbreviated by $\dim(\bm{x})$ and its $p$-norm by $\|\bm{x}\|_{p}=(\sum_{i=1}^{n}|x_{i}|^{p})^{1/p}$. When applied to a matrix, $\|\bm{A}\|_{p}$ signifies the induced norm defined by $\|\bm{A}\|_{p}=\sup_{\bm{x}\neq\bm{0}}\|\bm{A}\bm{x}\|_{p}/\|\bm{x}\|_{p}$. The subscripts are omitted whenever $p=2$, e.g. $\|\bm{x}\|=\left(\sum_{i=1}^{n}|x_{i}|^{2}\right)^{1/2}$ and $\|\bm{A}\|=\left(\lambda_{max}\left(\bm{A}^{\prime}\bm{A}\right)\right)^{1/2}$ where $\lambda_{max}(\cdot)$ is the largest eigenvalue. Similarly, $\lambda_{min}(\cdot)$ denotes the smallest eigenvalue. The Frobenius norm is denoted as $\|\cdot\|_{\mathcal{F}}$. The $(n\times n)$ identity matrix is written as $\bm{I}_{n}$. The $i^{th}$ row and $i^{th}$ column of an arbitrary matrix $\bm{A}$ are selected using $\operatorname{row}_{i}(\bm{A})$ and $\operatorname{col}_{i}(\bm{A})$, respectively. The Kronecker product is denoted "$\otimes$". We use the symbol "$\Rightarrow$" to signify weak convergence and the symbol "$\stackrel{d}{=}$" for equality in distribution. The stochastic order and strict stochastic order relations are indicated by $O_{p}(\cdot)$ and $o_{p}(\cdot)$.

2 The Model

As in Wagner et al. (2020), we study a system of seemingly unrelated cointegrating polynomial regressions (SUCPR), that is

$$\bm{y}_{t}=\bm{Z}_{t}^{\prime}\bm{\beta}+\bm{u}_{t},\qquad\text{for } t=1,2,\ldots,T, \tag{2.1}$$

where the dependent variable $\bm{y}_{t}:=[y_{1t},y_{2t},\ldots,y_{nt}]^{\prime}$ and innovations $\bm{u}_{t}:=[u_{1t},u_{2t},\ldots,u_{nt}]^{\prime}$ are $(n\times 1)$ random vectors. For the cross-sectional unit $i$, we use as explanatory variables: (1) deterministic components such as an intercept and polynomial time trends up to order $d_{i}$, and (2) integer powers of the $I(1)$ regressors $x_{it}$ up to degree $s_{i}$. Defining $\bm{d}_{it}=[1,t,\ldots,t^{d_{i}}]^{\prime}$, $\bm{s}_{it}=[x_{it},\ldots,x_{it}^{s_{i}}]^{\prime}$, and $\bm{z}_{it}=[\bm{d}_{it}^{\prime},\bm{s}_{it}^{\prime}]^{\prime}$, we subsequently collect all explanatory variables in the block diagonal matrix $\bm{Z}_{t}=\operatorname{diag}[\bm{z}_{1t},\ldots,\bm{z}_{nt}]$. We are interested in the $d$-dimensional parameter vector $\bm{\beta}$ where $d=\sum_{i=1}^{n}(d_{i}+s_{i}+1)$. Overall, each cross-sectional unit in (2.1) specifies a single cointegrating relation containing polynomials in deterministic and stochastic trends. For each $i$, the highest orders of these polynomials, i.e. $d_{i}$ and $s_{i}$, are assumed to be fixed and known. We do not allow for cointegration in the cross-sectional dimension.
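To make the regressor layout concrete, the following NumPy sketch builds $\bm{z}_{it}$ and the block diagonal $\bm{Z}_{t}$ for a single time period. The helper names `make_z` and `make_Z_t`, as well as all numerical values, are hypothetical and serve only as an illustration; $\bm{Z}_{t}$ is stored as a $d\times n$ array so that $\bm{Z}_{t}^{\prime}\bm{\beta}$ is $n\times 1$.

```python
import numpy as np

def make_z(t, x_it, d_i, s_i):
    """Stack [1, t, ..., t^{d_i}] and [x_it, ..., x_it^{s_i}] into z_it."""
    det = np.array([float(t) ** k for k in range(d_i + 1)])
    sto = np.array([float(x_it) ** k for k in range(1, s_i + 1)])
    return np.concatenate([det, sto])

def make_Z_t(t, x_t, d_orders, s_orders):
    """Block diagonal Z_t = diag[z_1t, ..., z_nt], stored as a (d x n) array."""
    blocks = [make_z(t, x, di, si) for x, di, si in zip(x_t, d_orders, s_orders)]
    d = sum(len(b) for b in blocks)
    Z_t = np.zeros((d, len(blocks)))
    row = 0
    for i, b in enumerate(blocks):
        Z_t[row:row + len(b), i] = b   # column i holds z_it in its own row block
        row += len(b)
    return Z_t

# Hypothetical example: n = 2 units, linear trend (d_i = 1), quadratic in x_it (s_i = 2).
d_orders, s_orders = [1, 1], [2, 2]
Z_t = make_Z_t(t=3, x_t=[2.0, -1.0], d_orders=d_orders, s_orders=s_orders)
d = sum(di + si + 1 for di, si in zip(d_orders, s_orders))
beta = np.arange(1.0, d + 1.0)         # arbitrary parameter vector of length d
fitted = Z_t.T @ beta                  # the (n x 1) vector Z_t' beta
```

Each column of `Z_t` contains the regressors of one cross-sectional unit, so the parameter vector is the stacked collection of unit-specific coefficients.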

The innovation series $\{\bm{u}_{t}\}$ is allowed to exhibit dependencies over time and across series. We assume that these dependencies can be modeled by a stationary VAR($\infty$) process, that is

$$\bm{A}(L)\bm{u}_{t}=\Big(\bm{I}_{n}-\sum_{j=1}^{\infty}\bm{A}_{j}L^{j}\Big)\bm{u}_{t}=\bm{\eta}_{t},$$

(see Assumption 1 for further details). Efficient estimation of the parameter vector $\bm{\beta}$ now requires the use of generalized least squares (GLS). Our Zellner (1962)-type GLS estimator relies on the inverse of the $(nT\times nT)$ matrix $\bm{\varSigma}_{\bm{u}}=\mathbb{E}(\bm{u}\bm{u}^{\prime})$ where $\bm{u}=[\bm{u}_{1}^{\prime},\bm{u}_{2}^{\prime},\ldots,\bm{u}_{T}^{\prime}]^{\prime}$. In this paper, we directly estimate $\bm{\varSigma}_{\bm{u}}^{-1}$ using a multivariate extension of the modified Cholesky decomposition by Pourahmadi (1999). This extension was named the Modified Cholesky Block Decomposition (MCBD) by Kim and Zimmerman (2012) and Kohli et al. (2016). The latter papers used the MCBD to parametrize the covariance matrix of multivariate longitudinal data. As in Beutner et al. (2019), we use the MCBD for the time series application mentioned above, i.e. the computation of $\bm{\varSigma}_{\bm{u}}^{-1}$. The decomposition is closely related to linear minimum MSE predictors.

We define

[TABLE]

and $\bm{S}(0)=\mathbb{E}(\bm{u}_{t}\bm{u}_{t}^{\prime})$. The inverse of the covariance matrix $\bm{\varSigma}_{\bm{u}}$ is then given by

$$\bm{\varSigma}_{\bm{u}}^{-1}=\bm{\mathcal{M}}_{\bm{u}}^{\prime}\bm{\mathcal{S}}_{\bm{u}}^{-1}\bm{\mathcal{M}}_{\bm{u}},$$

where $\bm{\mathcal{S}}_{\bm{u}}=\operatorname{diag}\Big(\bm{S}(0),\bm{S}(1),\ldots,\bm{S}(T-1)\Big)$,

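$$\bm{\mathcal{M}}_{\bm{u}}=\begin{bmatrix}\bm{I}_{n}&\bm{O}&\cdots&\cdots&\bm{O}\\ -\bm{A}_{1}(1)&\bm{I}_{n}&\ddots&&\vdots\\ -\bm{A}_{2}(2)&-\bm{A}_{1}(2)&\bm{I}_{n}&\ddots&\vdots\\ \vdots&\vdots&&\ddots&\bm{O}\\ -\bm{A}_{T-1}(T-1)&-\bm{A}_{T-2}(T-1)&\cdots&-\bm{A}_{1}(T-1)&\bm{I}_{n}\end{bmatrix},$$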

and the $\bm{A}_{j}(i)$ follow from the partitioning $\bm{A}(\ell)=\big[\bm{A}_{1}(\ell),\ldots,\bm{A}_{\ell}(\ell)\big]$.
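The link between the decomposition and linear prediction can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it takes a bivariate VAR(1) with hypothetical coefficients, builds the population covariance matrix $\bm{\varSigma}_{\bm{u}}$ for $T=4$, and obtains each block row of $\bm{\mathcal{M}}_{\bm{u}}$ by regressing $\bm{u}_{i}$ on all earlier observations. The function names `var1_covariance` and `mcbd` are ours.

```python
import numpy as np

def var1_covariance(A, Sigma_eta, T):
    """Covariance matrix of [u_1', ..., u_T']' for u_t = A u_{t-1} + eta_t."""
    n = A.shape[0]
    # Gamma(0) solves Gamma(0) = A Gamma(0) A' + Sigma_eta (row-major vec identity).
    gamma0 = np.linalg.solve(np.eye(n * n) - np.kron(A, A),
                             Sigma_eta.reshape(-1)).reshape(n, n)
    gamma0 = (gamma0 + gamma0.T) / 2
    Sigma = np.zeros((n * T, n * T))
    for s in range(T):
        for t in range(s + 1):
            block = np.linalg.matrix_power(A, s - t) @ gamma0   # E(u_s u_t')
            Sigma[s*n:(s+1)*n, t*n:(t+1)*n] = block
            Sigma[t*n:(t+1)*n, s*n:(s+1)*n] = block.T
    return Sigma

def mcbd(Sigma, n):
    """Block rows of M hold minus the best linear predictor coefficients."""
    T = Sigma.shape[0] // n
    M = np.eye(n * T)
    S = np.zeros_like(Sigma)
    S[:n, :n] = Sigma[:n, :n]                                   # S(0)
    for i in range(1, T):
        past, cur = slice(0, i * n), slice(i * n, (i + 1) * n)
        # Coefficients of the projection of u_i on (u_1, ..., u_{i-1}).
        coef = np.linalg.solve(Sigma[past, past], Sigma[past, cur]).T
        M[cur, past] = -coef
        S[cur, cur] = Sigma[cur, cur] - coef @ Sigma[past, cur]  # prediction error variance
    return M, S

# Hypothetical stationary VAR(1) with n = 2 and T = 4.
n, T = 2, 4
A = np.array([[0.5, 0.1], [0.0, 0.3]])
Sigma_eta = np.array([[1.0, 0.3], [0.3, 1.0]])
Sigma = var1_covariance(A, Sigma_eta, T)
M, S = mcbd(Sigma, n)
Sigma_inv = M.T @ np.linalg.inv(S) @ M
```

For this VAR(1), the only nonzero off-diagonal blocks of `M` sit directly left of the diagonal and equal `-A`, and every diagonal block of `S` after the first equals `Sigma_eta`.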

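Weak stationarity of $\{\bm{u}_{t}\}$ implies that the block elements of $\bm{\mathcal{M}}_{\bm{u}}$ far below the main diagonal are small. This suggests a banding approach in which small elements are replaced by zeros. More specifically, we construct a Banded Inverse Autocovariance Matrix (BIAM) as

$$\bm{\varSigma}_{\bm{u}}^{-1}(q)=\bm{\mathcal{M}}_{\bm{u}}^{\prime}(q)\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)\bm{\mathcal{M}}_{\bm{u}}(q),$$

where $1\leq q\ll T$ is called the banding parameter, $\bm{\mathcal{S}}_{\bm{u}}(q)=\operatorname{diag}\Big(\bm{S}(0),\bm{S}(1),\ldots,\bm{S}(q),\ldots,\bm{S}(q)\Big)$ and $\bm{\mathcal{M}}_{\bm{u}}(q)=\left[\bm{m}_{\bm{u}}^{ij}(q)\right]_{1\leq i,j\leq T}$ with

$$\bm{m}_{\bm{u}}^{ij}(q)=\begin{cases}\bm{O}_{n\times n}, & \text{if } i<j \text{ or } \{q+1<i\leq T,\ 1\leq j\leq i-q-1\},\\ \bm{I}_{n}, & \text{if } i=j,\\ -\bm{A}_{i-j}(i-1), & \text{if } 2\leq i\leq q,\ 1\leq j\leq i-1,\\ -\bm{A}_{i-j}(q), & \text{if } q+1\leq i\leq T,\ i-q\leq j\leq i-1.\end{cases}$$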

Example 1

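Consider a stationary $n$-dimensional VAR($3$) process specified as $\bm{u}_{t}=\sum_{j=1}^{3}\bm{A}_{j}\bm{u}_{t-j}+\bm{\eta}_{t}$ with $\bm{\eta}_{t}\stackrel{i.i.d.}{\sim}(\bm{0},\bm{\varSigma}_{\eta\eta})$. For $T=4$, the MCBD $\bm{\varSigma}_{\bm{u}}^{-1}=\bm{\mathcal{M}}_{\bm{u}}^{\prime}\bm{\mathcal{S}}_{\bm{u}}^{-1}\bm{\mathcal{M}}_{\bm{u}}$ is based on

$$\bm{\mathcal{M}}_{\bm{u}}=\begin{bmatrix}\bm{I}_{n}&\bm{O}&\bm{O}&\bm{O}\\ -\bm{A}_{1}(1)&\bm{I}_{n}&\bm{O}&\bm{O}\\ -\bm{A}_{2}(2)&-\bm{A}_{1}(2)&\bm{I}_{n}&\bm{O}\\ -\bm{A}_{3}&-\bm{A}_{2}&-\bm{A}_{1}&\bm{I}_{n}\end{bmatrix},\qquad\bm{\mathcal{S}}_{\bm{u}}=\begin{bmatrix}\bm{S}(0)&&&\\ &\bm{S}(1)&&\\ &&\bm{S}(2)&\\ &&&\bm{\varSigma}_{\eta\eta}\end{bmatrix}.$$

Alternatively, with banding parameter $q=2$, the related banded inverse autocovariance matrix is $\bm{\varSigma}_{\bm{u}}^{-1}(2)=\bm{\mathcal{M}}_{\bm{u}}^{\prime}(2)\bm{\mathcal{S}}_{\bm{u}}^{-1}(2)\bm{\mathcal{M}}_{\bm{u}}(2)$ with

$$\bm{\mathcal{M}}_{\bm{u}}(2)=\begin{bmatrix}\bm{I}_{n}&\bm{O}&\bm{O}&\bm{O}\\ -\bm{A}_{1}(1)&\bm{I}_{n}&\bm{O}&\bm{O}\\ -\bm{A}_{2}(2)&-\bm{A}_{1}(2)&\bm{I}_{n}&\bm{O}\\ \bm{O}&-\bm{A}_{2}(2)&-\bm{A}_{1}(2)&\bm{I}_{n}\end{bmatrix},\qquad\bm{\mathcal{S}}_{\bm{u}}(2)=\begin{bmatrix}\bm{S}(0)&&&\\ &\bm{S}(1)&&\\ &&\bm{S}(2)&\\ &&&\bm{S}(2)\end{bmatrix}.$$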
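The banded construction can also be assembled mechanically from the predictor coefficients. The following sketch assumes that the coefficient blocks $\bm{A}_{j}(\ell)$ and the prediction error variances $\bm{S}(\ell)$ are already available for $\ell\leq q$; the function name `banded_mcbd`, the dictionary layout, and the scalar values ($n=1$) are hypothetical and not derived from an actual process.

```python
import numpy as np

def banded_mcbd(A, S, q, T):
    """Assemble M_u(q) and S_u(q) from A[l][j-1] = A_j(l) and S[l] = S(l)."""
    n = S[0].shape[0]
    M = np.eye(n * T)
    S_band = np.zeros((n * T, n * T))
    for i in range(1, T + 1):                  # 1-based block row, as in the text
        order = min(i - 1, q)                  # predictor order used in block row i
        S_band[(i-1)*n:i*n, (i-1)*n:i*n] = S[order]
        for j in range(i - order, i):          # nonzero blocks left of the diagonal
            M[(i-1)*n:i*n, (j-1)*n:j*n] = -A[order][i - j - 1]
    return M, S_band

# Hypothetical scalar inputs (n = 1) with banding parameter q = 2 and T = 4.
A = {1: [np.array([[0.5]])],
     2: [np.array([[0.4]]), np.array([[0.2]])]}
S = {0: np.array([[2.0]]), 1: np.array([[1.5]]), 2: np.array([[1.2]])}
M2, S2 = banded_mcbd(A, S, q=2, T=4)
Sigma_inv_banded = M2.T @ np.linalg.inv(S2) @ M2
```

The resulting `M2` has the same zero pattern as $\bm{\mathcal{M}}_{\bm{u}}(2)$ in Example 1, and the implied inverse covariance matrix is banded: entries more than $q$ positions away from the diagonal vanish.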

The model of (2.1) can be stacked over time to yield the representation $\bm{y}=\bm{Z}\bm{\beta}+\bm{u}$ with $\bm{y}=[\bm{y}_{1}^{\prime},\bm{y}_{2}^{\prime},\ldots,\bm{y}_{T}^{\prime}]^{\prime}$ , $\bm{Z}=[\bm{Z}_{1},\bm{Z}_{2},\ldots,\bm{Z}_{T}]^{\prime}$ and $\bm{u}$ as before. For the moment, we will assume $\bm{\varSigma}_{\bm{u}}^{-1}(q)$ to be known and focus on the following estimator:

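$$\widehat{\bm{\beta}}_{GLS}:=\left(\bm{Z}^{\prime}\bm{\varSigma}_{\bm{u}}^{-1}(q)\bm{Z}\right)^{-1}\bm{Z}^{\prime}\bm{\varSigma}_{\bm{u}}^{-1}(q)\bm{y}.$$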

A discussion on the properties of this infeasible estimator is informative because: (1) the incurred estimation error of an appropriately constructed estimator $\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q)$ will be asymptotically negligible, and (2) we can suppress the effect of banding by letting $q$ increase with sample size.

Two remarks related to $\widehat{\bm{\beta}}_{GLS}$ are instructive. First, the GLS estimator differs from the usual least squares estimator $\widehat{\bm{\beta}}_{OLS}:=\left(\bm{Z}^{\prime}\bm{Z}\right)^{-1}\bm{Z}^{\prime}\bm{y}$ by a weighting with the inverse covariance matrix $\bm{\varSigma}_{\bm{u}}^{-1}(q)$ . It is well documented in standard econometric textbooks (e.g. Chapter 7 of Davidson and MacKinnon (2004)) that this weighting may lead to substantial efficiency gains. Second, it is illustrative to substitute the Modified Cholesky Decomposition of $\bm{\varSigma}_{\bm{u}}^{-1}(q)$ into the definition of this infeasible GLS estimator. The result is $\widehat{\bm{\beta}}_{GLS}=(\bm{Z}_{filt}^{\prime}\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)\bm{Z}_{filt})^{-1}\bm{Z}_{filt}^{\prime}\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)\bm{y}_{filt}$ where $\bm{Z}_{filt}=\bm{\mathcal{M}}_{\bm{u}}(q)\bm{Z}$ , and $\bm{y}_{filt}=\bm{\mathcal{M}}_{\bm{u}}(q)\bm{y}$ . The premultiplications by $\bm{\mathcal{M}}_{\bm{u}}(q)$ have the effect of filtering and take care of serial correlation. $\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)$ applies scaling and rotation to account for the correlations between the series. The following univariate autoregressive setting exemplifies this intuition.

Example 2

A regression model $y_{t}=\beta t+u_{t}$ has AR( $1$ ) innovations $u_{t}=\rho u_{t-1}+\eta_{t}$ where $\eta_{t}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}(0,\sigma^{2})$ and $|\rho|<1$ . Taking $n=1$ , the expressions of Example 1 are easily adapted to yield:

[TABLE]

and a similar transformation for the linear trend. The implied GLS estimator coincides with the estimator from Prais and Winsten (1954).
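The AR(1) case can be checked numerically. The sketch below assumes the textbook AR(1) forms of the filter and scaling matrices (unit diagonal and $-\rho$ on the first subdiagonal of the filter; variances $\sigma^{2}/(1-\rho^{2})$ for the first observation and $\sigma^{2}$ afterwards). It verifies that the product of these matrices inverts the AR(1) covariance matrix and that GLS on the trend regression equals OLS on the filtered and rescaled data, which is the Prais-Winsten transformation. All variable names and parameter values are illustrative.

```python
import numpy as np

rho, sigma2, T = 0.6, 1.0, 8

# Filter matrix M: u_1 is left untouched, u_t - rho * u_{t-1} for t >= 2.
M = np.eye(T) - rho * np.eye(T, k=-1)
# Prediction error variances: Var(u_1) first, then the innovation variance.
S = np.full(T, sigma2)
S[0] = sigma2 / (1.0 - rho ** 2)

idx = np.arange(T)
Sigma = sigma2 / (1.0 - rho ** 2) * rho ** np.abs(idx[:, None] - idx[None, :])
Sigma_inv = M.T @ np.diag(1.0 / S) @ M

# Trend regression y_t = beta * t + u_t (any y works for the algebraic identity).
rng = np.random.default_rng(0)
t = np.arange(1, T + 1, dtype=float)
Z = t[:, None]
y = 0.5 * t + rng.standard_normal(T)

beta_gls = np.linalg.solve(Z.T @ Sigma_inv @ Z, Z.T @ Sigma_inv @ y)

# Weighted least squares on filtered data = OLS after rescaling by S^{-1/2}.
w = 1.0 / np.sqrt(S)
y_pw = w * (M @ y)
Z_pw = w[:, None] * (M @ Z)
beta_pw = np.linalg.lstsq(Z_pw, y_pw, rcond=None)[0]
```

The first transformed observation equals $\sqrt{1-\rho^{2}}\,y_{1}/\sigma$ and the remaining ones equal $(y_{t}-\rho y_{t-1})/\sigma$, so the first observation is retained rather than dropped.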

3 Asymptotic Theory

In this section, we present the asymptotic results. More specifically, we derive: (1) the limiting distribution of the GLS estimator, (2) the fully modified GLS (FM-GLS) estimator that corrects for second order bias terms, (3) a Wald test statistic, and (4) several multivariate KPSS-type of tests for the null of cointegration. We will also compare this FM-GLS estimator with the two fully modified estimators defined in Proposition 1 of Wagner et al. (2020). The following assumption will facilitate the development of the asymptotic theory.

Assumption 1 (Innovation Processes)

The innovation processes in the model satisfy the following assumptions:

(a)

The process $\bm{\zeta}_{t}=[\bm{\eta}_{t}^{\prime},\bm{\varepsilon}_{t}^{\prime}]^{\prime}$ is an independent and identically distributed (i.i.d.) sequence with $\mathbb{E}(\bm{\zeta}_{t}\bm{\zeta}_{t}^{\prime})=\left[\begin{smallmatrix}\bm{\varSigma}_{\eta\eta}&\bm{\varSigma}_{\eta\epsilon}\\ \bm{\varSigma}_{\epsilon\eta}&\bm{\varSigma}_{\epsilon\epsilon}\end{smallmatrix}\right]\succ 0$ and $\mathbb{E}(\|\bm{\zeta}_{t}\|^{2r})\leq C_{r}<\infty$ for some constant $C_{r}>0$ and some $r>2$ .

(b)

$\det\big{(}\bm{\mathcal{A}}(z)\big{)}\neq 0$ for all $|z|\leq 1$ and $\sum_{j=0}^{\infty}j\|\bm{A}_{j}\|_{\mathcal{F}}<\infty$ .

(c)

$\Delta\bm{x}_{t}=\bm{v}_{t}$ admits the VAR( $\infty$ ) process $\bm{\mathcal{D}}(L)\bm{v}_{t}=\bm{\varepsilon}_{t}$ , where $\bm{\mathcal{D}}(L)=\bm{I}_{n}-\sum_{j=1}^{\infty}\bm{D}_{j}L^{j}$ . Moreover, $\det\big{(}\bm{\mathcal{D}}(z)\big{)}\neq 0$ for all $|z|\leq 1$ and $\sum_{j=0}^{\infty}j\,\|\bm{D}_{j}\|_{\mathcal{F}}<\infty$ .

The stationary VAR( $\infty$ ) specifications for $\{\bm{u}_{t}\}$ and $\{\bm{v}_{t}\}$ are natural given the linear minimum MSE predictor formulae that underlie the definitions of the MCBD and BIAM. Moreover, the conditions in Assumption 1 ensure that the lag polynomials $\bm{\mathcal{A}}(L)$ and $\bm{\mathcal{D}}(L)$ are invertible (see for example Theorem 7.4.2 of Hannan and Deistler (2012)), thereby showing that our Assumption 1 is similar to the linear processes assumptions that are regularly adopted in the literature on nonlinear cointegration, cf. Choi and Saikkonen (2010), Wagner and Hong (2016), and Wagner et al. (2020). The assumption $\det\big{(}\bm{\mathcal{D}}(1)\big{)}\neq 0$ rules out cointegration among the components of $\{\bm{x}_{t}\}$ .

Under Assumption 1(a), an invariance principle holds for $\bm{\zeta}_{t}$ , i.e. $\frac{1}{T^{1/2}}\sum_{t=1}^{[rT]}\bm{\zeta}_{t}\Rightarrow\bm{B}_{\bm{\zeta}}(r)\equiv\left[\begin{smallmatrix}\bm{B}_{\eta}(r)\\ \bm{B}_{\epsilon}(r)\end{smallmatrix}\right]$ where $\bm{B}_{\bm{\zeta}}$ denotes a $2n$ -dimensional Brownian motion with covariance matrix $\left[\begin{smallmatrix}\bm{\varSigma}_{\eta\eta}&\bm{\varSigma}_{\eta\epsilon}\\ \bm{\varSigma}_{\epsilon\eta}&\bm{\varSigma}_{\epsilon\epsilon}\end{smallmatrix}\right]$ . Moreover, Assumptions 1(b)-(c) justify the use of the Beveridge-Nelson decomposition (Phillips and Solo (1992)). A functional central limit theorem for linear processes is thus also applicable to $\bm{\xi}_{t}=[\bm{u}_{t}^{\prime},\bm{v}_{t}^{\prime}]^{\prime}$ , that is

[TABLE]

where the Brownian motion $\bm{B}_{\bm{\xi}}(r)$ of dimension $2n$ has covariance matrix

[TABLE]

Apart from this long-run covariance matrix $\bm{\varOmega}=\sum_{h=-\infty}^{\infty}\mathbb{E}\big{(}\bm{\xi}_{t}\bm{\xi}_{t+h}^{\prime}\big{)}$ , we also introduce the one-sided long-run covariance matrix $\bm{\varDelta}=\left[\begin{smallmatrix}\bm{\varDelta}_{uu}&\bm{\varDelta}_{uv}\\ \bm{\varDelta}_{vu}&\bm{\varDelta}_{vv}\end{smallmatrix}\right]=\sum_{h=0}^{\infty}\mathbb{E}\big{(}\bm{\xi}_{t}\bm{\xi}_{t+h}^{\prime}\big{)}$ . The Brownian motion defined by $\bm{B}_{u.v}=\bm{B}_{u}-\bm{\varOmega}_{uv}\bm{\varOmega}_{vv}^{-1}\bm{B}_{v}$ is by construction orthogonal to $\bm{B}_{v}$ . Its $(n\times n)$ covariance matrix equals $\bm{\varOmega}_{u.v}=\bm{\varOmega}_{uu}-\bm{\varOmega}_{uv}\bm{\varOmega}_{vv}^{-1}\bm{\varOmega}_{vu}$ .
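As a small numerical illustration of the last two statements, the sketch below computes $\bm{\varOmega}_{u.v}$ from a partitioned long-run covariance matrix and confirms that the implied covariance between $\bm{B}_{u.v}$ and $\bm{B}_{v}$ vanishes. The function name `conditional_long_run_cov` and the randomly generated positive definite matrix are assumptions made purely for the example.

```python
import numpy as np

def conditional_long_run_cov(Omega, n):
    """Return Omega_{u.v} and the covariance between B_{u.v} and B_v."""
    Ouu, Ouv = Omega[:n, :n], Omega[:n, n:]
    Ovu, Ovv = Omega[n:, :n], Omega[n:, n:]
    proj = Ouv @ np.linalg.inv(Ovv)          # Omega_uv Omega_vv^{-1}
    Ou_v = Ouu - proj @ Ovu                  # covariance of B_u - proj B_v
    cross = Ouv - proj @ Ovv                 # Cov(B_{u.v}, B_v), zero by construction
    return Ou_v, cross

rng = np.random.default_rng(1)
n = 2
L = rng.standard_normal((2 * n, 2 * n))
Omega = L @ L.T + np.eye(2 * n)              # an arbitrary positive definite example
Ou_v, cross = conditional_long_run_cov(Omega, n)
```

Because $\bm{\varOmega}_{u.v}$ is a Schur complement of a positive definite matrix, it is itself positive definite.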

3.1 Infeasible GLS

We start our analysis assuming that the $(nT\times nT)$ covariance matrix $\bm{\varSigma}_{\bm{u}}(q)$ is a known quantity for each $q$ . The modified Cholesky block decomposition of page 2.2 can now be used to derive the limiting distribution of this infeasible GLS estimator. An insightful exposition of our results requires further notation.

(a)

Introduce scaling matrices: $\bm{G}_{\bm{d}_{i},T}:=T^{-1/2}\operatorname{diag}[1,T^{-1},\ldots,T^{-d_{i}}]$ for the time trends, and $\bm{G}_{\bm{s}_{i},T}:=T^{-1/2}\operatorname{diag}[T^{-1/2},T^{-1},\dots,T^{-s_{i}/2}]$ for the stochastic trends. Moreover, we define $\bm{G}_{T}:=\operatorname{diag}\left[\bm{G}_{1,T},\dots,\bm{G}_{n,T}\right]$ , where $\bm{G}_{i,T}:=\operatorname{diag}\left[\bm{G}_{\bm{d}_{i},T},\bm{G}_{\bm{s}_{i},T}\right]$ .

(b)

Let $\bm{d}_{i}(r):=\left[1,r,\dots,r^{d_{i}}\right]^{\prime}$ , $\bm{B}_{s_{i}}(r):=\left[B_{v_{i}}(r),B_{v_{i}}^{2}(r),\dots,B_{v_{i}}^{s_{i}}(r)\right]^{\prime}$ and $\bm{j}_{i}(r):=\left[\bm{d}_{i}(r)^{\prime},\bm{B}_{s_{i}}(r)^{\prime}\right]^{\prime}$ . Define the $d\times n$ block-diagonal random matrix $\bm{J}(r):=\operatorname{diag}\left[\bm{j}_{1}(r),\dots,\bm{j}_{n}(r)\right]$ .

(c)

$\bm{b}_{i}=\Big{[}\bm{0}_{d_{i}+1}^{\prime},1,2\int_{0}^{1}B_{v_{i}}(r)dr,\dots,s_{i}\int_{0}^{1}B_{v_{i}}^{s_{i}-1}(r)dr\Big{]}^{\prime}$ .

Finally, we use $\bm{B}_{v_{j}}$ as shorthand notation for the $j^{th}$ component of $\bm{B}_{v}$ .

Theorem 1 (Limiting Distribution of the infeasible GLS Estimator)

If Assumption 1 holds, and if $q=q(T)$ satisfies $\frac{1}{q}+\frac{q}{T}\to 0$ as $T\to\infty$ , then

[TABLE]

where ${\bm{\mathcal{B}}}=\left[{\bm{\mathcal{B}}}_{1}^{\prime},\dots,{\bm{\mathcal{B}}}_{n}^{\prime}\right]^{\prime}$ and ${\bm{\mathcal{B}}}_{i}=\operatorname{row}_{i}\Big{(}\bm{\varSigma}_{\epsilon\eta}\Big{)}\operatorname{col}_{i}\Big{(}\bm{\varSigma}_{\eta\eta}^{-1}\Big{)}\leavevmode\nobreak\ \bm{b}_{i}$ .

The limiting result in (3.3) coincides with the limiting distribution of the MSUR estimator, $\widetilde{\bm{\beta}}_{MSUR}:=\left(\bm{Z}^{\prime}(\bm{I}_{T}\otimes\widehat{\bm{\varOmega}}_{uu}^{-1})\bm{Z}\right)^{-1}\Big{(}\bm{Z}^{\prime}(\bm{I}_{T}\otimes\widehat{\bm{\varOmega}}_{uu}^{-1})\bm{y}\Big{)}$ , as reported in Wagner et al. (2020), see their Proof of Proposition 1. The equivalence of these limiting distributions arises because: (1) applying a linear filter to an integrated series only affects its long-run variance (e.g. Phillips and Park (1988)), and (2) the same holds when a linear filter is applied to higher integer powers of integrated series. The terms $\int_{0}^{1}\bm{J}(r)\bm{\varOmega}_{uu}^{-1}\bm{\varOmega}_{uv}\bm{\varOmega}_{vv}^{-1}d\bm{B}_{v}(r)$ and ${\bm{\mathcal{B}}}$ in (3.3) reflect the presence of second order bias terms caused by serial correlation and endogeneity. In Section 3.3, we introduce the fully modified (FM) correction that adjusts for these bias terms and leads to standard inference. We first introduce a feasible version of the GLS estimator.

3.2 Consistent Estimation of $\bm{\varSigma}_{\bm{u}}^{-1}(q)$ and Feasible GLS

Up to this point we have discussed the infeasible estimator $\widehat{\bm{\beta}}_{GLS}:=\left(\bm{Z}^{\prime}\bm{\varSigma}_{\bm{u}}^{-1}(q)\bm{Z}\right)^{-1}\bm{Z}^{\prime}\bm{\varSigma}_{\bm{u}}^{-1}(q)\bm{y}$ . A feasible GLS approach requires a consistent estimator of the $(nT\times nT)$ matrix $\bm{\varSigma}_{\bm{u}}^{-1}(q)$ . Several authors, e.g. Wu and Pourahmadi (2009) and McMurry and Politis (2010), have constructed consistent estimators of large covariance matrices using banding or tapering to reduce the number of unknown parameters. Direct use of their results poses two difficulties: (1) numerical inversion of large matrices is computationally expensive for large $nT$ , and (2) matrix inversion might even be impossible because the estimated covariance matrix cannot be guaranteed to be positive definite. In light of such considerations, we will estimate $\bm{\varSigma}_{\bm{u}}^{-1}(q)$ directly and ensure it to be positive definite. The approach is the sample counterpart of the BIAM described on page 2. That is, we replace true innovations by first stage OLS residuals $\widehat{\bm{u}}_{t}=\bm{y}_{t}-\bm{Z}_{t}\widehat{\bm{\beta}}_{OLS}$ , and subsequently minimise a sample moment in estimated residuals rather than the population mean squared forecasting error. This method was previously used by Cheng et al. (2015) and Ing et al. (2016a) for univariate time series. For a multivariate time series, we define

[TABLE]

$1\leq\ell\leq q$ , and $\widehat{\bm{S}}(0)=\frac{1}{T}\sum_{t=1}^{T}\widehat{\bm{u}}_{t}\widehat{\bm{u}}_{t}^{\prime}$ . Similarly to (2.6)-(2.7), we subsequently construct the matrices $\widehat{\bm{\mathcal{M}}}_{\bm{u}}(q)=\left[\widehat{\bm{m}}_{\bm{u}}^{ij}(q)\right]_{1\leq i,j\leq T}$ and $\widehat{\bm{\mathcal{S}}}_{\bm{u}}(q)=\operatorname{diag}\left(\widehat{\bm{S}}(0),\widehat{\bm{S}}(1),\ldots,\widehat{\bm{S}}(q),\ldots,\widehat{\bm{S}}(q)\right)$ , and obtain the implied multivariate BIAM estimator as

[TABLE]
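The sample construction can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `fit_var` and `biam_estimate` are hypothetical, the $1/(T-\ell)$ normalisation of the residual covariance is an assumption, and a simulated VAR(1) series stands in for the first stage OLS residuals.

```python
import numpy as np

def fit_var(u, ell):
    """Least-squares VAR(ell) fit. Returns A = [A_1, ..., A_ell] (n x n*ell) and S(ell)."""
    T, n = u.shape
    Y = u[ell:]                                               # u_t for t = ell+1, ..., T
    X = np.hstack([u[ell - j:T - j] for j in range(1, ell + 1)])  # lags 1, ..., ell
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    S = resid.T @ resid / (T - ell)                           # assumed normalisation
    return coef.T, S

def biam_estimate(u, q):
    """Sketch of the banded inverse autocovariance estimator M(q)' S(q)^{-1} M(q)."""
    T, n = u.shape
    M = np.eye(n * T)
    S_inv = np.zeros((n * T, n * T))
    fits = {ell: fit_var(u, ell) for ell in range(1, q + 1)}
    S0 = u.T @ u / T
    for i in range(T):
        ell = min(i, q)                                       # VAR order used in block row i
        r = slice(n * i, n * (i + 1))
        if ell == 0:
            S = S0
        else:
            A, S = fits[ell]
            for j in range(1, ell + 1):
                M[r, n * (i - j):n * (i - j + 1)] = -A[:, n * (j - 1):n * j]
        S_inv[r, r] = np.linalg.inv(S)
    return M.T @ S_inv @ M

# Simulated stand-in for the residuals (assumption for the example only).
rng = np.random.default_rng(0)
T, n, q = 60, 2, 3
e = rng.standard_normal((T, n))
u = np.zeros((T, n))
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + e[t]
Sigma_inv_hat = biam_estimate(u, q)
```

Since the filter matrix is unit lower triangular and every diagonal block of the scaling matrix is a positive definite sample covariance, the resulting estimate is positive definite without any explicit inversion of an $(nT\times nT)$ matrix.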

Assumption 2

For $\widehat{\bm{u}}=[\widehat{\bm{u}}_{1}^{\prime},\ldots,\widehat{\bm{u}}_{T}^{\prime}]^{\prime}$ and $\bm{u}=[\bm{u}_{1}^{\prime},\ldots,\bm{u}_{T}^{\prime}]^{\prime}$ , assume $\|\widehat{\bm{u}}-\bm{u}\|^{2}=O_{p}(1)$ .

Assumption 3

Assume $q=q_{T}$ satisfies $\frac{1}{q_{T}}+\frac{q_{T}^{3}}{T}\to 0$ as $T\to\infty$ .

Assumption 2 requires the residuals to be sufficiently close to the true innovations. It is a rather mild assumption and it is satisfied if residuals are computed by least squares. Assumption 3 places constraints on the banding parameter $q_{T}$ . First, Assumption 3 requires the banding parameter to diverge with sample size. This ensures that no nonzero elements are (asymptotically) set to zero. Moreover, the assumption $q_{T}^{3}/T\to 0$ establishes an upper bound for the growth rate of $q_{T}$ . The definition of $\widehat{\bm{A}}(\ell)$ , see (3.4), shows that we are fitting a vector autoregression (VAR) of increasing lag order to the residuals. Identical rate requirements are reported by Lewis and Reinsel (1985), who derive consistency and asymptotic normality results when finite VAR models are fitted to infinite order VAR processes. The following theorem shows the consistent estimation of $\bm{\varSigma}_{\bm{u}}^{-1}$ and implies that the infeasible and feasible GLS estimators have the same limiting distribution.

Theorem 2 (Consistent Estimation of $\bm{\varSigma}_{\bm{u}}^{-1}$ )

If Assumptions 1-3 hold, then

[TABLE]

3.3 Fully Modified Inference

The asymptotic result of Theorem 1 is not immediately useful for statistical inference. There are two difficulties. First, the second order bias dislocates the limiting distribution, which can translate into substantial finite sample bias. This leads to a loss in efficiency. Second, possible dependencies between the Brownian motions $\bm{B}_{u}$ and $\bm{B}_{v}$ cause the limiting distribution to depend on nuisance parameters. Critical values would therefore be nuisance parameter dependent as well.

These two issues have received extensive attention in the linear cointegration literature. A (non-exhaustive) list of solution methods is: joint modeling as in Phillips (1991), Saikkonen’s (1992) dynamic least squares, and the integrated modified OLS and fixed-b approaches by Vogelsang and Wagner (2014). We rely on the fully modified (FM) approach advocated by Phillips and Hansen (1990) and Phillips (1995). The idea is a twofold modification of the estimator: (1) second order bias terms are subtracted, and (2) a transformation of the dependent variable is introduced to obtain a zero-mean Gaussian mixture limiting distribution. Recently, Wagner et al. (2020) have proposed two estimators within the framework of seemingly unrelated cointegrating polynomial regressions. These estimators, FM-SOLS and FM-SUR, rely on kernel estimators of the one- and two-sided long-run covariance matrix (see Theorem 3). As such, we introduce the following assumption.

Assumption 4 (Consistent Estimation of Long-run Covariance Matrices)

$\widehat{\bm{\varOmega}}$ and $\widehat{\bm{\varDelta}}$ are consistent kernel estimators of the long-run covariance matrix $\bm{\varOmega}$ and the one-sided long-run covariance matrix $\bm{\varDelta}$ , respectively.

Andrews (1991) and Newey and West (1994) use kernel estimators for long-run covariance estimation. Their method involves the calculation of weighted sums of the autocovariance matrices of the residuals. These weights are determined by a kernel function and bandwidth parameter. Our Assumption 4 is easily satisfied by imposing suitable conditions on the kernel function and bandwidth parameter. We refer to Phillips (1995) and Jansson (2002) for an enumeration of such conditions.

Alternatively, we can obtain consistent one- and two-sided long-run covariance estimators within the BIAM framework of Section 3.2. [Footnote 3: An overview of the procedure is given here. Section S2 in the Supplement provides further details.] This approach resembles Berk (1974). The GLS estimator and its FM counterpart are thus constructed within a single framework. The estimators are as follows. For all $t=1,2,\ldots,T$ , we first stack $\widehat{\bm{u}}_{t}$ and $\Delta\bm{x}_{t}=\bm{v}_{t}$ in the $2n$ -dimensional vector $\widehat{\bm{\xi}}_{t}=[\widehat{\bm{u}}_{t}^{\prime},\Delta\bm{x}_{t}^{\prime}]^{\prime}$ . Since the BIAM estimator is fitting VAR processes up to order $q_{T}$ , we will use the estimated VAR( $q_{T}$ ) approximations to define the long-run covariance estimators. For $\bm{\varOmega}$ , the estimator is $\widehat{\bm{\varOmega}}_{q_{T}}=\left(\bm{I}_{2n}-\sum_{j=1}^{q_{T}}\widehat{\bm{F}}_{j}(q_{T})\right)^{-1}\widehat{\bm{\varSigma}}_{q_{T}}\left(\bm{I}_{2n}-\sum_{j=1}^{q_{T}}\widehat{\bm{F}}_{j}^{\prime}(q_{T})\right)^{-1}$ , where $\widehat{\bm{\varSigma}}_{q_{T}}=\widehat{\bm{S}}(q_{T})$ and $\widehat{\bm{F}}_{j}(q_{T})$ denote respectively the estimated prediction error variance and the coefficient matrix of the $j^{th}$ lag when a VAR( $q_{T}$ ) is fitted to $\{\widehat{\bm{\xi}}_{t}\}_{t=1}^{T}$ . The population one-sided long-run covariance matrix is $\bm{\varDelta}=\sum_{h=0}^{\infty}\mathbb{E}\big{(}\bm{\xi}_{t}\bm{\xi}_{t+h}^{\prime}\big{)}$ . It is thus intuitive to approximate this quantity by a finite sum of estimated covariance matrices of $\{\widehat{\bm{\xi}}_{t}\}_{t=1}^{T}$ . These covariance matrices are nothing but subblocks of the matrix $\widehat{\bm{\varSigma}}_{\bm{\xi}}(q_{T})=\widehat{\bm{\mathcal{M}}}_{\bm{\xi}}^{-1}(q_{T})\widehat{\bm{\mathcal{S}}}_{\bm{\xi}}(q_{T})\widehat{\bm{\mathcal{M}}}_{\bm{\xi}}^{-1\prime}(q_{T})$ . [Footnote 4: We use $\bm{\varSigma}_{\bm{\xi}}$ to denote the $(2nT\times 2nT)$ matrix $\mathbb{E}(\bm{\xi}\bm{\xi}^{\prime})$ where $\bm{\xi}=[\bm{\xi}_{1}^{\prime},\bm{\xi}_{2}^{\prime},\ldots,\bm{\xi}_{T}^{\prime}]^{\prime}$ . The matrices $\widehat{\bm{\mathcal{M}}}_{\bm{\xi}}(q)$ and $\widehat{\bm{\mathcal{S}}}_{\bm{\xi}}(q)$ are defined similarly to respectively $\widehat{\bm{\mathcal{M}}}_{\bm{u}}(q)$ and $\widehat{\bm{\mathcal{S}}}_{\bm{u}}(q)$ (see page 3.5).] The matrix $\widehat{\bm{\mathcal{M}}}_{\bm{\xi}}(q_{T})$ is lower triangular with identity matrices on the main diagonal. Hence, its matrix inverse exists and is fast to compute. We therefore use

[TABLE]

where $\bm{Q}_{r}=\left[\mathbf{O}_{2n\times 2n},\cdots,\mathbf{O}_{2n\times 2n},\bm{I}_{2n},\cdots,\bm{I}_{2n}\right]^{\prime}$ is an $\big{(}2nT\times 2n\big{)}$ block matrix of zeros of which the last $r$ blocks have been replaced by identity matrices. To ensure consistency, we place the following rate restriction on the number of included autocovariance matrices.

Assumption 5

As $T\rightarrow\infty$ , $r_{T}\rightarrow\infty$ , $\frac{r_{T}q_{T}^{3}}{T}\rightarrow 0$ , and $r_{T}=O(q_{T})$ .
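The VAR-based expression for the two-sided long-run covariance matrix can be checked against a direct summation of autocovariances. The sketch below does this at the population level for a VAR(1), so that no estimation noise enters; the function name `long_run_cov_from_var` and the parameter values are assumptions for illustration. It also accumulates the one-sided sum $\sum_{h\geq 0}\mathbb{E}(\bm{\xi}_{t}\bm{\xi}_{t+h}^{\prime})$.

```python
import numpy as np

def long_run_cov_from_var(F_list, Sigma):
    """(I - sum_j F_j)^{-1} Sigma (I - sum_j F_j')^{-1} for VAR coefficients F_j."""
    k = Sigma.shape[0]
    C = np.linalg.inv(np.eye(k) - sum(F_list))
    return C @ Sigma @ C.T

# Illustrative stationary VAR(1): xi_t = F xi_{t-1} + innovation with covariance Sigma.
F = np.array([[0.5, 0.1], [0.0, 0.3]])
Sigma = np.array([[1.0, 0.4], [0.4, 2.0]])
k = 2
Omega = long_run_cov_from_var([F], Sigma)

# Gamma(0) solves Gamma0 = F Gamma0 F' + Sigma (discrete Lyapunov equation).
Gamma0 = np.linalg.solve(np.eye(k * k) - np.kron(F, F), Sigma.reshape(-1)).reshape(k, k)

Omega_sum = Gamma0.copy()        # two-sided sum over h
Delta_sum = Gamma0.copy()        # one-sided sum over h >= 0
P = np.eye(k)
for h in range(1, 200):
    P = P @ F.T
    G = Gamma0 @ P               # E(xi_t xi_{t+h}') = Gamma0 (F')^h
    Omega_sum += G + G.T
    Delta_sum += G
```

The two routes agree, and the sums satisfy $\bm{\varOmega}=\bm{\varDelta}+\bm{\varDelta}^{\prime}-\mathbb{E}(\bm{\xi}_{t}\bm{\xi}_{t}^{\prime})$. In practice the number of summed autocovariances is governed by $r_{T}$ rather than a fixed truncation point.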

Definitions and limiting results for FM estimators are presented in Theorem 3. The FM-SOLS, FM-SUR and FM-GLS estimators all depend on estimators for $\bm{\varDelta}$ and $\bm{\varOmega}$ . It is only the consistency of these estimators that is relevant for the asymptotic analysis, not whether the kernel or BIAM approach is employed. As such, we will not complicate the notation with additional symbols to indicate whether the kernel or BIAM approach is used. In subsequent theorems, simulation results and the empirical application we will use kernel estimators for FM-SOLS and FM-SUR, and the BIAM approach for FM-GLS. This seems to be the logical choice for these estimators.

Theorem 3

For $i=1,\ldots,n$ , define $\widehat{\bm{b}}_{i}=\left[\bm{0}_{d_{i}+1}^{\prime},T,2\sum_{t=1}^{T}x_{it},\ldots,s_{i}\sum_{t=1}^{T}x_{it}^{s_{i}-1}\right]^{\prime}$ . Also, define the $(n\times n)$ matrix $\widehat{\bm{\varDelta}}_{vu}^{+}$ as the (implied) consistent estimator of $\bm{\varDelta}_{vu}^{+}=\bm{\varDelta}_{vu}-\bm{\varDelta}_{vv}\bm{\varOmega}_{vv}^{-1}\bm{\varOmega}_{vu}$ .

(a)

Define the FM-SOLS estimator as

[TABLE]

where $\bm{y}^{+}:=[\bm{y}_{1}^{+\prime},\bm{y}_{2}^{+\prime},\ldots,\bm{y}_{T}^{+\prime}]^{\prime}$ with $\bm{y}_{t}^{+}=\bm{y}_{t}-\widehat{\bm{\varOmega}}_{uv}\widehat{\bm{\varOmega}}_{vv}^{-1}\Delta\bm{x}_{t}$ , and $\widehat{\bm{A}}:=[\widehat{\bm{A}}_{1}^{\prime},\ldots,\widehat{\bm{A}}_{n}^{\prime}]^{\prime}$ with $\widehat{\bm{A}}_{i}=\widehat{\bm{\varDelta}}_{v_{i}u_{i}}^{+}\widehat{\bm{b}}_{i}$ and $\widehat{\bm{\varDelta}}_{v_{i}u_{i}}^{+}$ being the $i^{th}$ element on the main diagonal of $\widehat{\bm{\varDelta}}_{vu}^{+}$ . If Assumptions 1 and 4 hold, then

[TABLE]

(b)

Define the FM-SUR estimator as

[TABLE]

where $\widetilde{\bm{A}}^{*}:=[\widetilde{\bm{A}}_{1}^{*},\ldots,\widetilde{\bm{A}}_{n}^{*}]$ with $\widetilde{\bm{A}}_{i}^{*}=\operatorname{row}_{i}\left(\widehat{\bm{\varDelta}}_{vu}^{+}\right)\operatorname{col}_{i}\left(\widehat{\bm{\varOmega}}_{u.v}^{-1}\right)\widehat{\bm{b}}_{i}$ . If Assumptions 1 and 4 hold, then

[TABLE]

(c)

Define the FM-GLS estimator as

[TABLE]

where $\bm{v}:=[\Delta\bm{x}_{1}^{\prime},\ldots,\Delta\bm{x}_{T}^{\prime}]^{\prime}=[\bm{v}_{1}^{\prime},\ldots,\bm{v}_{T}^{\prime}]^{\prime}$ , and $\widehat{{\bm{\mathcal{B}}}}^{+}=\big{[}\widehat{{\bm{\mathcal{B}}}}_{1}^{+\prime},\dots,\widehat{{\bm{\mathcal{B}}}}_{n}^{+\prime}\big{]}^{\prime}$ with

[TABLE]

If Assumptions 1-3 and 5 hold, then

[TABLE]
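Two ingredients of the fully modified corrections are stated explicitly above and are simple to compute: the vector $\widehat{\bm{b}}_{i}$ and the transformed dependent variable $\bm{y}_{t}^{+}$ . The sketch below covers only these two pieces, with hypothetical function names `b_hat` and `y_plus`; the long-run covariance inputs are taken as given and $\widehat{\bm{\varOmega}}_{vv}$ is assumed symmetric.

```python
import numpy as np

def b_hat(x_i, d_i, s_i):
    """[0_{d_i+1}', T, 2*sum(x), ..., s_i*sum(x^{s_i-1})]' for one integrated regressor."""
    powers = [k * np.sum(x_i ** (k - 1)) for k in range(1, s_i + 1)]
    return np.concatenate([np.zeros(d_i + 1), powers])

def y_plus(y, dx, Omega_uv, Omega_vv):
    """Rows y_t' - (Omega_uv Omega_vv^{-1} dx_t)'; y and dx have shape (T, n)."""
    return y - dx @ np.linalg.solve(Omega_vv, Omega_uv.T)

# Small worked example: linear trend (d_i = 1) and quadratic polynomial (s_i = 2).
x = np.array([1.0, 2.0, 3.0])
b_example = b_hat(x, d_i=1, s_i=2)

# Scalar illustration of the endogeneity correction.
y_corrected = y_plus(np.array([[1.0], [2.0]]), np.array([[4.0], [8.0]]),
                     np.array([[0.5]]), np.array([[2.0]]))
```

With $d_{i}=1$ and $s_{i}=2$ the vector has two leading zeros (intercept and trend), followed by $T$ and $2\sum_{t}x_{it}$ .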

The FM-GLS estimator is new to the seemingly unrelated CPR literature, whereas the FM-SOLS and FM-SUR estimators have recently appeared in Wagner et al. (2020). Theorem 3 indicates that all three estimators have a zero-mean Gaussian mixture limiting distribution implying that standard inference is applicable for each. However, we also see from Theorem 3 that the limiting distributions are generally different because different types of weighting are used in the construction of the estimators. [Footnote 5: There are special cases in which some (pairs of) estimators become asymptotically equivalent. For example, if $n=1$ , then all estimators are asymptotically equivalent because the weighting matrices $\bm{\varOmega}_{u.v}^{-1}$ and $\bm{\varOmega}_{uu}^{-1}$ are now scalars. Also, under exogeneity, we have $\bm{\varOmega}_{uu}=\bm{\varOmega}_{u.v}$ and the FM-SUR and FM-GLS estimators share the same limiting distribution.]

For completeness, we also detail how the FM-GLS estimator can be used to test linear hypotheses. A formal presentation of such a result is more involved because of the different convergence rates of the individual parameter estimators. That is, the parameters with the lowest convergence rate will dominate the asymptotic distribution and one should take care to avoid a degenerate limiting distribution. We will rule out such complications by considering hypothesis tests on individual parameters. [Footnote 6: For general linear hypotheses, we refer the reader to Sims et al. (1990) where a reordering based on convergence rates is used to establish the limiting distribution of the Wald $F$ statistic for general linear hypotheses. The same approach is applicable in our setting but we will not explore this in greater detail.] Therefore, let $\bm{R}$ denote a $(k\times s)$ selection matrix in which every row contains a single 1 and zeros otherwise. The null hypothesis $\bm{R}\bm{\beta}=\bm{r}$ can be tested using the standard chi-squared limiting distribution of the Wald statistic (Theorem 4). These tests are practically relevant. For example, exclusion restrictions of the type $\bm{R}\bm{\beta}=\bm{0}$ allow us to test whether the cointegrating relation is linear.

Theorem 4

Consider the null hypothesis $H_{0}:\bm{R}\bm{\beta}=\bm{r}$ , which imposes $k$ linearly independent restrictions. Under the assumptions of Theorem 3(c), the Wald test statistic

[TABLE]

where $\widehat{\bm{\varPhi}}=\bm{R}\left[\bm{Z}^{\prime}\left(\bm{I}_{T}\otimes\widehat{\bm{\varOmega}}_{uu}^{-1}\right)\bm{Z}\right]^{-1}\left[\bm{Z}^{\prime}\Big{(}\bm{I}_{T}\otimes\widehat{\bm{\varOmega}}_{uu}^{-1}\widehat{\bm{\varOmega}}_{u.v}\widehat{\bm{\varOmega}}_{uu}^{-1}\Big{)}\bm{Z}\right]\left[\bm{Z}^{\prime}\left(\bm{I}_{T}\otimes\widehat{\bm{\varOmega}}_{uu}^{-1}\right)\bm{Z}\right]^{-1}\bm{R}^{\prime}$ .
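The sandwich form of $\widehat{\bm{\varPhi}}$ translates directly into matrix code. The sketch below is an assumption-laden illustration: the function name `wald_statistic` is hypothetical, the statistic is taken to be the usual quadratic form $(\bm{R}\widehat{\bm{\beta}}-\bm{r})^{\prime}\widehat{\bm{\varPhi}}^{-1}(\bm{R}\widehat{\bm{\beta}}-\bm{r})$ , and the inputs are random placeholders rather than output of an actual FM-GLS fit.

```python
import numpy as np

def wald_statistic(beta_hat, Z, Omega_uu, Omega_u_v, R, r):
    """Quadratic-form Wald statistic with the sandwich matrix Phi (assumed form)."""
    n = Omega_uu.shape[0]
    T = Z.shape[0] // n
    W1 = np.linalg.inv(Omega_uu)
    W2 = W1 @ Omega_u_v @ W1
    bread = np.linalg.inv(Z.T @ np.kron(np.eye(T), W1) @ Z)
    meat = Z.T @ np.kron(np.eye(T), W2) @ Z
    Phi = R @ bread @ meat @ bread @ R.T
    diff = R @ beta_hat - r
    return float(diff @ np.linalg.solve(Phi, diff)), Phi

# Placeholder inputs (assumptions for the example only).
rng = np.random.default_rng(0)
n, T, d = 2, 10, 4
Z = rng.standard_normal((n * T, d))
Omega_uu = np.array([[2.0, 0.5], [0.5, 1.0]])
Omega_u_v = np.array([[1.5, 0.3], [0.3, 0.8]])
R = np.array([[0.0, 0.0, 0.0, 1.0]])          # selects the fourth coefficient
beta_hat = np.array([1.0, 1.0, 5.0, -0.3])
W, Phi = wald_statistic(beta_hat, Z, Omega_uu, Omega_u_v, R, np.array([0.0]))
```

Under exogeneity, where $\widehat{\bm{\varOmega}}_{u.v}=\widehat{\bm{\varOmega}}_{uu}$ , the sandwich collapses to $\bm{R}\big[\bm{Z}^{\prime}(\bm{I}_{T}\otimes\widehat{\bm{\varOmega}}_{uu}^{-1})\bm{Z}\big]^{-1}\bm{R}^{\prime}$ .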

3.4 Testing the Null of Cointegration

Stationarity tests are used to avoid spurious regressions and to verify the correct specification of the cointegrating relation. To test for stationarity of the seemingly unrelated cointegrating polynomial regressions (SUCPR) errors, we combine the test statistic from Nyblom and Harvey (2000) with the sub-sampling approach found in Choi and Saikkonen (2010) and Wagner and Hong (2016). We consider three test statistics. To treat all test statistics in a unified framework, we define

[TABLE]

that is, a vector of length $nb$ stacking the cumulative sums of $\{\bm{x}_{j},\ldots,\bm{x}_{j+b-1}\}$ . If the true innovations $\bm{u}_{1},\ldots,\bm{u}_{T}$ were observed, then we could use the full-sample KPSS-type of test statistic $\frac{1}{T^{2}}\bm{\varphi}_{1,T}(\{\bm{u}\})^{\prime}(\bm{I}_{T}\otimes\widehat{\bm{\varOmega}}_{uu}^{-1})\bm{\varphi}_{1,T}(\{\bm{u}\})=\operatorname{tr}\left[\widehat{\bm{\varOmega}}_{uu}^{-1}\frac{1}{T^{2}}\sum_{t=1}^{T}\left(\sum_{s=1}^{t}\bm{u}_{s}\right)\left(\sum_{s=1}^{t}\bm{u}_{s}\right)^{\prime}\right]$ to test for stationarity of the innovations. Under the null of stationarity, this test statistic would converge weakly to $\int_{0}^{1}\|\bm{W}(r)\|^{2}dr$ with $\bm{W}(r)$ denoting an $n$ -dimensional standard Brownian motion. This limiting distribution is free of nuisance parameters and the cumulative distribution function is available as a series expansion (see the Supplement).
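The equality between the Kronecker quadratic form and the trace expression can be verified numerically. In the sketch below, the function name `kpss_full_sample`, the simulated innovations and the weighting matrix are all assumptions chosen for illustration.

```python
import numpy as np

def kpss_full_sample(u, Omega_uu_inv):
    """T^{-2} * sum_t S_t' Omega_uu^{-1} S_t with partial sums S_t = u_1 + ... + u_t."""
    T = u.shape[0]
    S = np.cumsum(u, axis=0)                   # row t holds the partial sum S_t
    return np.einsum('ti,ij,tj->', S, Omega_uu_inv, S) / T ** 2

rng = np.random.default_rng(0)
T, n = 50, 3
u = rng.standard_normal((T, n))
Omega_inv = np.linalg.inv(np.eye(n) + 0.2 * np.ones((n, n)))
K = kpss_full_sample(u, Omega_inv)

# The same number via the stacked vector of cumulative sums and via the trace form.
phi = np.cumsum(u, axis=0).reshape(-1)
K_kron = phi @ np.kron(np.eye(T), Omega_inv) @ phi / T ** 2
S = np.cumsum(u, axis=0)
K_trace = np.trace(Omega_inv @ (S.T @ S)) / T ** 2
```

The `einsum` version avoids forming the $(nT\times nT)$ Kronecker product, which matters once $T$ is large.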

The innovations $\bm{u}_{1},\ldots,\bm{u}_{T}$ are only available when cointegrating relations are pre-specified. If these coefficients are estimated, then this additional parameter uncertainty will contaminate the limiting distribution with nuisance parameters. [Footnote 7: There are exceptions. Shin (1994) reports a nuisance parameter free limiting distribution for a single-equation linear cointegrating relation. This remains true if only a single integrated variable enters the cointegrating regression with a higher power, see Proposition 5 in Wagner and Hong (2016).] The idea behind the subsampling approach is to construct a test statistic incorporating $b=b_{T}$ residuals while computing parameter estimators from all $T$ observations. If $b_{T}$ increases slowly with sample size, then the parameter estimation error will be negligible relative to the randomness in the errors and the asymptotic distribution remains $\int_{0}^{1}\|\bm{W}(r)\|^{2}dr$ .

The three KPSS-type tests are based on the following residuals: $\hat{\bm{u}}_{t,SOLS}^{+}=\bm{y}_{t}^{+}-\bm{Z}_{t}\widehat{\bm{\beta}}_{SOLS}^{+}$ , $\hat{\bm{u}}_{t,SUR}^{+}=\bm{y}_{t}^{+}-\bm{Z}_{t}\widehat{\bm{\beta}}_{SUR}^{+}$ , and $\hat{\bm{u}}_{t,FGLS}=\bm{y}_{t}-\bm{Z}_{t}\widehat{\bm{\beta}}_{FGLS}^{+}$ . The test statistics are:

[TABLE]

and

[TABLE]

where $\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T},b_{T})$ is the $(nb_{T}\times nb_{T})$ submatrix of $\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T})$ obtained by selecting the rows and columns related to all time indices in the set $\{n(T-b_{T})+1,n(T-b_{T})+2,\ldots,nT\}$ . The test statistic in (3.17) fits naturally into the FM-GLS estimation framework.

Theorem 5

Let the assumptions from Theorem 3 hold.

(a)

If $\frac{1}{b_{T}}+\frac{b_{T}}{T}\to 0$ as $T\to\infty$ , then

[TABLE]

(b)

If $\frac{q_{T}}{b_{T}}+\frac{b_{T}}{T}\to 0$ as $T\to\infty$ , then $K_{j,b_{T}}^{BIAM}\Rightarrow\int_{0}^{1}\|\bm{W}(r)\|^{2}dr$ for any $1\leq j\leq T-b_{T}+1$ .

A sample of size $T$ allows for up to $M=\lfloor T/b_{T}\rfloor$ series of nonoverlapping blocks of residuals of length $b_{T}$ . Similarly to Choi and Saikkonen (2010), we apply the Bonferroni procedure to use all these series and thereby increase power. The approach is applicable to any of the three test statistics in Theorem 5. As such, we keep the notation general and use a generic $K_{j}$ to denote a test statistic based on the $j^{th}$ subseries, $j=1,2,\ldots,M$ . In the Bonferroni procedure we compute $K_{max}=\max\left\{K_{1},K_{2},\ldots,K_{M}\right\}$ and do not reject the null hypothesis whenever $K_{max}\leq c_{\alpha/M}$ with $c_{\alpha/M}$ defined by $\mathbb{P}\left(\int_{0}^{1}\|\bm{W}(r)\|^{2}dr\geq c_{\alpha/M}\right)=\alpha/M$ . The Bonferroni inequality implies $\lim_{T\to\infty}\mathbb{P}\left(K_{max}\leq c_{\alpha/M}\right)\geq 1-\lim_{T\to\infty}\sum_{j=1}^{M}\mathbb{P}\left(K_{j}>c_{\alpha/M}\right)=1-\alpha$ and we see that the probability of a type-I error does not exceed the significance level $\alpha$ .

Remark 1

We suggest following Choi and Saikkonen (2010) in the implementation of the subsampling approach. That is, the block size $b_{T}$ is selected using the minimum volatility rule of Romano and Wolf (2001). For this block size, we subsequently select subsamples by taking non-overlapping blocks alternately from the start and the end of the sample.
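The block selection and the Bonferroni decision rule can be sketched in a few lines. The reading of the alternating scheme used here (first block from the start, second from the end, third again from the start, and so on) is our assumption, as are the function names `alternating_blocks` and `bonferroni_reject`. The critical value $c_{\alpha/M}$ must be supplied by the user, for instance from the series expansion mentioned in the text.

```python
def alternating_blocks(T, b):
    """Half-open index ranges of M = T // b non-overlapping blocks of length b,
    taken alternately from the start and the end of the sample (assumed scheme)."""
    M = T // b
    blocks, lo, hi = [], 0, T
    for j in range(M):
        if j % 2 == 0:
            blocks.append((lo, lo + b))   # next block from the start
            lo += b
        else:
            blocks.append((hi - b, hi))   # next block from the end
            hi -= b
    return blocks

def bonferroni_reject(stats, crit):
    """Reject the null of cointegration when K_max exceeds the critical value c_{alpha/M}."""
    return max(stats) > crit

blocks = alternating_blocks(10, 3)
```

Each block statistic $K_{j}$ would be computed from the residuals indexed by the corresponding range, and the resulting list is passed to `bonferroni_reject`.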

Remark 2

The limiting results in Theorem 5 guarantee a correct asymptotic size. Our simulations show (1) that these tests have power against various alternative hypotheses and (2) that power increases with sample size. A theoretical investigation of the power properties is outside of the scope of this paper.

4 Simulations

We now study the finite sample performance of the estimators and stationarity tests. First, we compare the FM-GLS estimator with the FM-SOLS and FM-SUR estimators from Wagner et al. (2020). All long-run covariance matrices are computed using a Bartlett kernel and the automatic bandwidth selection approach due to Andrews (1991). For FM-GLS, the banding parameter $q_{T}$ is selected using the subsampling and risk-minimization approach explained in Section 5 of Bickel and Levina (2008). [Footnote 8: More details concerning the implementation can be found in the Supplement.] Infeasible counterparts of the estimators are constructed assuming knowledge of the true serial correlation and/or cross-sectional dependence pattern. These estimators are denoted by infSOLS, infSUR, and infGLS. Second, we look at the cointegration tests. We consider three test statistics: $K^{SOLS}$ and $K^{SUR}$ use the residuals as in (3.16), whereas $K^{BIAM}$ employs the pre-filtered residuals from (3.17). All tests are implemented with minimum volatility block size selection and Bonferroni correction.

We consider $T\in\{100,200,500\}$ and $n\in\{3,5\}$ . All tests are performed at a nominal significance level of $5\%$ . For stationary processes, a presample of 200 observations is used to remove the influence of the starting values. All results are based on $2.5\times 10^{4}$ Monte Carlo replicates.

4.1 Monte Carlo Designs

We generate data according to a quadratic seemingly unrelated CPR. That is, we adopt the DGP in (2.1) with $\bm{z}_{it}=\big{[}1,t,x_{it},x_{it}^{2}\big{]}^{\prime}$ . The integrated variables satisfy $\bm{x}_{0}=\bm{0}$ and $\Delta\bm{x}_{t}=\bm{v}_{t}$ . We explore two error processes.

Setting A (Errors as in Wagner et al. (2020)): As a benchmark, we revisit the simulation setting in Wagner et al. (2020) and generate innovations according to

[TABLE]

where $\bm{\varepsilon}_{t}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathrm{N}\big{(}\bm{0},\bm{\varSigma}(\rho_{3})\big{)}$ , $\bm{e}_{t}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathrm{N}\big{(}\bm{0},\bm{\varSigma}(\rho_{4})\big{)}$ and

[TABLE]

is a symmetric Toeplitz matrix. The parameter $\rho_{1}$ controls the level of serial correlation and $\rho_{2}$ measures the degree of endogeneity. The parameters $\rho_{3}$ and $\rho_{4}$ indicate the extent of correlation across equations induced through $\bm{\varepsilon}_{t}$ and $\bm{e}_{t}$ , respectively. For simplicity, we assume identical values $\rho_{1}=\rho_{2}=\rho_{3}=\rho_{4}=\rho\in\{0,0.3,0.6,0.8\}$ . The true coefficient vector is $\bm{\beta}=\big{[}\bm{\beta}_{1}^{\prime},\dots,\bm{\beta}_{n}^{\prime}\big{]}^{\prime}$ , where $\bm{\beta}_{i}=[1,1,5,\beta_{i,4}]^{\prime}$ with $\beta_{i,4}=-0.3$ , $i=1,\dots,n$ .

Setting B (VARMA Errors): To further investigate the importance of serial correlation, we consider a second specification of the innovation process:

[TABLE]

where $\bm{\eta}_{t}$ and $\bm{\varepsilon}_{t}$ are generated as $\left[\begin{smallmatrix}\bm{\eta}_{t}\\ \bm{\varepsilon}_{t}\end{smallmatrix}\right]\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathrm{N}\big{(}\bm{0},\bm{\varSigma}(\theta)\big{)}$ and $\bm{\varSigma}(\theta)\in\mathbb{R}^{2n\times 2n}$ as in (4.2) but with parameter $\theta$ . The matrices $\bm{\varLambda}_{i}$ ( $i=1,2,3$ ) are generated independently and similarly to Chang et al. (2004). That is, we take the following three steps:

(a) Generate an $n\times n$ random matrix $\bm{U}_{i}$ from $\text{U}[0,1]$ and construct the orthogonal matrix $\bm{H}_{i}=\bm{U}_{i}\Big{(}\bm{U}_{i}^{\prime}\bm{U}_{i}\Big{)}^{-1/2}$ .

(b) Generate $n$ eigenvalues $\lambda_{i1},\dots,\lambda_{in}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\text{U}\big{[}\underline{\lambda},\bar{\lambda}\big{]}$ .

(c) Let $\bm{L}_{i}=\operatorname{diag}\left(\lambda_{i1},\dots,\lambda_{in}\right)$ and compute $\bm{\varLambda}_{i}=\bm{H}_{i}\bm{L}_{i}\bm{H}_{i}^{\prime}$ .
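The three steps can be sketched in NumPy as follows. This is a minimal illustration, assuming that the entries of $\bm{U}_{i}$ are i.i.d. $\text{U}[0,1]$ and that the inverse square root is the symmetric one obtained from an eigendecomposition of $\bm{U}_{i}^{\prime}\bm{U}_{i}$; the function name `random_coefficient_matrix` is hypothetical.

```python
import numpy as np

def random_coefficient_matrix(n, lam_lo, lam_hi, rng):
    """Draw Lambda = H L H' with eigenvalues in [lam_lo, lam_hi]."""
    # (a) U has i.i.d. U[0,1] entries; H = U (U'U)^{-1/2} is orthogonal
    U = rng.uniform(0.0, 1.0, size=(n, n))
    w, V = np.linalg.eigh(U.T @ U)            # U'U is symmetric, a.s. positive definite
    inv_sqrt = V @ np.diag(w ** -0.5) @ V.T   # symmetric inverse square root
    H = U @ inv_sqrt
    # (b) eigenvalues drawn i.i.d. from U[lam_lo, lam_hi]
    lam = rng.uniform(lam_lo, lam_hi, size=n)
    # (c) Lambda = H L H' with L = diag(lam)
    Lam = H @ np.diag(lam) @ H.T
    return Lam, H, lam
```

Because $\bm{H}_{i}$ is orthogonal, the eigenvalues of the resulting matrix are exactly the drawn $\lambda_{ij}$, so the interval $[\underline{\lambda},\bar{\lambda}]$ directly bounds the persistence implied by $\bm{\varLambda}_{i}$.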

The parameter $\theta\in\{0.3,0.5\}$ governs regressor-error correlation and cross-equation correlation. The amount of serial correlation is specified through $\underline{\lambda}$ and $\bar{\lambda}$ . The three scenarios $\big{(}\underline{\lambda},\bar{\lambda}\big{)}\in\big{\{}\left(0.1,0.5\right),\left(0.5,0.8\right),\left(0.8,0.95\right)\big{\}}$ steadily increase the autocorrelation in the generated data.

Setting C (Cointegration Tests): We continue to construct innovations according to Setting B. Moreover, we fix $\left[\begin{smallmatrix}\bm{\eta}_{t}\\ \bm{\varepsilon}_{t}\end{smallmatrix}\right]\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathrm{N}\big{(}\bm{0},\bm{\varSigma}(\theta)\big{)}$ with $\theta=0.3$ , and we construct the matrices $\bm{\varLambda}_{2}$ and $\bm{\varLambda}_{3}$ using $\big{(}\underline{\lambda},\bar{\lambda}\big{)}=(0.1,0.5)$ . The eigenvalues of $\bm{\varLambda}_{1}$ are varied to explore both size and power properties. We always estimate a quadratic seemingly unrelated CPR.

Size DGP.

We generate the eigenvalues of $\bm{\varLambda}_{1}$ as before. That is, take $\lambda_{11},\dots,\lambda_{1n}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\text{U}\big{[}\underline{\lambda},\bar{\lambda}\big{]}$ , where $\big{(}\underline{\lambda},\bar{\lambda}\big{)}\in\left\{\left(0.1,0.5\right),\left(0.5,0.8\right),\left(0.8,0.95\right)\right\}$ .

Power DGP1.

We set $\lambda_{1j}=1$ for $1\leq j\leq J_{1}$ and generate $\lambda_{1j}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\text{U}\big{[}0.1,0.5\big{]}$ for $J_{1}+1\leq j\leq n$ . The integer $J_{1}\in\{1,2,n\}$ represents the number of unit roots in $\{\bm{u}_{t}\}$ .

Power DGP2.

The eigenvalues of $\bm{\varLambda}_{1}$ are sampled as $\lambda_{11},\dots,\lambda_{1n}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\text{U}\big{[}0.1,0.5\big{]}$ , and the first $J_{2}\in\{1,2,n\}$ series follow a cubic SUCPR specification:

[TABLE]

Power DGP3.

We again take $\lambda_{11},\dots,\lambda_{1n}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\text{U}\big{[}0.1,0.5\big{]}$ and construct

[TABLE]

where $J_{3}\in\{1,2,n\}$ represents the number of equations that specify a spurious relation.

Overall, the Power DGPs 1-3 consider: missing $I(1)$ regressors, omitted higher order powers of the $I(1)$ regressor $x_{it}$ , and spurious regressions, respectively.

4.2 Discussion of the Simulation Results

Tables 1 and 2 report the empirical mean squared error (MSE) for both feasible and infeasible estimators. As results are qualitatively similar across equations, we only report on the estimators for $\beta_{1,4}$ (the coefficient in front of $x_{1t}^{2}$ ). The column with FGLS contains the numerical value of the MSE and the MSEs of all other estimators are expressed relative to this benchmark. Values above 1 indicate a better performance of FM-GLS. We make the following observations:

(a) The FM-GLS estimator generally has the lowest MSE among all feasible estimators. These efficiency gains are small at low levels of endogeneity and serial correlation, but become sizeable at higher levels. Moreover, the Monte Carlo outcomes for the infeasible estimators indicate that these gains remain when the estimators are informed about the true endogeneity and serial correlation properties. It is thus the GLS weighting of the data that improves estimation accuracy.

(b) There is one particular instance in Table 2 in which the FM-GLS estimator has a high MSE, namely the case of high persistence $\big{(}\underline{\lambda},\bar{\lambda}\big{)}=(0.8,0.95)$ , high endogeneity $\theta=0.5$ , and small sample size $T=100$ . This is caused by an inaccurate BIAM estimator resulting from the combination of a small sample size, high endogeneity, and high persistence. The problem disappears when $T$ increases.
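The reporting convention used in Tables 1 and 2 (raw MSE for the benchmark, ratios for everything else) can be sketched as below. The function `relative_mse` and the dictionary layout are illustrative assumptions and not the authors' code.

```python
import numpy as np

def relative_mse(estimates, truth, benchmark="FGLS"):
    """estimates: dict mapping estimator name -> Monte Carlo draws of one coefficient."""
    mse = {name: float(np.mean((np.asarray(draws) - truth) ** 2))
           for name, draws in estimates.items()}
    base = mse[benchmark]
    # The benchmark keeps its raw MSE; all others are reported relative to it,
    # so a value above 1 means the benchmark has the smaller MSE.
    return {name: (val if name == benchmark else val / base)
            for name, val in mse.items()}
```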

The subsequent set of simulations revolves around hypothesis testing, see Table 3 and Figures 1-6. The errors are simulated using Setting A and we use the following Wald-type test statistics: the Wald-SOLS and Wald-SUR tests as developed in Proposition 2 in Wagner et al. (2020), and the Wald-FGLS test from Theorem 4. We consider: (i) the single equation test $H_{0}:\beta_{1,4}=-0.3$ against the two-sided alternative $H_{1}:\beta_{1,4}\neq-0.3$ , and (ii) the joint test $H_{0}:\beta_{1,4}=\beta_{2,4}=\ldots=\beta_{n,4}=-0.3$ against the alternative that at least one coefficient differs from $-0.3$ . Some general remarks regarding size and size-corrected power are as follows.

(c) The Wald tests are typically oversized but the three tests react differently to changes in $\rho$ . Increases in $\rho$ result in an increasing size for the SOLS and SUR versions of the Wald test, whereas increases in $\rho$ lead to size decreases for the GLS-type Wald test. Overall, the GLS test provides better size control.

(d) In Figures 1 and 2, we vary the serial correlation parameter $\rho_{1}$ and the endogeneity parameter $\rho_{2}$ separately. Overall, variation in $\rho_{1}$ has a larger influence on size with the SUR test being most sensitive and the GLS test being least sensitive.

(e) For all three Wald-type tests, the size of the tests improves with sample size $T$ .

(f) The ordering in terms of size-corrected power is the same throughout Figures 3-6. That is, size-corrected power is lowest for Wald-SOLS, increases for Wald-SUR, and is highest for the Wald-FGLS test.

The simulation results for the KPSS-type of cointegration tests can be found in Table 4. The general conclusions are as follows.

(g) The empirical sizes of the $K^{SOLS}$ and $K^{SUR}$ tests are similar. We see: very conservative results for low serial correlation, decent size for medium serial correlation, and strongly oversized tests at high levels of serial correlation. These findings are completely in line with the simulation results that are reported in Table 3 of Choi and Saikkonen (2010). The same behaviour is observed for the $K^{BIAM}$ test but the deviations from the 5% level are less extreme.

(h) The power of the $K^{SOLS}$ , $K^{SUR}$ , and $K^{BIAM}$ tests behaves as expected: (1) power always increases with sample size, and (2) power increases when more unit roots, more misspecified equations, or more spurious relationships are incorporated in the DGP. The $K^{BIAM}$ test has the lowest power among the three tests. This is caused by the fact that the filter can nearly difference the data and hence make it appear more stationary.

5 Empirical Application

The Environmental Kuznets Curve (EKC) conjectures an inverted U-shaped relation between environmental degradation and income per capita. That is, there is an initial decline in environmental quality with increasing economic activity, but beyond a certain turning point (caused by e.g. industrial transformation and increasing environmental awareness), economic growth goes hand in hand with environmental improvement. A more detailed description and historical overview of the EKC can be found in Stern (2004) and Stern (2017), respectively. The implications of further economic growth on pollution, e.g. the emission of greenhouse gases, are also key in understanding the future of global warming (Nordhaus (2013)).

We build upon and compare with Wagner et al. (2020). That is, we look at carbon dioxide $(\text{CO}_{2})$ emissions and GDP as proxies for environmental pollution and economic development (both per capita and in logarithms), respectively. The data is collected from the Maddison Project Database (MPD) and the homepage of the Carbon Dioxide Information Analysis Center (CDIAC). [Footnote 9: The Maddison Project Database, Bolt et al. (2018), contains the data on population size and real GDP. The data on $\text{CO}_{2}$ originates from Boden et al. (2017). We follow the official guidelines and multiply by $3.667$ and $10^{6}$ to convert the reported fossil-fuel emissions into total carbon dioxide emissions.] As in Wagner et al. (2020), we consider Austria (AT), Belgium (BE), Finland (FI), the Netherlands (NL), Switzerland (CH) and the United Kingdom (UK). Our yearly data spans the period from 1870 to 2014. We refer to Wagner et al. (2020) for a discussion of the stationarity properties of all series as well as the motivation for this particular set of countries. Overall, the dataset consists of $n=6$ countries with $T=145$ time series observations each. Such a panel with small $n$ and large $T$ is ideally suited for our FM-GLS approach since the multivariate banded inverse autocovariance matrix remains computable.

We estimate the quadratic model specification:

$e_{it}=\beta_{i,1}+\beta_{i,2}t+\beta_{i,3}g_{it}+\beta_{i,4}g_{it}^{2}+u_{it},\qquad(5.1)$

where $e_{it}$ and $g_{it}$ are $\text{CO}_{2}$ emissions and GDP, respectively. As the first step in our analysis we employ the multivariate stationarity tests of Section 3.4 to check this model specification (Table 5). All three tests reject the null of cointegration at a 5% level signalling inappropriateness of the quadratic formulation. Figure 7 shows the residuals on which these tests are based. What stands out in these graphs is the erratic behaviour of the series around the two world wars. Based on this fact, and to be able to compare to Wagner et al. (2020), we will continue the analysis using model (5.1) and the given collection of countries. Before doing so, it will be worthwhile to discuss the time series properties of these residuals.

We consider the series $\{\hat{\bm{u}}_{t,FGLS}\}$ in the remainder of this section but the other residual series provide qualitatively similar outcomes. When fitting VAR( $p$ ) models with $1\leq p\leq 8$ to these residuals, the BIC selects a lag order of $p=1$ . The moduli of the eigenvalues of the estimated coefficient matrix are $(0.55,0.55,0.51,0.31,0.31,0.11)$ , and the estimate for the error correlation matrix is

[TABLE]

There is thus serial and cross-sectional correlation to be exploited by the FM-GLS estimator.
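A minimal sketch of how such residual diagnostics could be computed is given below. It fits only a VAR(1) by OLS (the BIC search over $1\leq p\leq 8$ is omitted), and the function name `var1_eigen_moduli` is hypothetical; it is not the authors' implementation.

```python
import numpy as np

def var1_eigen_moduli(resid):
    """resid: (T, n) array. OLS fit of u_t = c + A u_{t-1} + e_t."""
    Y = resid[1:]
    X = np.column_stack([np.ones(len(Y)), resid[:-1]])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)   # shape (n + 1, n)
    A = coef[1:].T                                  # (n, n) VAR(1) coefficient matrix
    E = Y - X @ coef                                # fitted VAR(1) errors
    corr = np.corrcoef(E, rowvar=False)             # error correlation matrix
    moduli = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
    return moduli, corr
```

Eigenvalue moduli well below one point to stationary but serially correlated residuals, and off-diagonal entries of `corr` away from zero indicate cross-sectional correlation.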

The FM-SOLS, FM-SUR and FM-GLS estimation results of Model (5.1) are reported in Table 6. An inspection of the coefficient estimates and their confidence intervals reveals that: (1) $\beta_{i,3}$ is positive for each country, (2) $\beta_{i,4}$ is negative for each country, and (3) all coefficients are significant at the 5% level. All three facts are in line with the EKC hypothesis. [Footnote 10: This is unsurprising because Wagner et al. (2020) have selected the current set of countries because they display the EKC behaviour. Also, our estimation results are slightly different from those in Wagner et al. (2020) due to the additional data for 2014, possible data updates, and/or differences in the bandwidth selection of the long-run covariance matrices.] Accordingly, there exists a turning point after which further per capita economic growth reduces per capita carbon dioxide emissions. The numerical values for the turning points are heterogeneous between countries.
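For a quadratic specification in log GDP per capita, the turning point follows from the first-order condition $\beta_{i,3}+2\beta_{i,4}g=0$, that is $g^{*}=-\beta_{i,3}/(2\beta_{i,4})$ in logs and $\exp(g^{*})$ in levels. The helper below is a small sketch of that arithmetic; the name `ekc_turning_point` is hypothetical.

```python
import math

def ekc_turning_point(beta3, beta4):
    """Turning point of beta3*g + beta4*g^2 in log GDP per capita, and in levels."""
    if beta4 >= 0:
        raise ValueError("an inverted U requires beta4 < 0")
    g_star = -beta3 / (2.0 * beta4)   # solves beta3 + 2*beta4*g = 0
    return g_star, math.exp(g_star)
```

As a purely numerical example, the simulation coefficients $\beta_{i,3}=5$ and $\beta_{i,4}=-0.3$ from Setting A give $g^{*}=5/0.6\approx 8.33$; these are not the empirical estimates of Table 6.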

The widths of the confidence intervals for $\beta_{i,3}$ and $\beta_{i,4}$ display a similar pattern. From shortest to longest, the ordering is always FM-SUR, FM-SOLS, and FM-GLS, and the widths vary substantially between methods. To uncover the origin of these findings we conduct one final simulation study with a parameter specification that closely mimics the properties of the dataset. [Footnote 11: The details of this simulation DGP are provided in Section S3 of the Supplement. A visualisation of the data and the model fit are also provided there.] The average empirical coverage probabilities of asymptotic 95% confidence intervals are 78.0%, 66.5% and 89.0% for FM-SOLS, FM-SUR, and FM-GLS, respectively. In other words, the calculated confidence intervals are generally too short. By reverse engineering it turns out that the confidence intervals should be scaled by factors of 1.67, 2.15 and 1.24 to bring them back to the desired nominal level. Overall, the applied researcher should be careful when using the confidence intervals as indications for parameter uncertainty.

6 Conclusion

We proposed a framework to conduct inference on cointegrating polynomial regressions. Parameters are obtained using a Fully Modified GLS estimator and we studied a cointegration test that is based on filtered residuals. Monte Carlo simulations revealed the advantages and disadvantages of these methods. The empirical researcher should realize that all estimation approaches have a tendency to underestimate parameter uncertainty and thus provide confidence intervals that are too small. The FM-GLS estimator suffers the least from this problem. Several interesting questions are left for future research. From a theoretical viewpoint, it is interesting to study the behaviour of the modified Cholesky decomposition (and BIAM) when the series under consideration is nonstationary. This would give insights into the behaviour of: (1) the FM-GLS estimator while estimating spurious regressions, and (2) the power properties of the cointegration tests. From a practical viewpoint, there seems to be a need to obtain more accurate standard errors of the parameter estimators.

Acknowledgements

This paper has been presented at the 2018 CFE meeting in Pisa, the NESG 2019 conference in Amsterdam, and the $6^{th}$ RCEA Time Series Econometrics Workshop in Larnaca. We would like to thank conference participants, especially Peter Pedroni and Peter Phillips, for useful comments and suggestions. We extend our thanks to Eric Beutner, Dick van Dijk, Richard Paap, Franz Palm, and Stephan Smeekes for their valuable feedback on earlier versions of this manuscript. All remaining errors are our own.

Appendix A Proofs of Main Theorems

Lemma 1

Let $\bm{\mathcal{A}}_{q}(L)=\bm{I}_{n}-\sum_{j=1}^{q}\bm{A}_{j}(q)L^{j}$ denote the lag polynomial implied by the coefficient matrices in (2.3). By the Beveridge-Nelson (BN) decomposition, we also have $\bm{\mathcal{A}}_{q}(L)=\bm{\mathcal{A}}_{q}(1)+(1-L)\widetilde{\bm{\mathcal{A}}}_{q}(L)$ where $\widetilde{\bm{\mathcal{A}}}_{q}(L)=\sum_{j=1}^{q}\widetilde{\bm{A}}_{j}(q)L^{j-1}$ with $\widetilde{\bm{A}}_{j}(q)=\sum_{i=j}^{q}\bm{A}_{i}(q)$ . If Assumption 1 holds, then

(a) $\bm{\mathcal{A}}_{q}(1)=\bm{\mathcal{A}}(1)+O\left(\sum_{j=q+1}^{\infty}j^{1/2}\big{\|}\bm{A}_{j}\big{\|}_{\mathcal{F}}\right)$ ;

(b) There exists a $q^{*}>0$ such that $\sum_{j=1}^{q}\big{\|}\widetilde{\bm{A}}_{j}(q)\big{\|}_{\mathcal{F}}<\infty$ for all $q>q^{*}$ .

Proof

(a) By the triangle and Cauchy-Schwarz inequalities, we have $\big{\|}\bm{\mathcal{A}}_{q}(1)-\bm{\mathcal{A}}(1)\big{\|}_{\mathcal{F}}\leq\sum_{j=1}^{q}\big{\|}\bm{A}_{j}(q)-\bm{A}_{j}\big{\|}_{\mathcal{F}}+\sum_{j=q+1}^{\infty}\big{\|}\bm{A}_{j}\big{\|}_{\mathcal{F}}\leq\Big{(}q\sum_{j=1}^{q}\big{\|}\bm{A}_{j}(q)-\bm{A}_{j}\big{\|}_{\mathcal{F}}^{2}\Big{)}^{1/2}+\sum_{j=q+1}^{\infty}\big{\|}\bm{A}_{j}\big{\|}_{\mathcal{F}}$ , and the claim follows from Lemma S1 (see Supplement). (b) Using Baxter’s inequality in Theorem 6.6.12 in Hannan and Deistler (2012) (also see their Remark 3) for the final inequality, we derive $\sum_{j=1}^{q}\big{\|}\widetilde{\bm{A}}_{j}(q)\big{\|}_{\mathcal{F}}\leq\sum_{j=1}^{q}j\big{\|}\bm{A}_{j}(q)\big{\|}_{\mathcal{F}}\leq\sum_{j=1}^{q}j\big{\|}\bm{A}_{j}(q)-\bm{A}_{j}\big{\|}_{\mathcal{F}}+\sum_{j=1}^{q}j\big{\|}\bm{A}_{j}\big{\|}_{\mathcal{F}}\leq C\sum_{j=1}^{q}j\big{\|}\bm{A}_{j}\big{\|}_{\mathcal{F}}\leq C$ . ∎
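The BN decomposition stated in Lemma 1 is an algebraic identity and can be checked numerically. The sketch below applies $\bm{\mathcal{A}}_{q}(L)$ to a series directly and via $\bm{\mathcal{A}}_{q}(1)\bm{y}_{t}+\sum_{j=1}^{q}\widetilde{\bm{A}}_{j}(q)\Delta\bm{y}_{t-j+1}$; the function names and the array layout (coefficients stacked in a `(q, n, n)` array) are illustrative assumptions.

```python
import numpy as np

def bn_parts(A):
    """A: array (q, n, n) holding A_1(q), ..., A_q(q).

    Returns A_q(1) = I - sum_j A_j and tilde A_j = sum_{i=j}^q A_i for j = 1..q.
    """
    q, n, _ = A.shape
    A_one = np.eye(n) - A.sum(axis=0)
    A_tilde = np.cumsum(A[::-1], axis=0)[::-1]   # reverse cumulative sums
    return A_one, A_tilde

def apply_filter(A, y):
    """Direct filter A_q(L) y_t for t = q..T-1, with y of shape (T, n)."""
    q = A.shape[0]
    out = y[q:].copy()
    for j in range(1, q + 1):
        out -= y[q - j:len(y) - j] @ A[j - 1].T
    return out

def apply_bn(A, y):
    """Same filter via A_q(1) y_t + sum_j tilde A_j (y_{t-j+1} - y_{t-j})."""
    q = A.shape[0]
    A_one, A_tilde = bn_parts(A)
    dy = np.diff(y, axis=0)                      # dy[t-1] = y_t - y_{t-1}
    out = y[q:] @ A_one.T
    for j in range(1, q + 1):
        # Delta y_{t-j+1} for t = q..T-1 sits at rows (q-j)..(T-1-j) of dy
        out += dy[q - j:len(y) - j] @ A_tilde[j - 1].T
    return out
```

Both routes give the same filtered series up to floating-point error, which is the property exploited when the filtered regressors are split into a level part and a differenced part.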

Proof of Theorem 1

The premultiplication by $\bm{\mathcal{M}}_{\bm{u}}(q)$ applies a linear filter whereas $\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)$ implies weighting. Since the behaviour of the first $q\ll T$ elements does not affect the asymptotic results, we take $\bm{Z}_{t}=\bm{u}_{t}=\mathbf{O}$ for $t\leq 0$ and for all $t=1,2\dots$ , we apply the transformations implied by $\bm{\mathcal{A}}_{q}(L)$ and $\bm{S}^{-1}(q)$ . [Footnote 12: The same argument is used in Phillips and Park (1988). The modification to obtain a rigorous proof is straightforward.] Consequently, we have

[TABLE]

Using the BN decomposition in Lemma 1, we first show

[TABLE]

Note that $\left\|\sum_{j=1}^{q}\widetilde{\bm{A}}_{j}(q)\Delta\bm{Z}_{t-j+1}^{\prime}\bm{G}_{T}\right\|\leq\sum_{j=1}^{q}\big{\|}\widetilde{\bm{A}}_{j}(q)\big{\|}\leavevmode\nobreak\ \big{\|}\Delta\bm{Z}_{t-j+1}^{\prime}\bm{G}_{T}\big{\|}$ and that for any $t$ ,

[TABLE]

The vector $\bm{G}_{\bm{d}_{i},T}\Delta\bm{d}_{it}$ typically contains elements $T^{-(k+\frac{1}{2})}\big{[}t^{k}-(t-1)^{k}\big{]}$ where $k=0,1,\cdots,q_{i}$ . By the inequality $(a+b)^{n}\leq a^{n}+nb(a+b)^{n-1}$ , for $a,b\geq 0$ , $n\in\mathbb{N}$ , we obtain $0\leq t^{k}-(t-1)^{k}\leq kt^{k-1}$ , and thus $T^{-(k+\frac{1}{2})}\big{[}t^{k}-(t-1)^{k}\big{]}\leq q_{i}\leavevmode\nobreak\ T^{-3/2}\leq CT^{-3/2}$ . As a result, $\left\|\bm{G}_{\bm{d}_{i},T}\Delta\bm{d}_{it}\right\|^{2}\leq CT^{-3}$ . The vector $\bm{G}_{\bm{s}_{i},T}\Delta\bm{s}_{it}$ typically contains elements $T^{-(k+1)/2}\big{(}x_{it}^{k}-x_{it-1}^{k}\big{)}$ , where $k=1,\dots,p_{i}$ . The binomial expansion implies $x_{it}^{k}-x_{it-1}^{k}=\sum_{m=0}^{k-1}{k\choose m}x_{it-1}^{m}v_{it}^{k-m}=O_{p}\big{(}T^{(k-1)/2}\big{)}$ , and thus $T^{-(k+1)/2}\big{(}x_{it}^{k}-x_{it-1}^{k}\big{)}=O_{p}\big{(}T^{-1}\big{)}$ . It further implies that $\left\|\bm{G}_{\bm{s}_{i},T}\Delta\bm{s}_{it}\right\|^{2}=O_{p}\big{(}T^{-2}\big{)}$ . Combining $\big{\|}\Delta\bm{Z}_{t}^{\prime}\bm{G}_{T}\big{\|}=O_{p}\big{(}T^{-1}\big{)}$ with Lemma 1(b) we establish the second equality in (A.2). Now (A.2) follows from Lemma 1(a) and $\sum_{j=q+1}^{\infty}j^{1/2}\big{\|}\bm{A}_{j}\big{\|}_{\mathcal{F}}=o\big{(}q^{-1/2}\big{)}$ . Using (A.2) and $\big{\|}\bm{S}(q)-\bm{\varSigma}_{\eta\eta}\big{\|}\rightarrow 0$ (see (S1.17) in the supplementary material), we obtain

[TABLE]

We rewrite the second part in (A.1) as

[TABLE]

and we will repeatedly use the identity

[TABLE]

for any matrices $\bm{J}_{t}\in\mathbb{R}^{d\times n}$ , $\bm{D}\in\mathbb{R}^{n\times n}$ and $\bm{e}_{t}=[e_{1t},\dots,e_{nt}]^{\prime}\in\mathbb{R}^{n\times 1}$ . Using this identity, we have

[TABLE]

where $\eta_{it}$ is the $i^{th}$ entry of $\bm{\eta}_{t}$ . For $1\leq i\leq n$ , we subsequently use the BN decomposition to obtain:

[TABLE]

By definition, we have $\sum_{t=1}^{T}\bm{G}_{T}\bm{Z}_{t}\eta_{it}=\operatorname{diag}\left[\sum_{t=1}^{T}\bm{G}_{1,T}\bm{z}_{1t},\dots,\sum_{t=1}^{T}\bm{G}_{n,T}\bm{z}_{nt}\right]\eta_{it}$ , where the limiting distribution of each block follows from Proposition 1 of Wagner and Hong (2016). More specifically, the $k^{th}$ block will converge to a stochastic integral and a second order bias term which is proportional to $\bm{\varSigma}_{\epsilon_{k}\eta_{i}}:=\mathbb{E}\left(\epsilon_{kt}\eta_{it}\right)$ (the $(k,i)^{th}$ element of $\bm{\varSigma}_{\epsilon\eta}$ ), $1\leq k,i\leq n$ , and thus

[TABLE]

where $\bm{B}_{i}:=\operatorname{diag}\left[\bm{\varSigma}_{\epsilon_{1}\eta_{i}}\leavevmode\nobreak\ \bm{b}_{1},\cdots,\bm{\varSigma}_{\epsilon_{n}\eta_{i}}\leavevmode\nobreak\ \bm{b}_{n}\right]$ . ∎

As $\sum_{t=1}^{T}\bm{G}_{T}\Delta\bm{Z}_{t}\eta_{it}\widetilde{\bm{A}}_{1}(q)^{\prime}=\operatorname{diag}\left[\sum_{t=1}^{T}\bm{G}_{1,T}\Delta\bm{z}_{1t},\dots,\sum_{t=1}^{T}\bm{G}_{n,T}\Delta\bm{z}_{nt}\right]\eta_{it}\left(\sum_{j=1}^{\infty}\bm{A}_{j}+o(1)\right)^{\prime}$ , we again consider the limiting distributions block-wise. Every element in the blocks will rely on one of the following three results. (1) As derived below (A.3), we have $\big{|}T^{-(j+\frac{1}{2})}\sum_{t=1}^{T}\big{[}t^{j}-(t-1)^{j}\big{]}\eta_{it}\big{|}\leq CT^{-3/2}\sum_{t=1}^{T}|\eta_{it}|=o_{p}(1)$ . (2) By Assumption 1, for any $1\leq k,i\leq n$ , $v_{kt}$ and $\eta_{it}$ are Near Epoch Dependent in $L_{4}$ -norm on $\left\{[\bm{\eta}_{t}^{\prime},\bm{\varepsilon}_{t}^{\prime}]^{\prime}\right\}_{t\in\mathbb{Z}}$ of size $-1$ and arbitrary size, respectively. A small variation on Theorem 17.9 from Davidson (1994) shows that $\{v_{it}\eta_{it}\}$ are $L_{2}$ -NED of size $-1$ . The i.i.d. assumption on $\left\{[\bm{\eta}_{t}^{\prime},\bm{\varepsilon}_{t}^{\prime}]^{\prime}\right\}$ allows for a LLN for the sequence $\{v_{it}\eta_{it}\}$ , see e.g. Theorem 20.21 of Davidson (1994), implying $T^{-1}\sum_{t=1}^{T}\Delta x_{kt}\eta_{it}=T^{-1}\sum_{t=1}^{T}v_{kt}\eta_{it}\longrightarrow_{p}\bm{\varSigma}_{\epsilon_{k}\eta_{i}}$ . (3) $T^{-(j+1)/2}\sum_{t=1}^{T}\Delta x_{kt}^{j}\eta_{it}\Rightarrow j\bm{\varSigma}_{\epsilon_{k}\eta_{i}}\int_{0}^{1}\bm{B}_{v_{k}}^{j-1}(r)dr$ , where $j\geq 2$ . The specific reason is as follows. By the binomial expansion (below (A.3)), we have

[TABLE]

where $\frac{1}{\sqrt{T}}\sum_{t=1}^{T-1}\left(\frac{x_{kt}}{\sqrt{T}}\right)^{j-1}\left(v_{kt}\eta_{it}-\bm{\varSigma}_{\epsilon_{k}\eta_{i}}\right)=O_{p}(1)$ . To see this, we refer to de Jong (2002). The moment and NED conditions in his Assumption 1 are satisfied. Moreover, since $F(x)=x^{j-1}$ is homogeneous of degree $j-1$ , his Assumption 2 holds as well. The desired result now follows from Theorem 1 in de Jong (2002). Combining these results, we obtain

[TABLE]

Finally, the last term in (A.8) is bounded by $\sum_{j=2}^{q}\big{\|}\bm{A}_{j}(q)\big{\|}\leavevmode\nobreak\ \big{\|}\sum_{t=1}^{T}\bm{G}_{T}\Delta\bm{Z}_{t-j+1}\eta_{it}\big{\|}$ . Using arguments similar to those above, we conclude that $\sum_{t=1}^{T}\bm{G}_{T}\Delta\bm{Z}_{t-j+1}\eta_{it}=o_{p}(1)$ (lags of $\Delta\bm{Z}_{t}$ will lead to $\mathbb{E}\big{(}v_{kt-j}\eta_{it}\big{)}=0$ for any $j>0$ ). Hence, $\sum_{j=2}^{q}\sum_{t=1}^{T}\bm{G}_{T}\Delta\bm{Z}_{t-j+1}\eta_{it}\widetilde{\bm{A}}_{j}(q)^{\prime}=o_{p}(1)$ . Combining (A.8), (A.9) and (A.10), we have

[TABLE]

Note that $\operatorname{vec}\Big{(}\bm{\varSigma}_{\eta\eta}^{-1}\Big{)}=\left[\begin{smallmatrix}\operatorname{col}_{1}(\bm{\varSigma}_{\eta\eta}^{-1})\\ \vdots\\ \operatorname{col}_{n}(\bm{\varSigma}_{\eta\eta}^{-1})\end{smallmatrix}\right]$ . Inserting (A.11) into (A.7), we eventually have

[TABLE]

where the symmetry property of $\bm{\varSigma}_{\eta\eta}^{-1}$ is used in the final step.

Now we consider the term $\mathrm{II}$ in (A.5). If we define $\bm{u}_{t}^{*}=\big{[}u_{1t}^{*},\dots,u_{nt}^{*}\big{]}:=\bm{\mathcal{A}}_{q}(L)\bm{u}_{t}-\bm{\eta}_{t}=-\sum_{j=1}^{\infty}\big{(}\bm{A}_{j}(q)-\bm{A}_{j}\big{)}\bm{u}_{t-j}$ , where $\bm{A}_{j}(q)=\mathbf{O}$ for $j>q$ , and then apply (A.6), we have $\mathrm{II}=\Big{[}\sum_{t=1}^{T}\big{(}\bm{\mathcal{A}}_{q}(L)\bm{Z}_{t}^{\prime}\bm{G}_{T}\big{)}^{\prime}u_{1t}^{*},\dots,\sum_{t=1}^{T}\big{(}\bm{\mathcal{A}}_{q}(L)\bm{Z}_{t}^{\prime}\bm{G}_{T}\big{)}^{\prime}u_{nt}^{*}\Big{]}\operatorname{vec}\Big{(}\bm{\varSigma}_{\eta\eta}^{-1}+o(1)\Big{)}$ . For any block $i=1,\dots,n$ , by the BN decomposition (A.2), we have

[TABLE]

It implies $\mathrm{II}=o_{p}(1)$ . The theorem now follows from (A.1), (A.4), and (A.12).

Proof of Theorem 2

We start with the estimation error $\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q)-\bm{\varSigma}_{\bm{u}}^{-1}(q)$ . Repeated addition and subtraction yields

[TABLE]

We will only consider the terms $\big{\|}\widehat{\bm{\mathcal{M}}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}(q)\big{\|}$ and $\big{\|}\widehat{\bm{\mathcal{S}}}_{\bm{u}}^{-1}(q)-\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)\big{\|}$ . It is not hard to derive that the remaining terms are bounded in probability. Define $\bm{\mathcal{G}}=\widehat{\bm{\mathcal{M}}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}(q)$ and denote its ( $n\times n$ ) subblocks by $\bm{\mathcal{G}}_{ij}$ , $1\leq i,j\leq T$ . This matrix $\bm{\mathcal{G}}$ is banded in such a way that there are at most $2q-1$ nonzero blocks in the block-columns of $\bm{\mathcal{G}}\bm{\mathcal{G}}^{\prime}$ . Using this observation and various norm properties, we find

[TABLE]

where the final step follows from Lemma S3. [Footnote 13: More specifically, for any matrix $\bm{Q}$ we have $\|\bm{Q}\|^{2}\leq\|\bm{Q}\bm{Q}^{\prime}\|_{1}$ . Moreover, if $\bm{Q}$ is an $(n\times n)$ matrix, then also $\|\bm{Q}\|_{1}\leq\sqrt{n}\|\bm{Q}\|_{\mathcal{F}}\leq n\|\bm{Q}\|$ .] We conclude that $\big{\|}\widehat{\bm{\mathcal{M}}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}(q)\big{\|}=O_{p}\left(\sqrt{q^{3}/T}\right)$ . The difference $\widehat{\bm{\mathcal{S}}}_{\bm{u}}^{-1}(q)-\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)$ forms a symmetric and block diagonal matrix, hence $\big{\|}\widehat{\bm{\mathcal{S}}}_{\bm{u}}(q)-\bm{\mathcal{S}}_{\bm{u}}(q)\big{\|}=\max\left\{\big{\|}\widehat{\bm{S}}(0)-\bm{S}(0)\big{\|},\max_{1\leq\ell\leq q}\big{\|}\widehat{\bm{S}}(\ell)-\bm{S}(\ell)\big{\|}\right\}$ . By Assumption 2,

[TABLE]

Applying Lemma S3, we see $\big{\|}\widehat{\bm{\mathcal{S}}}_{\bm{u}}^{-1}(q)-\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)\big{\|}\leq\big{\|}\widehat{\bm{\mathcal{S}}}_{\bm{u}}(q)-\bm{\mathcal{S}}_{\bm{u}}(q)\big{\|}\leavevmode\nobreak\ \big{\|}\widehat{\bm{\mathcal{S}}}_{\bm{u}}^{-1}(q)\big{\|}\leavevmode\nobreak\ \big{\|}\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)\big{\|}=O_{p}\left(q/\sqrt{T}\right)$ . Overall, recalling (A.13), a bound on the estimation error is $\left\|\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q)-\bm{\varSigma}_{\bm{u}}^{-1}(q)\right\|=O_{p}\left(\sqrt{q^{3}/T}\right)$ .
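The norm inequalities invoked in this proof can be verified numerically on any test matrix. The sketch below assumes that $\|\cdot\|$ denotes the spectral norm, $\|\cdot\|_{1}$ the induced 1-norm (maximum absolute column sum), and $\|\cdot\|_{\mathcal{F}}$ the Frobenius norm; the function name `norm_chain` is hypothetical.

```python
import numpy as np

def norm_chain(Q):
    """Quantities in ||Q||^2 <= ||QQ'||_1 and ||Q||_1 <= sqrt(n)||Q||_F <= n||Q||."""
    n = Q.shape[0]
    spec = np.linalg.norm(Q, 2)        # spectral norm (largest singular value)
    one = np.linalg.norm(Q, 1)         # induced 1-norm (max absolute column sum)
    frob = np.linalg.norm(Q, "fro")    # Frobenius norm
    return {
        "spec_sq": spec**2,
        "QQt_one": np.linalg.norm(Q @ Q.T, 1),
        "one": one,
        "sqrt_n_frob": np.sqrt(n) * frob,
        "n_spec": n * spec,
    }
```

The first inequality holds because the largest eigenvalue of $\bm{Q}\bm{Q}^{\prime}$ cannot exceed any induced norm of that matrix; the second chain is specific to square $(n\times n)$ matrices.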

The bound on the truncation error, $\left\|\bm{\varSigma}_{\bm{u}}^{-1}(q)-\bm{\varSigma}_{\bm{u}}^{-1}\right\|\leq C\frac{1}{\sqrt{q}}\sum_{s=q+1}^{\infty}s\left\|\bm{A}_{s}\right\|_{\mathcal{F}}$ , follows from a straightforward generalization of the results in Lemma 2 of Cheng et al. (2015) and Propositions 2.1-2.2 of Ing et al. (2016b). See Lemma S4 in the supplementary material for details. ∎

Proof of Theorem 3

(a)-(b) See the proof of Proposition 1 in Wagner et al. (2020). (c) Since the residuals $\{\widehat{\bm{u}}_{t}\}$ are obtained by first stage OLS, we get $\|\widehat{\bm{u}}-\bm{u}\|^{2}\leq\|\bm{G}_{T}^{-1}\big{(}\widehat{\bm{\beta}}_{OLS}-\bm{\beta}\big{)}\|^{2}\leavevmode\nobreak\ \|\bm{G}_{T}\bm{Z}^{\prime}\bm{Z}\bm{G}_{T}\|=O_{p}(1)$ . Assumption 2 is thus satisfied and we can rely on the results in Theorem 2. From (3.12), the definition of the FM-GLS estimator, we have

[TABLE]

Given Theorem 2, we have $\bm{G}_{T}\bm{Z}^{\prime}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q)\bm{Z}\bm{G}_{T}=\bm{G}_{T}\bm{Z}^{\prime}\bm{\varSigma}_{\bm{u}}^{-1}(q)\bm{Z}\bm{G}_{T}+o_{p}(1)$ and it converges weakly to the expression in (A.4).

To continue, we define $\widehat{\bm{\mathcal{A}}}_{q}(L)=\bm{I}_{n}-\sum_{j=1}^{q}\widehat{\bm{A}}_{j}(q)L^{j}$ and its BN decomposition $\widehat{\bm{\mathcal{A}}}_{q}(L)=\widehat{\bm{\mathcal{A}}}_{q}(1)+(1-L)\bm{\mathcal{A}}_{q}^{*}(L)$ through $\bm{\mathcal{A}}_{q}^{*}(L)=\sum_{j=1}^{q}\bm{A}_{j}^{*}(q)L^{j-1}$ with $\bm{A}_{j}^{*}(q)=\sum_{i=j}^{q}\widehat{\bm{A}}_{i}(q)$ . $\widehat{\bm{\mathcal{A}}}_{q}(1)=\bm{\mathcal{A}}_{q}(1)+o_{p}(1)$ and $\sum_{j=1}^{q}\big{\|}\bm{A}_{j}^{*}(q)\big{\|}_{\mathcal{F}}\leq\sum_{j=1}^{q}\big{\|}\widetilde{\bm{A}}_{j}(q)\big{\|}_{\mathcal{F}}+o_{p}(q)$ are obtained from the following two results: (1) $\big{\|}\widehat{\bm{\mathcal{A}}}_{q}(1)-\bm{\mathcal{A}}_{q}(1)\big{\|}_{\mathcal{F}}\leq\sum_{j=1}^{q}\big{\|}\widehat{\bm{A}}_{j}(q)-\bm{A}_{j}(q)\big{\|}_{\mathcal{F}}\leq C\sqrt{q}\big{\|}\widehat{\bm{A}}(q)-\bm{A}(q)\big{\|}=O_{p}\Big{(}\frac{q^{3/2}}{T^{1/2}}\Big{)}=o_{p}(1)$ , where the last step follows from Lemma S3, and (2) $\sum_{j=1}^{q}\big{\|}\bm{A}_{j}^{*}(q)-\widetilde{\bm{A}}_{j}(q)\big{\|}_{\mathcal{F}}\leq q\sum_{j=1}^{q}\big{\|}\widehat{\bm{A}}_{j}(q)-\bm{A}_{j}(q)\big{\|}_{\mathcal{F}}=o_{p}(q)$ . Using the BN decomposition of $\widehat{\bm{\mathcal{A}}}_{q}(L)$ and similar steps as those below (A.5), we have

[TABLE]

where $\widehat{\bm{S}}(q)=\bm{S}(q)+o_{p}(1)$ given in Lemma S3. Using the identity (A.6), it is not hard to deduce

[TABLE]

Combining the results above leads to:

[TABLE]

where ${\bm{\mathcal{B}}}^{+}:={\bm{\mathcal{B}}}_{\epsilon\eta}-{\bm{\mathcal{B}}}_{vu}$ . By construction, we have $\bm{G}_{T}\widehat{{\bm{\mathcal{B}}}}^{+}\Rightarrow{\bm{\mathcal{B}}}^{+}$ . Altogether this implies the limiting distribution in the theorem. ∎

Proof of Theorem 4

We first introduce the appropriate scaling into the test statistic, that is

[TABLE]

Since the matrices $\bm{G}_{T}^{-1}$ and $\bm{R}^{\prime}\bm{R}$ commute and $\bm{R}\bm{R}^{\prime}=\bm{I}_{k}$ , we have $(\bm{R}\bm{G}_{T}^{-1}\bm{R}^{\prime})(\bm{R}\widehat{\bm{\beta}}_{FGLS}^{+}-\bm{r})=\bm{R}\bm{G}_{T}^{-1}(\widehat{\bm{\beta}}_{FGLS}^{+}-\bm{\beta})$ under the null hypothesis. Conditional on $\mathcal{F}_{v}=\sigma\big{(}\bm{B}_{v}(r),0\leq r\leq 1\big{)}$ , this quantity is asymptotically normally distributed by Theorem 3(c) with asymptotic covariance matrix

[TABLE]

The consistent estimation of all the quantities involved ensures that $(\bm{R}\bm{G}_{T}^{-1}\bm{R}^{\prime})\leavevmode\nobreak\ \widehat{\bm{\varPhi}}\leavevmode\nobreak\ (\bm{R}\bm{G}_{T}^{-1}\bm{R}^{\prime})$ has the same limit. Therefore, the Wald statistic is conditionally chi-square distributed with $k$ degrees of freedom. Since this distribution does not depend on $\mathcal{F}_{v}$ , we conclude that the unconditional distribution of $\mathcal{W}$ is also $\chi_{k}^{2}$ . ∎
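The rescaling step at the start of this proof relies on $\bm{G}_{T}^{-1}$ commuting with $\bm{R}^{\prime}\bm{R}$ and on $\bm{R}\bm{R}^{\prime}=\bm{I}_{k}$. A quick numerical check is sketched below for the special case of a diagonal scaling matrix and a restriction matrix that selects coefficients (rows of an identity matrix); both are illustrative assumptions, as is the function name.

```python
import numpy as np

def scaled_restriction_identity(G_diag, rows, beta_hat, beta):
    """Compare (R G^{-1} R')(R beta_hat - r) with R G^{-1} (beta_hat - beta), r = R beta."""
    d = len(G_diag)
    R = np.eye(d)[rows]                         # selection matrix, so R R' = I_k
    G_inv = np.diag(1.0 / np.asarray(G_diag))   # diagonal scaling, commutes with R'R
    r = R @ beta                                # null hypothesis R beta = r
    lhs = (R @ G_inv @ R.T) @ (R @ beta_hat - r)
    rhs = R @ G_inv @ (beta_hat - beta)
    return lhs, rhs
```

With these choices the two sides coincide, which is what allows the scaling to be moved inside the quadratic form of the Wald statistic.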

Proof of Theorem 5

The results for $K_{j,b_{T}}^{SOLS}$ and $K_{j,b_{T}}^{SUR}$ follow from a straightforward multivariate extension of the proof of Proposition 6 in Wagner and Hong (2016). For $K_{j,b_{T}}^{BIAM}$ , we first define the population counterparts of $\bm{\varphi}_{j,b_{T}}(\{\hat{\bm{u}}_{FGLS}\})$ and $\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T},b_{T})$ . That is, let $\bm{\varphi}_{j,b_{T}}:=\big{[}\bm{u}_{j}^{\prime},\sum_{s=j}^{j+1}\bm{u}_{s}^{\prime},\dots,\sum_{s=j}^{j+b_{T}-1}\bm{u}_{s}^{\prime}\big{]}^{\prime}$ and let $\bm{\varSigma}_{\bm{u}}^{-1}(q_{T},b_{T})$ denote the subblock matrix of $\bm{\varSigma}_{\bm{u}}^{-1}(q_{T})$ formed by taking the elements with row and column indices belonging to the set $\{n(T-b_{T})+1,n(T-b_{T})+2,\ldots,nT\}$ . By rearrangement, we have

[TABLE]

where the remainder term is bounded as $|R(q_{T},j,b_{T})|\leq\big{\|}b_{T}^{-1}\big{(}\bm{\varphi}_{j,b_{T}}(\{\hat{\bm{u}}_{FGLS}\})-\bm{\varphi}_{j,b_{T}}\big{)}\big{\|}^{2}\leavevmode\nobreak\ \big{\|}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T},b_{T})\big{\|}+2\big{\|}b_{T}^{-1}\big{(}\bm{\varphi}_{j,b_{T}}(\{\hat{\bm{u}}_{FGLS}\})-\bm{\varphi}_{j,b_{T}}\big{)}\big{\|}\leavevmode\nobreak\ \big{\|}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T},b_{T})\big{\|}\leavevmode\nobreak\ \big{\|}b_{T}^{-1}\bm{\varphi}_{j,b_{T}}\big{\|}$ . Poincaré’s separation theorem (e.g. pages 347-348 of Abadir and Magnus (2005)) implies $\lambda_{min}(\bm{A})\leq\lambda_{min}(\bm{B})\leq\lambda_{max}\left(\bm{B}\right)\leq\lambda_{max}\left(\bm{A}\right)$ when $\bm{B}$ is a principal submatrix of the symmetric matrix $\bm{A}$ . By this inequality and Theorem 2, we conclude that $\big{\|}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T},b_{T})-\bm{\varSigma}_{\bm{u}}^{-1}(q_{T},b_{T})\big{\|}\leq\big{\|}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T})-\bm{\varSigma}_{\bm{u}}^{-1}(q_{T})\big{\|}=o_{p}(1)$ and $\big{\|}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T},b_{T})\big{\|}=O_{p}(1)$ . Moreover,

[TABLE]

where $\left\|b_{T}^{-1/2}\sum_{s=j}^{t}\left(\widehat{\bm{u}}_{s,FGLS}-\bm{u}_{s}\right)\right\|\leq\left\|b_{T}^{-1/2}\sum_{s=j}^{t}\bm{Z}_{s}^{\prime}\bm{G}_{T}\right\|\leavevmode\nobreak\ \big{\|}\bm{G}_{T}^{-1}\big{(}\widehat{\bm{\beta}}_{FGLS}^{+}-\bm{\beta}\big{)}\big{\|}=o_{p}(1)$ by Theorem 3 and the assumption $b_{T}/T\rightarrow 0$ as $T\rightarrow\infty$ . Standard weak convergence arguments imply $\big{\|}b_{T}^{-1}\bm{\varphi}_{j,b_{T}}\big{\|}=O_{p}(1)$ . Combining these results, we have $b_{T}^{-2}\bm{\varphi}_{j,b_{T}}^{\prime}\Big{(}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T},b_{T})-\bm{\varSigma}_{\bm{u}}^{-1}(q_{T},b_{T})\Big{)}\bm{\varphi}_{j,b_{T}}=o_{p}(1)$ and $R(q_{T},j,b_{T})=o_{p}(1)$ . By (A.14), it remains to consider $b_{T}^{-2}\bm{\varphi}_{j,b_{T}}^{\prime}\leavevmode\nobreak\ \bm{\varSigma}_{\bm{u}}^{-1}(q_{T},b_{T})\leavevmode\nobreak\ \bm{\varphi}_{j,b_{T}}$ . Construct the $nT\times nb_{T}$ selection matrix $\bm{R}_{j,b_{T}}$ such that

[TABLE]

Then, by the MCD (2.6), we have

[TABLE]

As argued in the proof of Theorem 1, by the assumption $\frac{q_{T}}{b_{T}}\rightarrow 0$ as $T\rightarrow\infty$ , we can treat the premultiplication of $\bm{R}_{j,b_{T}}\bm{\varphi}_{j,b_{T}}$ by $\bm{\mathcal{M}}_{\bm{u}}(q_{T})$ as applying the filter $\bm{\mathcal{A}}_{q_{T}}(L)$ block-wise. Under the same condition, $\bm{\mathcal{S}}_{\bm{u}}^{-1}(q_{T})$ implies a scaling $\bm{S}^{-1}(q_{T})$ . By the BN decomposition in Lemma 1, and similarly $\bm{\mathcal{C}}(L)=\bm{\mathcal{C}}(1)+(1-L)\widetilde{\bm{\mathcal{C}}}(L)$ with $\bm{\mathcal{C}}(L)=[\bm{\mathcal{A}}(L)]^{-1}$ ,

[TABLE]

For $t=j+[rb_{T}]-1$ , a FCLT for i.i.d. sequences gives

[TABLE]

The partial sum process $\sum_{s=j}^{t}\bm{\eta}_{s}$ thus dominates the asymptotic distribution:

[TABLE]

An application of the continuous mapping theorem completes the proof. ∎

Supplemental Appendix to:

Efficient Estimation by Fully Modified GLS with an Application to the Environmental Kuznets Curve

Yicong Lin and Hanno Reuvers

S1 Additional Proofs

Lemma S1

If $\{\bm{u}_{t}\}$ satisfies Assumption 1, then for any $m\geq 1$ , there exists a constant $C>0$ such that

[TABLE]

Proof

In view of page 257 of Hannan and Deistler (2012), the summability condition of Assumption 1 implies that the spectral density matrix is bounded and bounded away from zero. The boundedness condition in Cheng and Pourahmadi (1993) is thus satisfied and (S1.1) follows from their Theorem 2.2. ∎

Lemma S2 (Implications of the First Moment Bound Theorem)

Let Assumption 1 hold, and define

[TABLE]

The following three inequalities are true:

(a) $\mathbb{E}\Big{\|}\frac{1}{T-q}\sum_{t=q}^{T-1}\bm{u}_{t}(q)\bm{u}_{t}(q)^{\prime}-\mathbb{E}\left(\bm{u}_{t}(q)\bm{u}_{t}(q)^{\prime}\right)\Big{\|}^{2}\leq C\frac{q^{2}}{T-q}$ ;

(b) $\mathbb{E}\left\|\frac{1}{T-\ell}\sum_{t=\ell}^{T-1}\big{(}\bm{\eta}_{t+1,\ell}-\bm{\eta}_{t+1}\big{)}\bm{u}_{t}(\ell)^{\prime}\right\|^{r}\leq C\left(\frac{\ell}{T-\ell}\right)^{r/2}\Big{(}\sum_{j=\ell+1}^{\infty}\left\|\bm{A}_{j}\right\|_{\mathcal{F}}^{2}\Big{)}^{r/2}$ , for some $r\geq 2$ and any $1\leq\ell\leq q$ ;

(c) $\mathbb{E}\left\|\frac{1}{T-\ell}\sum_{t=\ell}^{T-1}\Big{(}\bm{\eta}_{t+1,\ell}\bm{\eta}_{t+1,\ell}^{\prime}\Big{)}-\mathbb{E}\Big{(}\bm{\eta}_{t+1,\ell}\bm{\eta}_{t+1,\ell}^{\prime}\Big{)}\right\|^{r}\leq C(T-\ell)^{-r/2}$ , for some $r\geq 2$ and any $1\leq\ell\leq q$ .

Proof

(a) Since $\|\cdot\|^{2}\leq\|\cdot\|_{\mathcal{F}}^{2}$ , we obtain

[TABLE]

where $u_{i,t}$ denotes the $i^{th}$ element of $\bm{u}_{t}$ . As remarked in the main text, the lag polynomial $\bm{\mathcal{A}}(L)$ is invertible by Assumption 1. Recall $\bm{\mathcal{C}}(L)=[\bm{\mathcal{A}}(L)]^{-1}=\sum_{j=0}^{\infty}\bm{C}_{j}L^{j}$ with $\bm{C}_{0}=\bm{I}_{n}$ , and $\sum_{j=0}^{\infty}j\,\|\bm{C}_{j}\|_{\mathcal{F}}<\infty$ . We observe that $u_{i,t}=\sum_{j=0}^{\infty}\operatorname{row}_{i}\big{(}\bm{C}_{j}\big{)}\bm{\eta}_{t-j}$ . By Proposition 10.2(b) of Hamilton (1994), absolute summability of the coefficient matrices $\{\bm{C}_{j}\}_{j=0}^{\infty}$ implies $\sum_{s=0}^{\infty}|\gamma_{u,k}(s)|<\infty$ where $\gamma_{u,k}(s)=\mathbb{E}(u_{k,t}u_{k,t-s})$ . The conditions for the First Moment Bound Theorem (FMBT) in Findley and Wei (1993) are thus satisfied. Choosing $q(t,s)=1$ if $t=s\geq q$ (the banding parameter) and $q(t,s)=0$ otherwise,

[TABLE]

by the FMBT. This bound holds for general $k$ , $\ell$ , $i$ and $j$ , and (a) thereby follows from (S1.3).

(b) For $1\leq\ell\leq q$ and $r\geq 2$ , we have

[TABLE]

by $\|\cdot\|^{r}\leq(\|\cdot\|_{\mathcal{F}}^{2})^{r/2}$ and the $c_{r}$ -inequality. By assumption, $\bm{\eta}_{t+1}$ is uncorrelated with $\big{[}\bm{u}_{t-\ell+1}^{\prime},\ldots,\bm{u}_{t}^{\prime}\big{]}^{\prime}$ implying that $\mathbb{E}\big{[}\sum_{t=\ell}^{T-1}(\bm{\eta}_{t+1,\ell}-\bm{\eta}_{t+1})\bm{u}_{t-s}^{\prime}\big{]}=\mathbf{O}$ . The FMBT can thus be applied directly without having to express the quadratic form in deviations from the mean. However, some rewriting is needed to obtain expressions in scalar random sequences. To this end, use $A_{j,kl}$ and $A_{j,kl}(\ell)$ to denote the $(k,l)^{th}$ element of $\bm{A}_{j}$ and $\bm{A}_{j}(\ell)$ , respectively. Setting $\bm{A}_{j}(\ell)=\mathbf{O}$ for $j>\ell$ , we have $\bm{\eta}_{t+1,\ell}-\bm{\eta}_{t+1}=\sum_{j=1}^{\infty}\big{[}\bm{A}_{j}-\bm{A}_{j}(\ell)\big{]}\bm{u}_{t+1-j}$ , and hence

[TABLE]

with $u_{t}^{*}=\sum_{j=1}^{\infty}\sum_{l=1}^{n}\big{(}A_{j,kl}-A_{j,kl}(\ell)\big{)}u_{l,t+1-j}$ , where we suppress the dependence on the index $k$ (also below) without confusion. To apply the FMBT, we define the autocovariances $\gamma_{u^{*}}(t-h)=\mathbb{E}\big{(}u_{t}^{*}u_{h}^{*}\big{)}$ , the difference in lag polynomial coefficients $\bm{a}_{l}(\ell)=\left[A_{1,kl}-A_{1,kl}(\ell),A_{2,kl}-A_{2,kl}(\ell),\ldots\right]^{\prime}$ and $\bm{\varSigma}_{u_{l},\infty}=\big{[}\gamma_{u,l}(i-j),1\leq i,j<\infty\big{]}$ . By the Cauchy-Schwarz inequality, the $c_{r}$ -inequality, and boundedness of the maximum eigenvalue of $\bm{\varSigma}_{u_{l},\infty}$ , we obtain

[TABLE]

Applying the FMBT, we have

[TABLE]

using (S1.6) and the absolute summability of $\{\gamma_{u,m}(t)\}$ . Combining (S1.4), (S1.5) and (S1.7) leads to the desired inequality.

(c) The equality $\bm{\eta}_{t+1,\ell}=\Big{(}\bm{I}_{n}-\sum_{j=1}^{\ell}\bm{A}_{j}(\ell)L^{j}\Big{)}\bm{\mathcal{C}}(L)\bm{\eta}_{t+1}$ shows that $\bm{\eta}_{t+1,\ell}$ has a linear process representation in terms of $\bm{\eta}_{t}$ . Theorem 6.6.12 of Hannan and Deistler (2012) implies that $\sup_{1\leq\ell<\infty}\sum_{j=0}^{\ell}\|\bm{A}_{j}(\ell)\|_{\mathcal{F}}<\infty$ . By Propositions 10.2(b) and 10.3 of Hamilton (1994), both the coefficient matrices associated with $\Big{(}\bm{I}_{n}-\sum_{j=1}^{\ell}\bm{A}_{j}(\ell)L^{j}\Big{)}\bm{\mathcal{C}}(L)$ and the autocovariances $\left\{\mathbb{E}\left(\eta_{k,t+1,\ell}\eta_{k,t+1-s,\ell}\right)\right\}_{s=0}^{\infty}$ are absolutely summable, where $\eta_{k,t+1,\ell}$ is the $k^{th}$ entry of $\bm{\eta}_{t+1,\ell}$ . The proof is completed using the $c_{r}$ -inequality and the FMBT, that is, for $r\geq 2$ ,

[TABLE]

∎

Lemma S3

If Assumptions 1-3 hold, then

[TABLE]

Proof

Recall the definition of $\bm{\eta}_{t+1,\ell}$ and $\bm{u}_{t}(\ell)$ in (S1.2). Similarly, define

[TABLE]

We first prove $\max_{1\leq\ell\leq q}\big{\|}\widehat{\bm{A}}(\ell)-\bm{A}(\ell)\big{\|}=O_{p}\left(q/\sqrt{T}\,\right)$ . Since $\left(\widehat{\bm{A}}(\ell)-\bm{A}(\ell)\right)\widehat{\bm{u}}_{t}(\ell)=\widehat{\bm{\eta}}_{t+1,\ell}-\widetilde{\bm{\eta}}_{t+1,\ell}$ and $\frac{1}{T-\ell}\sum_{t=\ell}^{T-1}\widetilde{\bm{\eta}}_{t+1,\ell}\widehat{\bm{u}}_{t}(\ell)^{\prime}=\mathbf{O}$ (the first-order condition from (3.4)), we have

[TABLE]

If we can show that $\frac{1}{T-q}\sum_{t=q}^{T-1}\widehat{\bm{u}}_{t}(q)\widehat{\bm{u}}_{t}(q)^{\prime}$ is asymptotically invertible, then $\frac{1}{T-\ell}\sum_{t=\ell}^{T-1}\widehat{\bm{u}}_{t}(\ell)\widehat{\bm{u}}_{t}(\ell)^{\prime}$ must also be asymptotically invertible with probability 1, for any $1\leq\ell\leq q$ . (Footnote 14: If the positive semidefinite matrix $\bm{Q}$ is invertible, then each leading principal submatrix of $\bm{Q}$ is invertible as well.) By the triangle inequality, $\left\|\frac{1}{T-q}\sum_{t=q}^{T-1}\widehat{\bm{u}}_{t}(q)\widehat{\bm{u}}_{t}(q)^{\prime}-\mathbb{E}\big{(}\bm{u}_{t}(q)\bm{u}_{t}(q)^{\prime}\big{)}\right\|\leq\mathrm{I}_{a}+\mathrm{I}_{b}$ , where

[TABLE]

by Chebyshev’s inequality and Lemma S2(a), and

[TABLE]

since $\left\|\sum_{t}\bm{a}_{t}\bm{a}_{t}^{\prime}-\sum_{t}\bm{b}_{t}\bm{b}_{t}^{\prime}\right\|\leq\sum_{t}\left\|\bm{a}_{t}-\bm{b}_{t}\right\|^{2}+2\sqrt{\sum_{t}\left\|\bm{a}_{t}-\bm{b}_{t}\right\|^{2}}\sqrt{\sum_{t}\left\|\bm{b}_{t}\right\|^{2}}$ . We have $\frac{1}{T-q}\sum_{t=q}^{T-1}\|\widehat{\bm{u}}_{t}(q)-\bm{u}_{t}(q)\|^{2}=\frac{1}{T-q}\sum_{t=q}^{T-1}\sum_{s=t-q+1}^{t}\|\widehat{\bm{u}}_{s}-\bm{u}_{s}\|^{2}\leq\frac{q}{T-q}\|\widehat{\bm{u}}-\bm{u}\|^{2}=\frac{q}{T}O_{p}\left(1\right)$ by Assumption 2. Because $\frac{1}{T-q}\sum_{t=q}^{T-1}\|\bm{u}_{t}(q)\|^{2}=O_{p}(q)$ by Markov’s inequality, we conclude $\mathrm{I}_{b}=O_{p}\left(q/\sqrt{T}\right)$ . Overall, this gives

[TABLE]

Now observe that $\mathbb{E}\big{(}\bm{u}_{t}(q)\bm{u}_{t}(q)^{\prime}\big{)}$ is a leading principal submatrix of $\bm{\varSigma}_{\bm{u}}$ (thus invertible, see footnote 14). As a result, $\frac{1}{T-q}\sum_{t=q}^{T-1}\widehat{\bm{u}}_{t}(q)\widehat{\bm{u}}_{t}(q)^{\prime}$ is asymptotically invertible.
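The matrix inequality for $\left\|\sum_{t}\bm{a}_{t}\bm{a}_{t}^{\prime}-\sum_{t}\bm{b}_{t}\bm{b}_{t}^{\prime}\right\|$ invoked above is stated without derivation. A short sketch of one way to obtain it, using only the triangle inequality, $\|\bm{x}\bm{y}^{\prime}\|=\|\bm{x}\|\,\|\bm{y}\|$ for vectors, and the Cauchy-Schwarz inequality, is:

```latex
\begin{align*}
\bm{a}_{t}\bm{a}_{t}^{\prime}-\bm{b}_{t}\bm{b}_{t}^{\prime}
  &= (\bm{a}_{t}-\bm{b}_{t})(\bm{a}_{t}-\bm{b}_{t})^{\prime}
   + (\bm{a}_{t}-\bm{b}_{t})\bm{b}_{t}^{\prime}
   + \bm{b}_{t}(\bm{a}_{t}-\bm{b}_{t})^{\prime}, \\
\Big\|\sum_{t}\bm{a}_{t}\bm{a}_{t}^{\prime}-\sum_{t}\bm{b}_{t}\bm{b}_{t}^{\prime}\Big\|
  &\leq \sum_{t}\|\bm{a}_{t}-\bm{b}_{t}\|^{2}
   + 2\sum_{t}\|\bm{a}_{t}-\bm{b}_{t}\|\,\|\bm{b}_{t}\| \\
  &\leq \sum_{t}\|\bm{a}_{t}-\bm{b}_{t}\|^{2}
   + 2\sqrt{\sum_{t}\|\bm{a}_{t}-\bm{b}_{t}\|^{2}}\,\sqrt{\sum_{t}\|\bm{b}_{t}\|^{2}}.
\end{align*}
```

In the application above, $\bm{a}_{t}=\widehat{\bm{u}}_{t}(q)$ and $\bm{b}_{t}=\bm{u}_{t}(q)$ , with all sums scaled by $1/(T-q)$ .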

We subsequently bound the RHS of (S1.10) as follows: $\max_{1\leq\ell\leq q}\left\|\frac{1}{T-\ell}\sum_{t=\ell}^{T-1}\widehat{\bm{\eta}}_{t+1,\ell}\widehat{\bm{u}}_{t}(\ell)^{\prime}\right\|\leq\mathrm{II}_{a}+\ldots+\mathrm{II}_{e}$ , where $\mathrm{II}_{a}=\max_{1\leq\ell\leq q}\left\|\frac{1}{T-\ell}\sum_{t=\ell}^{T-1}\bm{\eta}_{t+1}\bm{u}_{t}(\ell)^{\prime}\right\|$ , $\mathrm{II}_{b}=\max_{1\leq\ell\leq q}\left\|\frac{1}{T-\ell}\sum_{t=\ell}^{T-1}\left(\bm{\eta}_{t+1,\ell}-\bm{\eta}_{t+1}\right)\bm{u}_{t}(\ell)^{\prime}\right\|$ , and

[TABLE]

We consider these terms separately starting from $\mathrm{II}_{a}$ . Using the properties of the Frobenius norm,

[TABLE]

Assumption 1 justifies the use of Lemma 2 in Wei (1987) which gives $\mathbb{E}\left|\sum_{t=\ell}^{T-1}\eta_{i,t+1}u_{j,t-s}\right|^{2}\leq C\sum_{t=\ell}^{T-1}\mathbb{E}\left(u_{j,t-s}^{2}\right)\leq C(T-\ell)$ . By Chebyshev’s inequality, $\forall\varepsilon>0$ , there exists $\alpha_{\varepsilon}>0$ such that

[TABLE]

and thus $\mathrm{II}_{a}=O_{p}\left(q/\sqrt{T}\,\right)$ . Furthermore, we deduce that $\mathrm{II}_{b}=O_{p}\left(q/\sqrt{T}\,\right)$ by Lemma S2(b) and Chebyshev’s inequality. For $\mathrm{II}_{c}$ , if we write $\widehat{\bm{\eta}}_{t+1,\ell}-\bm{\eta}_{t+1,\ell}=\big{[}\bm{I}_{n},-\bm{A}(\ell)\big{]}\big{[}\widehat{\bm{u}}_{t+1}(\ell+1)-\bm{u}_{t+1}(\ell+1)\big{]}$ , then by the Cauchy-Schwarz inequality and Baxter’s inequality (which leads to $\max_{1\leq\ell\leq q}\left\|\bm{A}(\ell)\right\|^{2}\leq C$ ),

[TABLE]

where the last step follows from arguments similar to those preceding (S1.11). Similarly, $\mathrm{II}_{d}=O_{p}\left(q/\sqrt{T}\,\right)$ and $\mathrm{II}_{e}=O_{p}\left(q/T\,\right)$ . Combining all results, we finally have

[TABLE]

By invertibility of $\frac{1}{T-\ell}\sum_{t=\ell}^{T-1}\widehat{\bm{u}}_{t}(\ell)\widehat{\bm{u}}_{t}(\ell)^{\prime}$ , (S1.10) and (S1.12), $\max_{1\leq\ell\leq q}\big{\|}\widehat{\bm{A}}(\ell)-\bm{A}(\ell)\big{\|}=O_{p}\left(q/\sqrt{T}\,\right)$ follows.

We continue with $\max_{1\leq\ell\leq q}\big{\|}\widehat{\bm{S}}(\ell)-\bm{S}(\ell)\big{\|}=O_{p}\left(q/\sqrt{T}\,\right)$ . Since $\widetilde{\bm{\eta}}_{t+1,\ell}=\widehat{\bm{\eta}}_{t+1,\ell}-\left(\widehat{\bm{A}}(\ell)-\bm{A}(\ell)\right)\widehat{\bm{u}}_{t}(\ell)$ (see (S1.9)), we can use (S1.10) and the invertibility of $\frac{1}{T-\ell}\sum_{t=\ell}^{T-1}\widehat{\bm{u}}_{t}(\ell)\widehat{\bm{u}}_{t}(\ell)^{\prime}$ to write

[TABLE]

where the last step in (S1.13) follows from $\max_{1\leq\ell\leq q}\left\|\frac{1}{T-\ell}\sum_{t=\ell}^{T-1}\left(\bm{\eta}_{t+1,\ell}\bm{\eta}_{t+1,\ell}^{\prime}\right)-\mathbb{E}\left(\bm{\eta}_{t+1,\ell}\bm{\eta}_{t+1,\ell}^{\prime}\right)\right\|=O_{p}\left(\sqrt{q/T}\right)$ using Lemma S2(c) and (S1.12). By the inequality $\left\|\sum_{t}\bm{a}_{t}\bm{a}_{t}^{\prime}-\sum_{t}\bm{b}_{t}\bm{b}_{t}^{\prime}\right\|\leq\sum_{t}\left\|\bm{a}_{t}-\bm{b}_{t}\right\|^{2}+2\sqrt{\sum_{t}\left\|\bm{a}_{t}-\bm{b}_{t}\right\|^{2}}\sqrt{\sum_{t}\left\|\bm{b}_{t}\right\|^{2}}$ and similar arguments as for $\mathrm{II}_{c}$ and $\mathrm{II}_{d}$ above, the first term in (S1.13) is bounded by

[TABLE]

Overall, we obtain $\max_{1\leq\ell\leq q}\big{\|}\widehat{\bm{S}}(\ell)-\bm{S}(\ell)\big{\|}=O_{p}\left(q/\sqrt{T}\,\right)$ as well. ∎

Lemma S4

Under Assumptions 1 and 3, we have

[TABLE]

Proof

Consider $\left\|\bm{\varSigma}_{\bm{u}}^{-1}(q)-\bm{\varSigma}_{\bm{u}}^{-1}\right\|$ . A rewriting as in (A.13) shows that $\big{\|}\bm{\mathcal{M}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}\big{\|}$ and $\big{\|}\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)-\bm{\mathcal{S}}_{\bm{u}}^{-1}\big{\|}$ are the two important terms to bound. Hölder’s inequality implies

[TABLE]

For the matrix $1$ -norm we are concerned with the maximum absolute column sum. For an arbitrary $(nT\times nT)$ matrix $\bm{Q}$ partitioned (block) column-wise, i.e. $\bm{Q}=[\bm{Q}_{1},\bm{Q}_{2},\ldots,\bm{Q}_{T}]$ , we have the bound $\|\bm{Q}\|_{1}=\max_{1\leq t\leq T}\|\bm{Q}_{t}\|_{1}\leq\sqrt{n}\max_{1\leq t\leq T}\|\bm{Q}_{t}\|_{\mathcal{F}}$ . This implies $\big{\|}\bm{\mathcal{M}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}\big{\|}_{1}\leq\sqrt{n}\max\left\{\mathrm{I}_{a},\mathrm{I}_{b}\right\}$ where

[TABLE]

We will bound the three summations that are encountered in the expressions for $\mathrm{I}_{a}$ and $\mathrm{I}_{b}$ . First, changing the summation index and using the $c_{r}$ -inequality,

[TABLE]

For convenience, we define

[TABLE]

For any $j\geq 0$ and $s\geq q+1$ , we have $s^{2}\sum_{k=1}^{s+j}\left\|\bm{A}_{k}(s+j)-\bm{A}_{k}\right\|_{\mathcal{F}}^{2}\leq Cs^{2}\sum_{k=s+j+1}^{\infty}\left\|\bm{A}_{k}\right\|_{\mathcal{F}}^{2}\leq C\left(\sum_{k=s+j+1}^{\infty}k\left\|\bm{A}_{k}\right\|_{\mathcal{F}}\right)^{2}$ by the $L^{2}$ -Baxter’s inequality. The first term in the RHS above is thus bounded by $C\mathcal{K}_{q}$ . Moreover, by the Cauchy-Schwarz inequality, the second term can be bounded as $\sum_{s=q+1}^{\infty}\left\|\bm{A}_{s}\right\|_{\mathcal{F}}^{2}=\sum_{s=q+1}^{\infty}\big{(}s^{2}\left\|\bm{A}_{s}\right\|_{\mathcal{F}}^{2}\big{)}s^{-2}\leq\left(\sum_{s=q+1}^{\infty}s^{2}\left\|\bm{A}_{s}\right\|_{\mathcal{F}}^{2}\right)\left(\sum_{s=q+1}^{\infty}s^{-2}\right)\leq\mathcal{K}_{q}$ . Now consider the second summation in $\mathrm{I}_{a}$ . We first consider the case $0\leq j\leq q$ , or $\max(1,q+1-j)=q+1-j$ , such that

[TABLE]

using arguments detailed before. This upper bound remains valid for $q+1\leq j\leq T-q-2$ . It is likewise straightforward to derive $\sum_{i=1}^{T-1-j}\left\|\bm{A}_{i}(i+j)-\bm{A}_{i}(q)\right\|_{\mathcal{F}}^{2}\leq C\mathcal{K}_{q}$ . Collecting all the results, we have $\mathrm{I}_{a}\leq C\sqrt{\mathcal{K}_{q}}$ , $\mathrm{I}_{b}\leq C\sqrt{\mathcal{K}_{q}}$ , and thus $\big{\|}\bm{\mathcal{M}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}\big{\|}_{1}\leq C\sqrt{\mathcal{K}_{q}}$ .

For $\|\bm{\mathcal{M}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}\|_{\infty}$ , we are bounding the maximum absolute row sums. For an arbitrary $(nT\times nT)$ matrix $\bm{Q}$ partitioned as $\bm{Q}=[\bm{Q}_{1}^{\prime},\bm{Q}_{2}^{\prime},\ldots,\bm{Q}_{T}^{\prime}]^{\prime}$ , we have $\|\bm{Q}\|_{\infty}=\max_{1\leq t\leq T}\|\bm{Q}_{t}\|_{\infty}\leq\sqrt{n}\max_{1\leq t\leq T}\|\bm{Q}_{t}\|_{\mathcal{F}}$ , such that

[TABLE]

where $\sum_{j=q+1}^{m}\left\|\bm{A}_{j}(m)\right\|_{\mathcal{F}}^{2}\leq C\mathcal{K}_{q}$ and $\sum_{j=1}^{q}\left\|\bm{A}_{j}(q)-\bm{A}_{j}(m)\right\|_{\mathcal{F}}^{2}\leq C\mathcal{K}_{q}$ , for any $q+1\leq m\leq T-1$ , using the $L^{2}$ -Baxter’s inequality and the previous upper bound on $\sum_{s=q+1}^{\infty}\left\|\bm{A}_{s}\right\|_{\mathcal{F}}^{2}$ . We conclude that $\|\bm{\mathcal{M}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}\|_{\infty}\leq C\sqrt{\mathcal{K}_{q}}$ . Together with our previous result, we obtain $\big{\|}\bm{\mathcal{M}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}\big{\|}\leq C\sqrt{\mathcal{K}_{q}}$ from (S1.15).

From $\big{\|}\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)-\bm{\mathcal{S}}_{\bm{u}}^{-1}\big{\|}\leq\big{\|}\bm{\mathcal{S}}_{\bm{u}}(q)-\bm{\mathcal{S}}_{\bm{u}}\big{\|}\leavevmode\nobreak\ \big{\|}\bm{\mathcal{S}}_{\bm{u}}^{-1}\big{\|}\leavevmode\nobreak\ \big{\|}\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)\big{\|}$ we see that it suffices to inspect $\big{\|}\bm{\mathcal{S}}_{\bm{u}}(q)-\bm{\mathcal{S}}_{\bm{u}}\big{\|}$ (the other norms are bounded). Exploiting the fact that both $\bm{\mathcal{S}}_{\bm{u}}(q)$ and $\bm{\mathcal{S}}_{\bm{u}}$ are block-diagonal, we have $\big{\|}\bm{\mathcal{S}}_{\bm{u}}(q)-\bm{\mathcal{S}}_{\bm{u}}\big{\|}=\max_{q+1\leq k\leq T-1}\left\|\bm{S}(q)-\bm{S}(k)\right\|\leq 2\max_{q\leq k\leq T-1}\left\|\bm{S}(k)-\bm{\varSigma}_{\eta\eta}\right\|$ . Let $\bm{A}_{j}(\ell)=\mathbf{O}$ for $j>\ell$ , and recall the definition of $\bm{\eta}_{t+1,\ell}$ in (S1.2). We find, for any $k\geq q$ ,

[TABLE]

We thereby obtain $\big{\|}\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)-\bm{\mathcal{S}}_{\bm{u}}^{-1}\big{\|}\leq C\mathcal{K}_{q}$ . Together with the bound on $\big{\|}\bm{\mathcal{M}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}\big{\|}$ , we deduce

[TABLE]

The proof is complete. ∎

Theorem S1

Let $\bm{W}(r)=[W_{1}(r),W_{2}(r),\ldots,W_{n}(r)]^{\prime}$ denote an $n$ -dimensional standard Brownian motion. The cumulative distribution function (CDF) of $\int_{0}^{1}\left\|\bm{W}(r)\right\|^{2}dr$ is given by

[TABLE]

where $k_{j,n}=(-1)^{j}\,\frac{\Gamma(j+n/2)}{j!\Gamma(n/2)}$ , $l_{j,n}=2\sqrt{2}j+\frac{n}{\sqrt{2}}$ and $\mathrm{Erfc}(x)=\frac{2}{\sqrt{\pi}}\int_{x}^{\infty}e^{-t^{2}}dt$ .

Proof

We follow the approach from Example 1 of Anderson and Darling (1952) or equivalently appendix B of Choi and Saikkonen (2010). Let $f_{n}$ denote the probability density function of $\int_{0}^{1}\left\|\bm{W}(r)\right\|^{2}dr$ and write $\mathcal{L}\{\cdot\}$ and $\mathcal{L}^{-1}\{\cdot\}$ for the Laplace and inverse Laplace operator, respectively. From the equality $\int_{0}^{1}\left\|\bm{W}(r)\right\|^{2}dr=\sum_{i=1}^{n}\int_{0}^{1}W_{i}(r)^{2}dr$ , independence of the components of $\bm{W}(r)$ , and the known univariate result in Choi and Saikkonen (2010), we have

[TABLE]

According to equation (4.28) in Anderson and Darling (1952), the CDF is

[TABLE]

where we use (1) a $t$ with a positive real part, (2) linearity of the inverse Laplace operator, and (3) the binomial expansion of $[1+e^{-2\sqrt{2t}}]^{-n/2}$ . The identity from Choi and Saikkonen (2010), $\mathcal{L}^{-1}\left\{\frac{1}{t}e^{-u\sqrt{t}}\right\}(x)=1-\mathrm{Erf}\left(\frac{u}{2\sqrt{x}}\right)=\mathrm{Erfc}\left(\frac{u}{2\sqrt{x}}\right)$ , completes the proof. ∎
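The series in Theorem S1 is straightforward to evaluate numerically. The sketch below is an illustration rather than the authors' code: the displayed CDF did not survive in this copy, so the form $2^{n/2}\sum_{j\geq 0}k_{j,n}\,\mathrm{Erfc}\big(l_{j,n}/(2\sqrt{x})\big)$ used here is an assumption, reconstructed from the binomial expansion and the inverse Laplace identity described in the proof. The function name `cdf_int_sq_bm` and the truncation point `terms` are hypothetical choices.

```python
import math


def cdf_int_sq_bm(x, n, terms=200):
    """Truncated series for P(int_0^1 ||W(r)||^2 dr <= x), W an n-dim standard BM.

    Assumed form: 2^{n/2} * sum_j k_{j,n} * erfc(l_{j,n} / (2 sqrt(x))).
    """
    if x <= 0:
        return 0.0
    log_gamma_half_n = math.lgamma(n / 2)
    total = 0.0
    for j in range(terms):
        # |k_{j,n}| = Gamma(j + n/2) / (j! Gamma(n/2)), computed on the log scale
        log_abs_k = math.lgamma(j + n / 2) - math.lgamma(j + 1) - log_gamma_half_n
        k_jn = (-1) ** j * math.exp(log_abs_k)
        # l_{j,n} = 2 sqrt(2) j + n / sqrt(2)
        l_jn = 2 * math.sqrt(2) * j + n / math.sqrt(2)
        total += k_jn * math.erfc(l_jn / (2 * math.sqrt(x)))
    return 2 ** (n / 2) * total
```

Because the terms alternate in sign, the truncated sum may lose accuracy for large $n$ combined with large $x$ ; in that case the value should be cross-checked, for instance by simulation.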

S2 Estimation of Quantities for Fully Modified Inference

The FM-GLS estimator relies on $\bm{\varOmega}$ , $\bm{\varDelta}$ , and $\mathbb{E}(\bm{\zeta}_{t}\bm{\zeta}_{t}^{\prime})$ (see Assumption 1). For convenience, we denote this $(2n\times 2n)$ matrix $\left[\begin{smallmatrix}\bm{\varSigma}_{\eta\eta}&\bm{\varSigma}_{\eta\epsilon}\\ \bm{\varSigma}_{\epsilon\eta}&\bm{\varSigma}_{\epsilon\epsilon}\end{smallmatrix}\right]$ by $\bm{\varSigma}$ . Please note the difference between $\bm{\varSigma}$ and the large-dimensional matrix $\bm{\varSigma}_{\bm{u}}$ . In this section, we consider the estimation of these three quantities within the BIAM framework. For convenience, we recall $\bm{\xi}_{t}=[\bm{u}_{t}^{\prime},\bm{v}_{t}^{\prime}]^{\prime}$ and define $\bm{\xi}=[\bm{\xi}_{1}^{\prime},\bm{\xi}_{2}^{\prime},\ldots,\bm{\xi}_{T}^{\prime}]^{\prime}$ . Similarly to the definition of $\bm{\varSigma}_{\bm{u}}$ , we use $\bm{\varSigma}_{\bm{\xi}}:=\mathbb{E}\left(\bm{\xi}\bm{\xi}^{\prime}\right)$ to denote the $(2nT\times 2nT)$ autocovariance matrix of $\{\bm{\xi}_{t}\}$ . As a sample counterpart, we stack $\widehat{\bm{u}}_{t}$ and $\Delta\bm{x}_{t}=\bm{v}_{t}$ in the $2n$ -dimensional vector $\widehat{\bm{\xi}}_{t}=\big{[}\widehat{\bm{u}}_{t}^{\prime},\bm{v}_{t}^{\prime}\big{]}^{\prime}$ . Using $\{\widehat{\bm{\xi}}_{t}\}_{t=1}^{T}$ , the BIAM estimator for $\bm{\varSigma}_{\bm{\xi}}$ is now constructed as $\widehat{\bm{\varSigma}_{\bm{\xi}}}(q)=\widehat{\bm{\mathcal{M}}}_{\bm{\xi}}^{-1}(q)\widehat{\bm{\mathcal{S}}}_{\bm{\xi}}(q)\widehat{\bm{\mathcal{M}}}_{\bm{\xi}}^{-1\prime}(q)$ , where the matrices $\widehat{\bm{\mathcal{M}}}_{\bm{\xi}}(q)$ and $\widehat{\bm{\mathcal{S}}}_{\bm{\xi}}(q)$ are defined similarly to $\widehat{\bm{\mathcal{M}}}_{\bm{u}}(q)$ and $\widehat{\bm{\mathcal{S}}}_{\bm{u}}(q)$ , respectively.
Since the BIAM estimator is fitting VAR processes up to order $q_{T}$ (see (3.4)), the coefficient estimates $\widehat{\bm{F}}_{j}(q_{T})$ of the $j^{th}$ lag when a VAR $(q_{T})$ is fitted, $j=1,2,\dots,q_{T}$ , are immediate byproducts of the BIAM procedure and can thus be used to construct our estimators. Finally, if $\bm{\mathcal{F}}(L)=\operatorname{diag}\left[\bm{\mathcal{A}}(L),\bm{\mathcal{D}}(L)\right]:=\bm{I}_{2n}-\sum_{j=1}^{\infty}\bm{F}_{j}L^{j}$ , where $\bm{F}_{j}=\operatorname{diag}\big{[}\bm{A}_{j},\bm{D}_{j}\big{]}$ , then $\bm{\mathcal{F}}(L)\bm{\xi}_{t}=\bm{\zeta}_{t}$ holds.

Theorem S2

Recall the definitions $\widehat{\bm{\varOmega}}_{q_{T}}=\left(\bm{I}_{2n}-\sum_{j=1}^{q_{T}}\widehat{\bm{F}}_{j}(q_{T})\right)^{-1}\widehat{\bm{\varSigma}}_{q_{T}}\left(\bm{I}_{2n}-\sum_{j=1}^{q_{T}}\widehat{\bm{F}}_{j}(q_{T})\right)^{-1\prime}$ , $\widehat{\bm{\varDelta}}_{q_{T},r_{T}}=\bm{Q}_{r_{T}}^{\prime}\widehat{\bm{\varSigma}_{\bm{\xi}}}(q_{T})\bm{Q}_{1}$ , and

[TABLE]

where $\bm{Q}_{r}=\left[\mathbf{O}_{2n\times 2n},\cdots,\mathbf{O}_{2n\times 2n},\bm{I}_{2n},\cdots,\bm{I}_{2n}\right]^{\prime}$ is a $(2nT\times 2n)$ block matrix of zeros in which the last $r$ blocks have been replaced by identity matrices. If Assumptions 1-3 and 5 hold, then

[TABLE]
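To make the definition of $\widehat{\bm{\varOmega}}_{q_{T}}$ concrete, the following sketch evaluates $\big(\bm{I}-\sum_{j}\widehat{\bm{F}}_{j}\big)^{-1}\widehat{\bm{\varSigma}}\big(\bm{I}-\sum_{j}\widehat{\bm{F}}_{j}\big)^{-1\prime}$ for given inputs. It assumes the fitted VAR coefficient matrices and the innovation covariance estimate are already available as NumPy arrays; the function name `long_run_covariance` is hypothetical, and the BIAM step that produces the inputs is not reproduced.

```python
import numpy as np


def long_run_covariance(F_list, Sigma):
    """Return (I - sum_j F_j)^{-1} Sigma (I - sum_j F_j)^{-1}' for a fitted VAR.

    F_list : list of (m x m) lag coefficient matrices F_1, ..., F_q (assumed given)
    Sigma  : (m x m) innovation covariance estimate (assumed given)
    """
    Sigma = np.asarray(Sigma, dtype=float)
    m = Sigma.shape[0]
    # Lag polynomial evaluated at one: I - F_1 - ... - F_q
    F_at_one = np.eye(m) - sum(np.asarray(F, dtype=float) for F in F_list)
    F_at_one_inv = np.linalg.inv(F_at_one)
    return F_at_one_inv @ Sigma @ F_at_one_inv.T
```

As a sanity check, a scalar AR(1) with coefficient 0.5 and unit innovation variance has long-run variance $1/(1-0.5)^{2}=4$ .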

Proof

Note that $\widehat{\bm{\xi}}_{t}-\bm{\xi}_{t}=[(\widehat{\bm{u}}_{t}-\bm{u}_{t})^{\prime},\bm{0}^{\prime}]^{\prime}$ and hence $\|\widehat{\bm{\xi}}-\bm{\xi}\|^{2}=\|\widehat{\bm{u}}-\bm{u}\|^{2}=O_{p}(1)$ by Assumption 2. The conditions for Lemmas S3 – S4 and Theorem 2 are thus satisfied and we can use these results in subsequent proofs. (a) The result (S2.1) follows from the triangle inequality, Lemma S3 and (S1.17). (b) The second result (S2.2) is obtained by the definition $\bm{\varOmega}=(\bm{I}_{2n}-\sum_{j=1}^{\infty}\bm{F}_{j})^{-1}\bm{\varSigma}(\bm{I}_{2n}-\sum_{j=1}^{\infty}\bm{F}_{j})^{-1\prime}$ , Lemma S3 and a straightforward modification of (A.13). (c) By $\bm{\varDelta}=\sum_{h=r_{T}}^{\infty}\mathbb{E}\big{(}\bm{\xi}_{t}\bm{\xi}_{t+h}^{\prime}\big{)}+\bm{Q}_{r_{T}}^{\prime}\bm{\varSigma}_{\bm{\xi}}\bm{Q}_{1}$ , the LHS of (S2.3) can be bounded

[TABLE]

Since summability conditions on the coefficient matrices carry over to the autocovariances, we have $\sum_{h=r_{T}}^{\infty}\left\|\mathbb{E}\big{(}\bm{\xi}_{t}\bm{\xi}_{t+h}^{\prime}\big{)}\right\|_{\mathcal{F}}\leq r_{T}^{-1}\sum_{h=r_{T}}^{\infty}h\left\|\mathbb{E}\big{(}\bm{\xi}_{t}\bm{\xi}_{t+h}^{\prime}\big{)}\right\|_{\mathcal{F}}=o(r_{T}^{-1})$ by Assumption 1. Moreover, $\left\|\bm{Q}_{r_{T}}^{\prime}\bm{\varSigma}_{\bm{\xi}}\right\|\leq C\sqrt{r_{T}}$ and $\big{\|}\widehat{\bm{\varSigma}_{\bm{\xi}}^{-1}}(q_{T})-\bm{\varSigma}_{\bm{\xi}}^{-1}\big{\|}$ is discussed in Theorem 2. Finally, showing $\big{\|}\widehat{\bm{\varSigma}_{\bm{\xi}}}(q_{T})\bm{Q}_{1}\big{\|}=O_{p}(1)$ will complete the proof after a straightforward comparison of the established stochastic orders. It suffices to prove $\big{\|}\widehat{\bm{\varSigma}_{\bm{\xi}}}(q_{T})\big{\|}=O_{p}(1)$ . Weyl’s inequality (e.g. pages 40 and 46 in Tao (2012)) and Theorem 2 imply

[TABLE]

By the uniform boundedness of $\big{\|}\bm{\varSigma}_{\bm{\xi}}\big{\|}$ , for a sufficiently large $T$ , there exists a constant $C>0$ such that $\big{\|}\widehat{\bm{\varSigma}_{\bm{\xi}}}(q_{T})\big{\|}^{-1}=\lambda_{min}\Big{(}\widehat{\bm{\varSigma}_{\bm{\xi}}^{-1}}(q_{T})\Big{)}\geq C$ and thus $\big{\|}\widehat{\bm{\varSigma}_{\bm{\xi}}}(q_{T})\big{\|}\leq C^{-1}$ with arbitrarily high probability. ∎

S3 Additional information for the Empirical Application

S3.1 Model fit

S3.2 Simulation DGP

The following procedure was used to obtain a simulation DGP that closely mimics the data characteristics.

(a) Fit VAR($p$) models ($1\leq p\leq 8$) to the series $\{\hat{\bm{u}}_{t,FGLS}\}$ and $\{\Delta\bm{x}_{t}\}$ individually. The BIC selects the VAR(1) specification for both series (Table 7). Store the coefficient matrices $\widehat{\bm{A}}_{u}$ and $\widehat{\bm{A}}_{v}$ as well as the residual series $\{\hat{\bm{\eta}}_{t}\}$ and $\{\hat{\bm{\varepsilon}}_{t}\}$ , respectively.

(b) Stack $\hat{\bm{\zeta}}_{t}=[\hat{\bm{\eta}}_{t}^{\prime},\hat{\bm{\varepsilon}}_{t}^{\prime}]^{\prime}$ and compute $\widehat{\bm{\varSigma}}=\frac{1}{T}\sum_{t=2}^{T}\hat{\bm{\zeta}}_{t}\hat{\bm{\zeta}}_{t}^{\prime}$ .

(c) Denoting the estimated coefficients from the data by $\widehat{\bm{\beta}}_{FGLS}^{+}$ , we generate the new data according to the following equations:

[TABLE]
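Steps (a) and (b) of this procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the VAR(1) is fitted by OLS without an intercept, the input series are stored as arrays with one row per time period, and the names `fit_var1` and `innovation_covariance` are hypothetical. The BIC comparison over $p$ and the data-generating equations of step (c) are not reproduced here.

```python
import numpy as np


def fit_var1(X):
    """OLS fit of x_t = A x_{t-1} + e_t (no intercept); X has shape (T, n).

    Returns the (n x n) coefficient matrix A and the (T-1, n) residuals.
    """
    X = np.asarray(X, dtype=float)
    lagged, current = X[:-1], X[1:]
    # Solve current ~ lagged @ B in the least-squares sense, so that A = B'
    B, *_ = np.linalg.lstsq(lagged, current, rcond=None)
    residuals = current - lagged @ B
    return B.T, residuals


def innovation_covariance(eta_hat, eps_hat):
    """Stack the two residual series and compute (1/T) * sum_{t=2}^{T} zeta_t zeta_t'."""
    zeta_hat = np.hstack([eta_hat, eps_hat])  # rows are zeta_t', t = 2, ..., T
    T = zeta_hat.shape[0] + 1                 # one observation is lost to the lag
    return zeta_hat.T @ zeta_hat / T
```

A typical call would apply `fit_var1` separately to the residual series and to the first-differenced regressors, then pass both residual arrays to `innovation_covariance`.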

S4 Details on Implementation

The implementation of the BIAM estimator and the subsampling KPSS tests requires selecting the banding parameter $q$ and the block length $b$ . In our simulations and empirical application, we follow the subsampling and risk-minimization approach previously used by Bickel and Levina (2008), Wu and Pourahmadi (2009) and Ing et al. (2016b) to select $q$ . The steps are as follows:

Step 1. Split the series of (first-step OLS) residuals, $\{\widehat{\bm{u}}_{t}\}_{t=1}^{T}$ , into $J_{0}$ non-overlapping subsequences of length $l_{0}$ . These subsequences are $\{\widehat{\bm{u}}_{t}\}_{t=(j-1)l_{0}+1}^{jl_{0}}$ for $j=1,\dots,J_{0}$ with $J_{0}=[T/l_{0}]$ .

Step 2. Select an integer $H$ , $1\leq H<l_{0}$ , and construct the $(nH\times nH)$ sample autocovariance matrix $\widehat{\bm{\varPi}}_{\bm{u},nH}=\frac{1}{T-H}\sum_{t=H}^{T-1}\widehat{\bm{u}}_{t}(H)\widehat{\bm{u}}_{t}(H)^{\prime}$ which is an estimator of $\bm{\varSigma}_{\bm{u},nH}:=\mathbb{E}\left[\bm{u}(H)\bm{u}(H)^{\prime}\right]$ with $\bm{u}(H):=\left[\begin{smallmatrix}\bm{u}_{1}\\ \vdots\\ \bm{u}_{H}\end{smallmatrix}\right]$ , where $\widehat{\bm{u}}_{t}(\ell)=\big{(}\widehat{\bm{u}}_{t}^{\prime},\cdots,\widehat{\bm{u}}_{t-\ell+1}^{\prime}\big{)}^{\prime}$ . Compute $\widehat{\bm{\varPi}}_{\bm{u},nH}^{-1}$ .

Step 3. For every subsequence of residuals $1\leq j\leq J_{0}$ , compute the BIAM estimate of $\bm{\varSigma}_{\bm{u},nH}^{-1}$ repeatedly for all possible banding parameters $1\leq\bar{q}<H$ , denoted as $\widehat{\bm{\varSigma}_{\bm{u},nH}^{-1}}(\bar{q};j)$ .

Step 4. Select the banding parameter that minimizes the feasible average risk, i.e.

[TABLE]

We take $p=1$ , $H=[2\times T^{1/4}]$ and $l_{0}=[T/5]$ and obtain satisfactory results for all the settings we have explored. As mentioned in Bickel and Levina (2008), the use of another vector norm (e.g. $p=2$ ) does not lead to qualitatively different results.
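Steps 1 and 2, together with the tuning values just quoted, can be sketched as below. This is an illustration under assumptions rather than the authors' implementation: $[\cdot]$ is read as the integer part, residuals are stored with one row per time period, and the helper names (`tuning_values`, `split_subsequences`, `sample_autocov`) are hypothetical. The BIAM estimate of Step 3 and the risk criterion of Step 4 are not reproduced.

```python
import math

import numpy as np


def tuning_values(T):
    """H = [2 T^{1/4}] and l_0 = [T/5], reading [.] as the integer part."""
    return math.floor(2 * T ** 0.25), math.floor(T / 5)


def split_subsequences(u_hat, l0):
    """Step 1: J_0 = [T / l_0] non-overlapping blocks of l_0 consecutive residuals."""
    J0 = u_hat.shape[0] // l0
    return [u_hat[j * l0:(j + 1) * l0] for j in range(J0)]


def sample_autocov(u_hat, H):
    """Step 2: (nH x nH) matrix (1/(T-H)) * sum_{t=H}^{T-1} u_t(H) u_t(H)'."""
    T, n = u_hat.shape
    Pi = np.zeros((n * H, n * H))
    for t in range(H, T):  # t = H, ..., T-1 in the 1-based time index of the text
        # u_t(H) = (u_t', u_{t-1}', ..., u_{t-H+1}')'; time s is stored in row s-1
        stacked = u_hat[t - H:t][::-1].reshape(-1)
        Pi += np.outer(stacked, stacked)
    return Pi / (T - H)
```

For $T=100$ the rule gives $H=6$ and $l_{0}=20$ , hence $J_{0}=5$ subsequences.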

When we implement the minimum volatility rule as mentioned in Section 3 to select $b$ , the values of tuning parameters are adopted from Wagner and Hong (2016), see their online supplementary material.

Bibliography

Abadir, K. M. and J. R. Magnus (2005). Matrix Algebra. Cambridge University Press.
Anderson, T. W. and D. A. Darling (1952). Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. The Annals of Mathematical Statistics 23, 193–212.
Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817–858.
Berk, K. N. (1974). Consistent autoregressive spectral estimates. The Annals of Statistics 2, 489–502.
Beutner, E., Y. Lin, and S. Smeekes (2019). GLS estimation and confidence sets for the date of a single break in models with trends. Working Paper.
Bickel, P. J. and E. Levina (2008). Regularized estimation of large covariance matrices. The Annals of Statistics 36, 199–227.
Boden, T., G. Marland, and R. Andres (2017). Global, regional, and national fossil-fuel $\text{CO}_{2}$ emissions. Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tenn., U.S.A. http://cdiac.ess-dive.lbl.gov/trends/emis/tre_coun.html .
Bolt, J., R. Inklaar, H. de Jong, and J. L. van Zanden (2018). Rebasing “maddison”: New income comparisons and the shape of long-run economic development. https://www.rug.nl/ggdc/historicaldevelopment/maddison/releases/maddison-project-database-2018 .
