Efficient Estimation by Fully Modified GLS with an Application to the Environmental Kuznets Curve
Yicong Lin, Hanno Reuvers

TL;DR
This paper introduces a Fully Modified GLS estimator for multivariate cointegrating regressions, improving inference and bias correction, with applications to environmental economics and tests for cointegration.
Contribution
It develops the asymptotic theory for a new Fully Modified GLS estimator that handles complex cointegrating relations with deterministic and stochastic trends.
Findings
The FM-GLS estimator shows good performance in simulations.
The tests for cointegration are effective and reliable.
Application to EKC hypothesis provides new insights.
Abstract
This paper develops the asymptotic theory of a Fully Modified Generalized Least Squares estimator for multivariate cointegrating polynomial regressions. Such regressions allow for deterministic trends, stochastic trends and integer powers of stochastic trends to enter the cointegrating relations. Our fully modified estimator incorporates: (1) the direct estimation of the inverse autocovariance matrix of the multidimensional errors, and (2) second order bias corrections. The resulting estimator has the intuitive interpretation of applying a weighted least squares objective function to filtered data series. Moreover, the required second order bias corrections are convenient byproducts of our approach and lead to standard asymptotic inference. We also study several multivariate KPSS-type of tests for the null of cointegration. A comprehensive simulation study shows good performance of the…
| SOLS | SUR | FGLS | infSOLS | infSUR | infGLS | SOLS | SUR | FGLS | infSOLS | infSUR | infGLS | |
| 0 | 0.999 | 1.048 | 3.56E-05 | 0.937 | 0.937 | 0.937 | 0.988 | 1.047 | 3.81E-05 | 0.894 | 0.894 | 0.894 |
| 0.3 | 1.170 | 1.077 | 5.50E-05 | 1.098 | 0.959 | 0.907 | 1.186 | 1.064 | 5.43E-05 | 1.090 | 0.899 | 0.851 |
| 0.6 | 2.166 | 1.474 | 7.09E-05 | 2.017 | 1.180 | 0.864 | 2.260 | 1.426 | 6.85E-05 | 2.025 | 1.046 | 0.785 |
| 0.8 | 5.247 | 2.607 | 7.51E-05 | 6.547 | 2.474 | 0.878 | 5.720 | 2.589 | 7.40E-05 | 6.591 | 1.984 | 0.676 |
| 0 | 1.012 | 1.042 | 4.09E-06 | 0.961 | 0.961 | 0.961 | 1.019 | 1.069 | 4.25E-06 | 0.939 | 0.939 | 0.939 |
| 0.3 | 1.146 | 1.043 | 6.64E-06 | 1.101 | 0.960 | 0.942 | 1.216 | 1.069 | 6.62E-06 | 1.124 | 0.937 | 0.907 |
| 0.6 | 2.045 | 1.295 | 1.01E-05 | 1.920 | 1.101 | 0.906 | 2.222 | 1.345 | 9.37E-06 | 2.032 | 1.056 | 0.851 |
| 0.8 | 5.206 | 2.434 | 1.29E-05 | 5.502 | 1.971 | 0.916 | 6.197 | 2.701 | 1.11E-05 | 6.106 | 1.778 | 0.813 |
| 0 | 1.013 | 1.028 | 2.41E-07 | 0.988 | 0.988 | 0.988 | 1.016 | 1.039 | 2.55E-07 | 0.973 | 0.973 | 0.973 |
| 0.3 | 1.195 | 1.054 | 4.03E-07 | 1.162 | 0.998 | 0.975 | 1.224 | 1.052 | 4.02E-07 | 1.176 | 0.978 | 0.961 |
| 0.6 | 1.877 | 1.154 | 7.45E-07 | 1.789 | 1.054 | 0.952 | 2.142 | 1.217 | 6.49E-07 | 1.993 | 1.023 | 0.914 |
| 0.8 | 4.158 | 1.771 | 1.26E-06 | 4.011 | 1.456 | 0.935 | 5.341 | 2.031 | 1.03E-06 | 5.038 | 1.394 | 0.880 |
| SOLS | SUR | FGLS | infSOLS | infSUR | infGLS | SOLS | SUR | FGLS | infSOLS | infSUR | infGLS | ||
| Panel A: Low endogeneity | |||||||||||||
| 100 | 1.290 | 1.270 | 9.46E-05 | 1.270 | 1.131 | 0.865 | 1.256 | 1.275 | 9.51E-05 | 1.215 | 1.052 | 0.822 | |
| 200 | 1.216 | 1.163 | 1.27E-05 | 1.191 | 1.077 | 0.916 | 1.220 | 1.164 | 1.22E-05 | 1.201 | 1.038 | 0.896 | |
| 500 | 1.137 | 1.088 | 8.58E-07 | 1.133 | 1.033 | 0.961 | 1.159 | 1.106 | 7.65E-07 | 1.151 | 0.997 | 0.936 | |
| 100 | 1.790 | 1.825 | 3.99E-05 | 1.792 | 1.522 | 0.638 | 1.812 | 1.896 | 3.81E-05 | 1.802 | 1.431 | 0.594 | |
| 200 | 1.677 | 1.681 | 4.91E-06 | 1.597 | 1.408 | 0.763 | 1.682 | 1.645 | 4.73E-06 | 1.581 | 1.271 | 0.733 | |
| 500 | 1.467 | 1.404 | 3.34E-07 | 1.392 | 1.248 | 0.897 | 1.522 | 1.415 | 3.01E-07 | 1.432 | 1.185 | 0.864 | |
| 100 | 0.123 | 0.136 | 1.48E-04 | 0.212 | 0.145 | 0.022 | 0.001 | 0.001 | 2.67E-02 | 0.002 | 0.001 | 0.000 | |
| 200 | 2.257 | 2.322 | 8.59E-07 | 2.537 | 1.986 | 0.517 | 1.622 | 1.577 | 1.04E-06 | 2.218 | 1.343 | 0.366 | |
| 500 | 2.098 | 2.050 | 5.47E-08 | 2.021 | 1.686 | 0.703 | 2.031 | 2.014 | 4.90E-08 | 1.964 | 1.394 | 0.632 | |
| Panel B: High endogeneity | |||||||||||||
| 100 | 1.327 | 1.269 | 7.48E-05 | 1.579 | 1.230 | 0.899 | 1.335 | 1.299 | 7.26E-05 | 1.825 | 1.211 | 0.817 | |
| 200 | 1.232 | 1.161 | 9.91E-06 | 1.394 | 1.131 | 0.949 | 1.269 | 1.225 | 9.26E-06 | 1.653 | 1.166 | 0.886 | |
| 500 | 1.179 | 1.091 | 6.66E-07 | 1.297 | 1.079 | 0.980 | 1.162 | 1.093 | 5.99E-07 | 1.464 | 1.071 | 0.942 | |
| 100 | 1.880 | 1.892 | 3.08E-05 | 2.732 | 1.861 | 0.661 | 1.850 | 1.833 | 3.05E-05 | 3.170 | 1.718 | 0.575 | |
| 200 | 1.793 | 1.727 | 3.72E-06 | 2.132 | 1.610 | 0.798 | 1.713 | 1.639 | 3.76E-06 | 2.143 | 1.381 | 0.697 | |
| 500 | 1.480 | 1.370 | 2.62E-07 | 1.678 | 1.299 | 0.941 | 1.577 | 1.461 | 2.44E-07 | 1.739 | 1.256 | 0.863 | |
| 100 | 0.030 | 0.031 | 5.14E-04 | 0.172 | 0.077 | 0.005 | 0.000 | 0.000 | 3.27E+03 | 0.000 | 0.000 | 0.000 | |
| 200 | 2.381 | 2.514 | 6.85E-07 | 5.002 | 2.825 | 0.541 | 1.880 | 1.771 | 7.83E-07 | 5.026 | 2.030 | 0.389 | |
| 500 | 2.186 | 2.090 | 4.23E-08 | 2.756 | 1.850 | 0.712 | 2.184 | 2.044 | 4.03E-08 | 2.786 | 1.560 | 0.619 | |
| Wald-SOLS | Wald-SUR | Wald-FGLS | Wald-SOLS | Wald-SUR | Wald-FGLS | |
| Panel A: Single-equation test | ||||||
| 0 | 11.75 | 13.44 | 9.89 | 14.30 | 17.84 | 13.44 |
| 0.3 | 13.25 | 15.09 | 9.92 | 15.32 | 19.59 | 13.55 |
| 0.6 | 16.13 | 19.03 | 7.54 | 18.80 | 27.65 | 11.86 |
| 0.8 | 20.56 | 26.60 | 4.70 | 26.72 | 43.11 | 10.13 |
| 0 | 9.02 | 10.00 | 7.48 | 9.90 | 12.00 | 8.47 |
| 0.3 | 9.96 | 10.93 | 7.24 | 11.32 | 13.62 | 8.81 |
| 0.6 | 12.68 | 14.10 | 5.93 | 14.62 | 19.40 | 7.71 |
| 0.8 | 15.72 | 19.50 | 2.95 | 19.58 | 30.95 | 4.90 |
| 0 | 7.02 | 7.44 | 5.89 | 7.32 | 8.47 | 6.19 |
| 0.3 | 8.07 | 8.41 | 5.76 | 8.54 | 9.50 | 6.04 |
| 0.6 | 9.41 | 9.68 | 4.78 | 10.34 | 12.54 | 5.94 |
| 0.8 | 10.92 | 12.48 | 3.09 | 13.47 | 18.96 | 3.73 |
| Panel B: Joint test | ||||||
| 0 | 17.85 | 21.68 | 14.60 | 29.13 | 39.40 | 26.65 |
| 0.3 | 20.67 | 24.62 | 14.44 | 32.60 | 44.91 | 28.14 |
| 0.6 | 27.13 | 33.59 | 11.40 | 45.65 | 65.44 | 27.72 |
| 0.8 | 36.99 | 48.49 | 8.30 | 63.29 | 86.38 | 26.57 |
| 0 | 11.66 | 13.74 | 9.00 | 17.49 | 22.90 | 13.26 |
| 0.3 | 14.11 | 16.25 | 8.77 | 21.35 | 28.23 | 14.55 |
| 0.6 | 19.08 | 22.71 | 7.10 | 30.58 | 44.07 | 12.88 |
| 0.8 | 26.21 | 33.92 | 4.19 | 44.82 | 69.44 | 10.60 |
| 0 | 8.70 | 9.64 | 6.60 | 10.83 | 13.38 | 7.91 |
| 0.3 | 9.84 | 10.74 | 6.23 | 13.29 | 16.19 | 8.19 |
| 0.6 | 12.79 | 14.09 | 5.13 | 18.69 | 25.15 | 7.67 |
| 0.8 | 16.02 | 19.56 | 3.35 | 26.86 | 41.72 | 5.28 |
| Panel A: Size | ||||||||
| , serial correlation | 100 | 0.37 | 0.34 | 0.47 | 0.33 | 0.25 | 0.26 | |
| 200 | 0.42 | 0.37 | 0.53 | 0.42 | 0.28 | 0.18 | ||
| 500 | 0.71 | 0.60 | 1.24 | 0.64 | 0.58 | 0.80 | ||
| 100 | 8.10 | 7.28 | 1.86 | 22.35 | 19.65 | 1.29 | ||
| 200 | 3.54 | 3.22 | 2.91 | 8.70 | 7.57 | 1.09 | ||
| 500 | 1.92 | 1.69 | 4.54 | 4.02 | 3.40 | 4.78 | ||
| 100 | 50.63 | 48.67 | 16.17 | 90.99 | 89.69 | 18.41 | ||
| 200 | 40.48 | 38.56 | 14.26 | 83.71 | 81.78 | 12.16 | ||
| 500 | 21.87 | 20.36 | 16.25 | 60.09 | 56.87 | 15.49 | ||
| Panel B: Power DGP1 | ||||||||
| , | 1 | 100 | 44.03 | 39.92 | 21.14 | 57.65 | 52.05 | 14.00 |
| 200 | 57.97 | 50.90 | 33.85 | 74.09 | 66.64 | 27.24 | ||
| 500 | 67.23 | 57.37 | 56.63 | 88.41 | 80.23 | 49.93 | ||
| 2 | 100 | 66.64 | 63.83 | 38.07 | 84.77 | 80.36 | 29.12 | |
| 200 | 77.77 | 73.00 | 52.88 | 94.38 | 91.16 | 49.87 | ||
| 500 | 84.19 | 78.83 | 78.40 | 98.57 | 96.89 | 72.02 | ||
| 100 | 78.89 | 77.41 | 54.26 | 99.39 | 99.24 | 64.89 | ||
| 200 | 88.40 | 86.53 | 70.07 | 99.97 | 99.95 | 88.75 | ||
| 500 | 91.28 | 90.19 | 89.55 | 100.00 | 100.00 | 96.53 | ||
| Panel C: Power DGP2 | ||||||||
| , | 1 | 100 | 10.92 | 10.51 | 3.47 | 18.67 | 18.40 | 1.91 |
| 200 | 26.41 | 25.58 | 7.72 | 41.90 | 41.41 | 4.95 | ||
| 500 | 55.72 | 55.28 | 24.10 | 77.76 | 77.22 | 18.36 | ||
| 2 | 100 | 17.83 | 16.97 | 6.15 | 30.77 | 29.77 | 3.70 | |
| 200 | 40.29 | 39.04 | 15.57 | 62.01 | 61.15 | 10.51 | ||
| 500 | 69.64 | 68.50 | 38.48 | 91.66 | 91.27 | 30.87 | ||
| 100 | 23.16 | 22.53 | 9.88 | 53.66 | 51.21 | 10.45 | ||
| 200 | 47.25 | 45.60 | 22.83 | 83.78 | 82.73 | 29.05 | ||
| 500 | 75.60 | 73.98 | 56.79 | 98.23 | 98.15 | 69.35 | ||
| Panel D: Power DGP3 | ||||||||
| , | 1 | 100 | 77.03 | 77.06 | 28.34 | 94.26 | 94.12 | 21.04 |
| 200 | 89.55 | 89.14 | 35.64 | 98.97 | 98.84 | 31.79 | ||
| 500 | 97.26 | 97.05 | 55.61 | 99.91 | 99.91 | 48.48 | ||
| 2 | 100 | 87.40 | 86.75 | 52.85 | 98.95 | 98.81 | 43.05 | |
| 200 | 94.95 | 94.33 | 57.95 | 99.89 | 99.89 | 58.15 | ||
| 500 | 98.69 | 98.49 | 77.59 | 100.00 | 100.00 | 69.59 | ||
| 100 | 91.55 | 90.60 | 71.95 | 99.97 | 99.92 | 87.87 | ||
| 200 | 96.71 | 96.33 | 74.85 | 99.99 | 100.00 | 93.86 | ||
| 500 | 98.04 | 97.62 | 91.66 | 100.00 | 100.00 | 98.02 | ||
| Statistic | 16.54 | 12.66 | 8.19 |
|---|---|---|---|
| Rejection Rule (in %) | 0.00 | 0.03 | 2.73 |
| Block Size | 22 | 20 | 22 |
| 6 | 7 | 6 |
| Turning point | ||||
|---|---|---|---|---|
| SOLS | 9.173 | -0.411 | 68,900 | |
| (6.797,11.548) | (-0.535,-0.288) | |||
| Austria | SUR | 8.494 | -0.370 | 76,211 |
| (6.764,10.223) | (-0.464,-0.276) | |||
| FGLS | 6.553 | -0.276 | 708,712 | |
| (2.138,10.967) | (-0.513,-0.040) | |||
| SOLS | 12.927 | -0.645 | 22,420 | |
| (11.795,15.059) | (-0.702,-0.589) | |||
| Belgium | SUR | 9.973 | -0.503 | 20,195 |
| (9.158,10.787) | (-0.545,-0.461) | |||
| FGLS | 8.762 | -0.443 | 19,795 | |
| (7.236,10,287) | (-0.521,-0.365) | |||
| SOLS | 15.676 | -0.716 | 56,967 | |
| (14.162,17.289) | (-0.788,-0.643) | |||
| Finland | SUR | 15.136 | -0.684 | 63,400 |
| (14.030,16.242) | (-0.742,-0.627) | |||
| FGLS | 14.392 | -0.646 | 68,775 | |
| (12.075,16.708) | (-0.769,-0.523) | |||
| SOLS | 11.382 | -0.540 | 38,120 | |
| (10.140,12.624) | (-0.602,-0.477) | |||
| Netherlands | SUR | 10.063 | -0.475 | 39,637 |
| (9.183,10.944) | (-0.522,-0.429) | |||
| FGLS | 9.102 | -0.430 | 39,908 | |
| (7.606,10.597) | (-0.506,-0.353) | |||
| SOLS | 7.070 | -0.232 | ||
| (5.516,8.624) | (-0.310,-0.153) | |||
| Switzerland | SUR | 6.962 | -0.232 | |
| (5.481,8.443) | (-0.308,-0.156) | |||
| FGLS | 7.052 | -0.254 | ||
| (5.173,8.932) | (-0.350,-0.158) | |||
| SOLS | 8.450 | -0.429 | 20,242 | |
| (6.976,10.020) | (-0.502,-0.355) | |||
| United Kingdom | SUR | 9.523 | -0.475 | 22,596 |
| (8.486,10.560) | (-0.527,-0.423) | |||
| FGLS | 9.056 | -0.453 | 21,887 | |
| (7.244,10.868) | (-0.542,-0.364) |
| -23.88 | -23.49 | -22.75 | -21.87 | -21.25 | -20.66 | -20.11 | -19.27 | |
| -37.94 | -37.78 | -37.15 | -36.44 | -35.68 | -35.03 | -34.34 | -33.64 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
11affiliationtext: Department of Econometrics and Data Science, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands22affiliationtext: Department of Econometrics, Erasmus University Rotterdam, 3062 PA Rotterdam, The Netherlands
Efficient Estimation by Fully Modified GLS with an Application to the Environmental Kuznets Curve
Yicong Lin
Hanno Reuvers Corresponding author: Department of Econometrics, Erasmus University Rotterdam, 3062 PA Rotterdam, The Netherlands. E-mail address: [email protected].
Abstract
This paper develops the asymptotic theory of a Fully Modified Generalized Least Squares estimator for multivariate cointegrating polynomial regressions. Such regressions allow for deterministic trends, stochastic trends and integer powers of stochastic trends to enter the cointegrating relations. Our fully modified estimator incorporates: (1) the direct estimation of the inverse autocovariance matrix of the multidimensional errors, and (2) second order bias corrections. The resulting estimator has the intuitive interpretation of applying a weighted least squares objective function to filtered data series. Moreover, the required second order bias corrections are convenient byproducts of our approach and lead to standard asymptotic inference. We also study several multivariate KPSS-type of tests for the null of cointegration. A comprehensive simulation study shows good performance of the FM-GLS estimator and the related tests. As a practical illustration, we reinvestigate the Environmental Kuznets Curve (EKC) hypothesis for six early industrialized countries as in Wagner et al. (2020).
JEL Classification: C12, C13, C32, Q20
Keywords: Cointegrating Polynomial Regression, Cointegration Testing, Environmental Kuznets Curve, Fully Modified Estimation, Generalized Least Squares
1 Introduction
In recent years, there has been an increasing interest in the theoretical properties and theoretical justifications of nonlinear cointegrating relations. For theoretical properties we refer to the textbook treatise by Wang (2015) , the recent review article by Tjøstheim (2020), and the extensive references found in either of them. Theoretical justifications are in some cases refinements of existing economic theory, e.g. nonlinear cointegration among bond yields with different times to maturity due to yield-dependent risk premia as discussed in Breitung (2001), or nonlinear purchasing power parity due to transaction/transportation costs and trade barriers (e.g. Hong and Phillips (2010)). In other cases, economic theory postulates a nonlinear cointegrating relation from the outset. A popular example of the latter is the Environmental Kuznets curve described in Grossman and Krueger (1995).111There is no direct reference to Kuznets in the original paper by Grossman and Krueger (1995). But their nonlinear relations between environmental indicators and per capita GDP do remind strongly of the inverted U-shaped between income inequality and economic growth proposed by Kuznets (1901-1985). The term ‘environmental Kuznets curve’ was used later.
There are three branches of literature on the estimation of such nonlinear cointegrating relations. First, the papers by Park and Phillips (1999) and Park and Phillips (2001) are concerned with nonlinear cointegration analysis of a parametric form. Second, there is a literature on nonparametric kernel estimation of nonlinear cointegrating equations, see for example Wang and Phillips (2009) or Li et al. (2020). The third approach is reminiscent of a nonparametric sieve estimation with power polynomial basis. That is, one estimates a cointegrating relation containing integer powers of integrated regressors. Wagner and Hong (2016) named this a cointegrating polynomial regression (CPR). The multivariate seemingly unrelated regressions extension is available in Wagner et al. (2020). Our model specification builds on this Seemingly Unrelated Cointegrating Polynomial Regression (SUCPR) setup.
We make two theoretical contributions to the literature on cointegrating polynomial regressions. First, we propose the Fully Modified Generalized Least Squares (FM-GLS) estimator. This estimator requires two main steps: (1) It employs the inverse covariance matrix of the -dimensional innovation vector, that is, the covariance matrix of the vector which stacks the disturbances in the cointegrating equations and the disturbances driving the regressors over the time span . The estimation of this inverse covariance matrix is based on the Modified Cholesky Decomposition (MCD) originating from Pourahmadi (1999). The approach is computationally simple because the required quantities are obtained from the coefficients and prediction error variances of best linear least squares predictors. In our setting this translates into estimating multiple VAR models up to some maximum lag order . Sufficient conditions for consistency are provided. (2) We exploit the previous results to correct the second-order biases, resulting in improved efficiency and standard chi-square inference. Also note that the approach differs from the linear cointegration results in Mark et al. (2005) and Moon and Perron (2005) since our bias corrections do not rely on leads and lags augmentation. Second, a multi-equation cointegration specification asks for a multivariate cointegration test. Building upon the work by Choi and Saikkonen (2010), we propose three such tests. The first test uses pre-filtered residuals to account for serial correlation, whereas the other two are direct multivariate generalizations of the KPSS-type of test in Wagner and Hong (2016). The estimator and cointegration tests are subsequently studied by Monte Carlo simulation. In our simulations, the FM-GLS estimator has a higher estimation accuracy and its implied Wald test has better size control and higher size-adjusted power. We find by simulation that prefiltering improves the size control of the cointegration tests but has an adverse effect on power. In the empirical application there is a surprisingly large spread in the widths of the confidence intervals. It turns out that FM-SUR, and to a lesser degree FM-SOLS, underestimates the parameter uncertainty compared to FM-GLS.
The plan of this paper is as follows. Section 2 introduces the model and the modified Cholesky block decomposition. This decomposition is the main ingredient for the fully modified GLS estimator. The related asymptotic theory and stationarity tests are discussed in Section 3 whereas a finite sample simulation study is presented in Section 4. The empirical application can be found in Section 5 where we look at the environmental Kuznets curve. Section 6 concludes. All proofs are collected in the Appendices.222The Appendices contains the proofs of all the results that are related to the generalized least squares estimator. Supplementary material is available on the websites of the authors.
Some words on notation. denotes a generic positive constant. The integer part of the number is denoted by . For a vector , its dimension is abbreviated by and its -norm by . When applied to a matrix, signifies the induced norm defined by . The subscripts are omitted whenever , e.g. and where is the largest eigenvalue. Similarly, denotes the smallest eigenvalue. The Frobenius norm is denoted as . The identity matrix is written as . The row or column of an arbitrary matrix are selected using and , respectively. The Kronecker product is denoted “”. We use the symbol “” to signify weak convergence and the symbol “” for equality in distribution. The stochastic order and strict stochastic order relations are indicated by and .
2 The Model
As in Wagner et al. (2020), we study a system of seemingly unrelated cointegrating polynomial regressions (SUCPR), that is
[TABLE]
where the dependent variable and innovations are random vectors. For the cross-sectional unit , we use as explanatory variables: (1) deterministic components such as an intercept and polynomial time trends up to order , and (2) integer powers of the regressors up to degree . Defining , , and , we subsequently collect all explanatory variables in the block diagonal matrix . We are interested in the -dimensional parameter vector where . Overall, each cross-sectional unit in (2.1) specifies a single cointegrating relation containing polynomials in deterministic and stochastic trends. For each , the highest orders of these polynomials, i.e. and , are assumed to be fixed and known. We do not allow for cointegration in the cross-sectional dimension.
The innovation series is allowed to exhibit dependencies over time and across series. We assume that these dependencies can be modeled by a stationary VAR() process, that is
[TABLE]
(see Assumption 1 for further details). Efficient estimation of the parameter vector now requires the use of generalized least squares (GLS). Our Zellner (1962)-type GLS estimator relies on the inverse of the matrix where . In this paper, we directly estimate using a multivariate extension of the modified Cholesky decomposition by Pourahmadi (1999). This extension was named the Modified Cholesky Block Decomposition (MCBD) by Kim and Zimmerman (2012) and Kohli et al. (2016). The latter papers used the MCBD to parametrize the covariance matrix of multivariate longitudinal data. As in Beutner et al. (2019), we use the MCBD for the time series application mentioned above, i.e. the computation of . The decomposition is closely related to linear minimum MSE predictors.
We define
[TABLE]
and . The inverse of the covariance matrix is then given by
[TABLE]
where \bm{\mathcal{S}}_{\bm{u}}=\operatorname{diag}\Big{(}\bm{S}(0),\bm{S}(1),\ldots,\bm{S}(T-1)\Big{)},
[TABLE]
and the follow from the partitioning \bm{A}(\ell)=\big{[}\bm{A}_{1}(\ell),\ldots,\bm{A}_{\ell}(\ell)\big{]}.
Weak stationarity of implies that the block elements of being far below the main diagonal are small. This suggests a banding approach in which small elements are replaced by zeros. More specifically, we construct a Banded Inverse Autocovariance Matrix (BIAM) as
[TABLE]
where is called the banding parameter, \bm{\mathcal{S}}_{\bm{u}}(q)=\operatorname{diag}\Big{(}\bm{S}(0),\bm{S}(1),\ldots,\bm{S}(q),\ldots,\bm{S}(q)\Big{)} and with
[TABLE]
Example 1
Consider a stationary -dimensional VAR() process specified as with . For , the MCBD is based on
[TABLE]
Alternatively, with banding parameter , the related banded inverse autocovariance matrix is with
[TABLE]
The model of (2.1) can be stacked over time to yield the representation with , and as before. For the moment, we will assume to be known and focus on the following estimator:
[TABLE]
A discussion on the properties of this infeasible estimator is informative because: (1) the incurred estimation error of an appropriately constructed estimator will be asymptotically negligible, and (2) we can suppress the effect of banding by letting increase with sample size.
Two remarks related to are instructive. First, the GLS estimator differs from the usual least squares estimator by a weighing with the inverse covariance matrix . It is well documented in standard econometric textbooks (e.g. chapter 7 of Davidson and MacKinnon (2004)) that this weighing may lead to substantial efficiency gains. Second, it is illustrative to substitute the Modified Cholesky Decomposition of into the definition of this infeasible GLS estimator. The result is where , and . The premultiplications by have the effect of filtering and take care of serial correlation. applies scaling and rotation to account for the correlations between the series. The following univariate autoregressive setting exemplifies this intuition.
Example 2
A regression model has AR() innovations where and . Taking , the expressions of Example 1 are easily adapted to yield:
[TABLE]
and a similar transformation for the linear trend. The implied GLS estimator coincides with the estimator from Prais and Winsten (1954).
3 Asymptotic Theory
In this section, we present the asymptotic results. More specifically, we derive: (1) the limiting distribution of the GLS estimator, (2) the fully modified GLS (FM-GLS) estimator that corrects for second order bias terms, (3) a Wald test statistic, and (4) several multivariate KPSS-type of tests for the null of cointegration. We will also compare this FM-GLS estimator with the two fully modified estimators defined in Proposition 1 of Wagner et al. (2020). The following assumption will facilitate the development of the asymptotic theory.
Assumption 1 (Innovation Processes)
The innovations processes in the model satisfy the following assumptions:
- (a)
The process is an independent and identically distributed (i.i.d.) sequence with and for some constant and some . 2. (b)
\det\big{(}\bm{\mathcal{A}}(z)\big{)}\neq 0* for all and .* 3. (c)
* admits the VAR() process , where . Moreover, \det\big{(}\bm{\mathcal{D}}(z)\big{)}\neq 0 for all and .*
The stationary VAR() specifications for and are natural given the linear minimum MSE predictor formulae that underly the definitions of the MCBD and BIAM. Moreover, the conditions in Assumption 1 ensure that the lag polynomials and are invertible (see for example Theorem 7.4.2 of Hannan and Deistler (2012)), thereby showing that our Assumption 1 is similar to the linear processes assumptions that are regularly adopted in the literature on nonlinear cointegration, cf. Choi and Saikkonen (2010), Wagner and Hong (2016), and Wagner et al. (2020). The assumption \det\big{(}\bm{\mathcal{D}}(1)\big{)}\neq 0 rules out cointegration among the components of .
Under Assumption 1(a), an invariance principle holds for , i.e. where denotes an -dimensional Brownian motion with covariance matrix . Moreover, Assumptions 1(b)-(c) justify the use of the Beveridge-Nelson decomposition (Phillips and Solo (1992)). A functional central limit theorem for linear processes is thus also applicable to , that is
[TABLE]
where the Brownian motion of dimension has covariance matrix
[TABLE]
Apart from this long-run covariance matrix \bm{\varOmega}=\sum_{h=-\infty}^{\infty}\mathbb{E}\big{(}\bm{\xi}_{t}\bm{\xi}_{t+h}^{\prime}\big{)}, we also introduce the one-sided long-run covariance matrix \bm{\varDelta}=\left[\begin{smallmatrix}\bm{\varDelta}_{uu}&\bm{\varDelta}_{uv}\\ \bm{\varDelta}_{vu}&\bm{\varDelta}_{vv}\end{smallmatrix}\right]=\sum_{h=0}^{\infty}\mathbb{E}\big{(}\bm{\xi}_{t}\bm{\xi}_{t+h}^{\prime}\big{)}. The Brownian motion defined by is by construction orthogonal to . Its covariance matrix equals .
3.1 Infeasible GLS
We start our analysis assuming that the covariance matrix is a known quantity for each . The modified Cholesky block decomposition of page 2.2 can now be used to derive the limiting distribution of this infeasible GLS estimator. A insightful exposition of our results requires further notation.
- (a)
Introduce scaling matrices: for the time trends, and for the stochastic trends. Moreover, we define , where . 2. (b)
Let , and . Define block-diagonal random matrix . 3. (c)
\bm{b}_{i}=\Big{[}\bm{0}_{d_{i}+1}^{\prime},1,2\int_{0}^{1}B_{v_{i}}(r)dr,\dots,s_{i}\int_{0}^{1}B_{v_{i}}^{s_{i}-1}(r)dr\Big{]}^{\prime}.
Finally, we use as shorthand notation for the component of .
Theorem 1 (Limiting Distribution of the infeasible GLS Estimator)
If Assumption 1 holds, and if satisfies as , then
[TABLE]
where and {\bm{\mathcal{B}}}_{i}=\operatorname{row}_{i}\Big{(}\bm{\varSigma}_{\epsilon\eta}\Big{)}\operatorname{col}_{i}\Big{(}\bm{\varSigma}_{\eta\eta}^{-1}\Big{)}\leavevmode\nobreak\ \bm{b}_{i}.
The limiting result in (3.3) coincides with the limiting distribution of the MSUR estimator, \widetilde{\bm{\beta}}_{MSUR}:=\left(\bm{Z}^{\prime}(\bm{I}_{T}\otimes\widehat{\bm{\varOmega}}_{uu}^{-1})\bm{Z}\right)^{-1}\Big{(}\bm{Z}^{\prime}(\bm{I}_{T}\otimes\widehat{\bm{\varOmega}}_{uu}^{-1})\bm{y}\Big{)}, as reported in Wagner et al. (2020), see their Proof of Proposition 1. The equivalence of these limiting distributions is caused by the facts that: (1) applying a linear filter to an integrated series only affect its long-run variance (e.g. Phillips and Park (1988)), and (2) the previous statement remaining true when applying a linear filter to higher integer powers of integrated series. The terms and in (3.3) reflect the presence of second order bias terms caused by serial correlation and endogeneity. In Section 3.3, we introduce the fully modified (FM) correction that adjust these bias terms and leads to standard inference. We first introduce a feasible version of the GLS estimator.
3.2 Consistent Estimation of and Feasible GLS
Up to this point we have discussed the infeasible estimator . A feasible GLS approach requires a consistent estimator of the matrix . Several authors, e.g. Wu and Pourahmadi (2009) and McMurry and Politis (2010), have constructed consistent estimators of large covariance matrices using banding or tapering to reduce the number of unknown parameters. Direct usage of their results poses two difficulties because: (1) numerical inversion of large matrices is computationally expensive for large , and (2) matrix inversion might even be impossible because the estimated covariance matrix cannot be guaranteed to be positive definite. In the light of the such considerations, we will estimate directly and ensure it to be positive definite. The approach is the sample counterpart of the BIAM described on page 2. That is, we replace true innovations by first stage OLS residuals , and subsequently minimise a sample moment in estimated residuals rather than the population mean squared forecasting error. This method was previously used by Cheng et al. (2015) and Ing et al. (2016a) for univariate time series. For a multivariate time series, we define
[TABLE]
, and . Similarly to (2.6)-(2.7), we subsequently construct the matrices and , and obtain the implied multivariate BIAM estimator as
[TABLE]
Assumption 2
For and , assume .
Assumption 3
Assume satisfies as .
Assumption 2 requires the residuals to be sufficiently close to the true innovations. It is a rather mild assumption and it is satisfied if residuals are computed by least squares. Assumption 3 places constraints on the banding parameter . First, Assumption 3 requires the banding parameter to diverge with sample size. This ensures that no nonzero elements are (asymptotically) set to zero. Moreover, the assumption establishes an upper bound for the growth rate of . The definition of , see (3.4), shows that we are fitting a vector autoregression (VAR) of increasing lag order to the residuals. Identical rate requirements are reported by Lewis and Reinsel (1985) when they derive consistency and asymptotic normality results when finite VAR models are fitted to infinite order VAR processes. The following theorem shows the consistent estimation of and implies that the infeasible and feasible GLS estimator have the same limiting distribution.
Theorem 2 (Consistent Estimation of )
[TABLE]
3.3 Fully Modified Inference
The asymptotic results of Theorem 1 is not immediately useful for statistical inference. There are two difficulties. First, the second order bias dislocates the limiting distribution which can translate into substantial finite sample bias. This leads to a loss in efficiency. Second, possible dependencies between the Brownian motions and cause the limiting distribution to depend on nuisance parameters. Critical values would therefore be nuisance parameter dependent as well.
These two issues have received extensive attention in the linear cointegration literature. A (non-exhaustive) list of solution methods is: joint modeling as in Phillips (1991), Saikkonen’s (1992) dynamic least squares, and the integrated modified OLS and fixed-b approaches by Vogelsang and Wagner (2014). We rely on the fully modified (FM) approach advocated by Phillips and Hansen (1990) and Phillips (1995). The idea is a twofold modification of the estimator: (1) second order bias terms are subtracted, and (2) a transformation of the dependent variable is introduced to obtain a zero-mean Gaussian mixture limiting distribution. Recently, Wagner et al. (2020) have proposed two estimators within the framework of seemingly unrelated cointegrating polynomial regressions. These estimators, FM-SOLS and FM-SUR, rely on kernel estimators of the one- and two-sided long-run covariance matrix (see Theorem 3). As such, we introduce the following assumption.
Assumption 4 (Consistent Estimation of Long-run Covariance Matrices)
* and are consistent kernel estimators of the long-run covariance matrix and the one-sided long-run covariance matrix , respectively.*
Andrews (1991) and Newey and West (1994) use kernel estimators for long-run covariance estimation. Their method involves the calculation of weighted sums of the autocovariance matrices of the residuals. These weights are determined by a kernel function and bandwidth parameter. Our Assumption 4 is easily satisfied by imposing suitable conditions on the kernel function and bandwidth parameter. We refer to Phillips (1995) and Jansson (2002) for an enumeration of such conditions.
Alternatively, we can obtain consistent one- and two-sided long-run covariance estimators within the BIAM framework of Section 3.2.333An overview of the procedure is given here. Section S2 in the Supplement provides further details. This approach resembles Berk (1974). The GLS estimator and its FM counterpart are thus constructed within a single framework. The estimators are as follows. For all , we first stack and in the -dimensional vector . Since the BIAM estimator is fitting VAR processes up to order , we will use the estimated VAR() approximations to define the long-run covariance estimators. For , the estimator is , where and denote respectively the estimated prediction error variance and the coefficient matrix of the lag when a VAR() is fitted to . The population one-sided long-run covariance matrix is \bm{\varDelta}=\sum_{h=0}^{\infty}\mathbb{E}\big{(}\bm{\xi}_{t}\bm{\xi}_{t+h}^{\prime}\big{)}. It is thus intuitive to approximate this quantity by a finite sum of estimated covariance matrices of . These covariance matrices are nothing but subblocks of the matrix .444We use to denote the matrix where . The matrices and are defined similarly to respectively and (see page 3.5). The matrix is lower triangular with identity matrices on the main diagonal. Therefore, its matrix inverse exists and is fast to compute. We therefore use
[TABLE]
where is an \big{(}2nT\times 2n\big{)} block matrix of zeros of which the last blocks have been replaced by identity matrices. To ensure consistency, we place the following rate restriction on the number of included autocovariance matrices.
Assumption 5
As , , , and .
Definitions and limiting results for FM estimators are presented in Theorem 3. The FM-SOLS, FM-SUR and FM-GLS estimator all depend on estimators for and . It is only the consistency of these estimators that is relevant for the asymptotic analysis, not whether the kernel or BIAM approach is employed. As such, we will not complicate notation by introducing additional notation to indicate whether the kernel or BIAM approach is used. In subsequent theorems, simulation results and the empirical application we will use kernel estimators for FM-SOLS and FM-SUR, and the BIAM approach for FM-GLS. This seems to be the logical choice for these estimators.
Theorem 3
For , define , for . Also, define the matrix as the (implied) consistent estimator of .
- (a)
Define the FM-SOLS estimator as
[TABLE]
where with , and with and being the element on the main diagonal of . If Assumptions 1 and 4 hold, then
[TABLE] 2. (b)
Define the FM-SUR estimator as
[TABLE]
where with . If Assumptions 1 and 4 hold, then
[TABLE] 3. (c)
Define the FM-GLS estimator as
[TABLE]
where , and \widehat{{\bm{\mathcal{B}}}}^{+}=\big{[}\widehat{{\bm{\mathcal{B}}}}_{1}^{+\prime},\dots,\widehat{{\bm{\mathcal{B}}}}_{n}^{+\prime}\big{]}^{\prime} with
[TABLE]
If Assumptions 1-3 and 5 hold, then
[TABLE]
The FM-GLS estimator is new to the seemingly unrelated CPR literature, whereas the FM-SOLS and FM-SUR estimators have recently appeared in Wagner et al. (2020). Theorem 3 indicates that all three estimators have a zero-mean Gaussian mixture limiting distribution implying that standard inference is applicable for each. However, we also see from Theorem 3 that the limiting distributions are generally different because different types of weighing are used in the construction of the estimators.555There are special cases in which some (pairs of) estimators become asymptotically equivalent. For example, if , then all estimators are asymptotically equivalent because the weighting matrices and are now scalars. Also, under exogeneity, we have and the FM-SUR an FM-GLS estimators share the same limiting distribution.
For completeness, we also detail how the FM-GLS estimator can be used to test linear hypotheses. A formal presentation of such a result is more involved because of the different convergence rates of the individual parameter estimators. That is, the parameters with the lowest convergence rate will dominate the asymptotic distribution and one should take care to avoid a degenerate limiting distribution. We will rule out such complications by considering hypothesis tests on individual parameters.666For general linear hypothesis, we refer the reader to Sims et al. (1990) where a reordering based on convergence rates is used to establish the limiting distribution of the Wald statistic for general linear hypothesis. The same approach is applicable in our setting but we will not explore this in greater detail. Therefore, let denote a selection matrix in which every row contains a single 1 and zeros otherwise. The null hypothesis can be tested using the standard chi-squared limiting distribution of the Wald statistic (Theorem 4). These tests are practically relevant. For example, exclusion restrictions of the type allow us to test whether the cointegrating relation is linear.
Theorem 4
Consider the null hypothesis , which imposes linearly independent restrictions. Under the assumptions of Theorem 3(c), the Wald test statistic
[TABLE]
where \widehat{\bm{\varPhi}}=\bm{R}\left[\bm{Z}^{\prime}\left(\bm{I}_{T}\otimes\widehat{\bm{\varOmega}}_{uu}^{-1}\right)\bm{Z}\right]^{-1}\left[\bm{Z}^{\prime}\Big{(}\bm{I}_{T}\otimes\widehat{\bm{\varOmega}}_{uu}^{-1}\widehat{\bm{\varOmega}}_{u.v}\widehat{\bm{\varOmega}}_{uu}^{-1}\Big{)}\bm{Z}\right]\left[\bm{Z}^{\prime}\left(\bm{I}_{T}\otimes\widehat{\bm{\varOmega}}_{uu}^{-1}\right)\bm{Z}\right]^{-1}\bm{R}^{\prime}.
3.4 Testing the Null of Cointegration
Stationarity tests are used to avoid spurious regressions and to verify the correct specification of the cointegrating relation. To test for stationarity of the seemingly unrelated cointegrating polynomial regressions (SUCPR) errors, we combine the test statistic from Nyblom and Harvey (2000) with the sub-sampling approach found in Choi and Saikkonen (2010) and Wagner and Hong (2016). We consider three test statistics. To treat all test statistics in a unified framework, we define
[TABLE]
that is, a vector of length stacking the cumulative sums of . If the true innovations were observed, then we could use the full-sample KPSS-type of test statistic to test for stationarity of the innovations. Under the null of stationarity, this test statistic would converge weakly to with denoting an -dimensional standard Brownian motion. This limiting distribution is free of nuisance parameters and the cumulative distribution function is available as a series expansion (see the Supplement).
The innovations are only available when cointegrating relations are pre-specified. If these coefficients are estimated, then this additional parameter uncertainty will contaminate the limiting distribution with nuisance parameters.777There are exceptions. Shin (1994) reports a nuisance parameter free limiting distribution for a single-equation linear cointegrating relation. This remains true if only a single integrated variable enters the cointegrating regression with a higher power, see Proposition 5 in Wagner and Hong (2016). The idea behind the subsampling approach is to construct a test statistic incorporating residuals while computing parameter estimators from all observations. If increases slowly with sample size, then the parameter estimation error will be negligible relative to the randomness in the errors and the asymptotic distribution remains .
The three KPSS-type of test are based on the following residuals: , , and . The test statistic are:
[TABLE]
and
[TABLE]
where is the submatrix of obtained by selecting the rows and columns related to all time indices in the set . The test statistic in (3.17) fits naturally into the FM-GLS estimation framework.
Theorem 5
Let the assumptions from Theorem 3 hold.
- (a)
If as , then
[TABLE] 2. (b)
If as , then for any .
A sample of size allows for up to series of nonoverlapping blocks of residuals of length . Similarly to Choi and Saikkonen (2010), we apply the Bonferroni procedure to use all these series and thereby increase power. The approach is applicable to any of the three test statistics in Theorem 5. As such, we keep the notation general and use a generic to denote a test statistic based on the subseries, . In the Bonferrroni procedure we compute and do not reject the null hypothesis whenever with defined by . The Bonferroni inequality implies and we see that the probability of a type-I error does not exceed the significance level .
Remark 1
We suggest to follow Choi and Saikkonen (2010) in terms of the implementation of the subsampling approach. That is, the block size is selected using the minimum volatility rule by Romano and Wolf (2001). For this particular block size we subsequently select subsamples by taking non-overlapping blocks from alternatively the start and the end of the sample.
Remark 2
The limiting results in Theorem 5 guarantee a correct asymptotic size. Our simulations show (1) that these tests have power against various alternative hypotheses and (2) that power increases with sample size. A theoretical investigation of the power properties is outside of the scope of this paper.
4 Simulations
We now study the finite sample performance of the estimators and stationarity tests. First, we compare the FM-GLS estimator with the FM-SOLS and FM-SUR estimators from Wagner et al. (2020). All long-run covariance matrices are computed using a Bartlett kernel and the automatic bandwidth selection approach due to Andrews (1991). For FM-GLS, the banding parameter is selected using the subsampling and risk-minimization approach explained in section 5 from Bickel and Levina (2008).888More details concerning the implementation can be found in the Supplement. Infeasible counterparts of the estimator are constructed assuming the knowledge of the true serial correlation and/or cross-sectional dependence pattern. These estimators are denoted by infSOLS, infSUR, and infGLS. Second, we look at the cointegration tests. We consider three test statistics: and use the residuals as in (3.16), whereas employs the pre-filtered residuals from (3.17). All tests are implemented with minimum volatility block size selection and Bonferroni correction.
We consider and . All tests are performed at a nominal significant level of . For stationary processes, a presample of 200 observations is used to remove the influence of the starting values. All results are based on Monte Carlo replicates.
4.1 Monte Carlo Designs
We generate data according to a quadratic seemingly unrelated CPR. That is, we adopt the DGP in (2.1) with \bm{z}_{it}=\big{[}1,t,x_{it},x_{it}^{2}\big{]}^{\prime}. The integrated variables satisfy and . We explore two error processes.
Setting A (Errors as in Wagner et al. (2020)): As a benchmark, we revisit the simulation setting in Wagner et al. (2020) and generate innovations according to
[TABLE]
where \bm{\varepsilon}_{t}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathrm{N}\big{(}\bm{0},\bm{\varSigma}(\rho_{3})\big{)}, \bm{e}_{t}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathrm{N}\big{(}\bm{0},\bm{\varSigma}(\rho_{4})\big{)} and
[TABLE]
is a symmetric Toeplitz matrix. The parameter controls the level of serial correlation and measures the degree of endogeneity. The parameters and indicate the extent of correlation across equations induced through and , respectively. For simplicity, we assume identical values . The true coefficient vector is \bm{\beta}=\big{[}\bm{\beta}_{1}^{\prime},\dots,\bm{\beta}_{n}^{\prime}\big{]}^{\prime}, where with , .
Setting B (VARMA Errors): To further investigate the importance of serial correlation, we consider a second specification of the innovation process:
[TABLE]
where and are generated as \left[\begin{smallmatrix}\bm{\eta}_{t}\\ \bm{\varepsilon}_{t}\end{smallmatrix}\right]\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathrm{N}\big{(}\bm{0},\bm{\varSigma}(\theta)\big{)} and as in (4.2) but with parameter . The matrices () are generated independently and similarly to Chang et al. (2004). That is, we take the following three steps:
- (a)
Generate an random matrix from and construct the orthogonal matrix \bm{H}_{i}=\bm{U}_{i}\Big{(}\bm{U}_{i}^{\prime}\bm{U}_{i}\Big{)}^{-1/2}. 2. (b)
Generate eigenvalues \lambda_{i1},\dots,\lambda_{in}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\text{U}\big{[}\underline{\lambda},\bar{\lambda}\big{]}. 3. (c)
Let and compute .
The parameter governs regressor-error correlation and cross-equation correlation. The amount of serial correlation is specified through and . The three scenarios \big{(}\underline{\lambda},\bar{\lambda}\big{)}\in\big{\{}\left(0.1,0.5\right),\left(0.5,0.8\right),\left(0.8,0.95\right)\big{\}} steadily increase the autocorrelation in the generated data.
Setting C (Cointegration Tests): We continue to construct innovations according to Setting B. Moreover, we fix \left[\begin{smallmatrix}\bm{\eta}_{t}\\ \bm{\varepsilon}_{t}\end{smallmatrix}\right]\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\mathrm{N}\big{(}\bm{0},\bm{\varSigma}(\theta)\big{)} with , and we construct the matrices and using \big{(}\underline{\lambda},\bar{\lambda}\big{)}=(0.1,0.5). The eigenvalues of are varied to explore both size and power properties. We always estimate a quadratic seemingly unrelated CPR.
Size DGP.
We generate the eigenvalues of as before. That is, take \lambda_{11},\dots,\lambda_{1n}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\text{U}\big{[}\underline{\lambda},\bar{\lambda}\big{]}, where \big{(}\underline{\lambda},\bar{\lambda}\big{)}\in\left\{\left(0.1,0.5\right),\left(0.5,0.8\right),\left(0.8,0.95\right)\right\}.
Power DGP1.
We set for and generate \lambda_{1j}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\text{U}\big{[}0.1,0.5\big{]} for . The integer represents the number of unit roots in .
Power DGP2.
The eigenvalues of are sampled as \lambda_{11},\dots,\lambda_{1n}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\text{U}\big{[}0.1,0.5\big{]}, and the first series follow a cubic SUCPR specification:
[TABLE]
Power DGP3.
We again take \lambda_{11},\dots,\lambda_{1n}\stackrel{{\scriptstyle i.i.d.}}{{\sim}}\text{U}\big{[}0.1,0.5\big{]} and construct
[TABLE]
where represents for the number of equations that specify a spurious relation.
Overall, the Power DGPs 1-3 consider: missing regressors, omitted higher order powers of the regressor , and spurious regressions, respectively.
4.2 Discussion of the Simulation Results
Tables 1 and 2 report the empirical mean squared error (MSE) for both feasible and infeasible estimators. As results are qualitatively similar across equations, we only report on the estimators for (the coefficient in front of ). The column with FGLS contains the numerical value of the MSE and the MSEs of all other estimators are expressed relative to this benchmark. Values above 1 indicate a better performance of FM-GLS. We make the following observations:
- (a)
The FM-GLS estimator generally has the lowest MSE among all feasible estimators. These efficiency gains are small at low levels of endogeneity and serial correlation, but become sizeable at higher levels. Moreover, the Monte Carlo outcomes for the infeasible estimators indicate that these gains remain when the estimators are informed about the true endogeneity and serial correlation properties. It is thus the GLS weighting of the data that improves estimation accuracy. 2. (b)
There is one particular instance in Table 2 in which the performance of the FM-GLS estimator has a high MSE, namely the case of high persistency \big{(}\underline{\lambda},\bar{\lambda}\big{)}=(0.8,0.95), high endogeneity , and small sample size . This is caused by an inaccurate BIAM estimator resulting from the combination of a small sample size, high endogeneity, and high persistency. The problem disappears when increases.
The subsequent set of simulations evolves around hypothesis testing, see Table 3 and Figures 1-6. The errors are simulated using Setting A and we use the following Wald-type test statistics: the Wald-SOLS and Wald-SUR tests as developed in Proposition 2 in Wagner et al. (2020), and the Wald-FGLS test from Theorem 4. We consider: (i) the single equation test against the two-sided alternative , and (ii) the joint test against the alternative which rejects when at least one coefficient is unequal to . Some general remarks regarding size and size-corrected power are as follows.
- (c)
The Wald tests are typically oversized but the three tests react differently to changes in . Increases in result in an increasing size for the SOLS and SUR version of the Wald test, whereas increases in lead to size decreases for the GLS type of Wald test. Overall, the GLS test provides better size control. 2. (d)
In Figures 1 and 2, we vary the serial correlation parameter and the endogeneity parameter separately. Overall, variation in has a larger influence on size with the SUR test being most sensitive and the GLS test being least sensitive. 3. (e)
For all three Wald-type of tests, the size of the tests improves with sample size . 4. (f)
The ordering in terms of size-corrected power is the same throughout Figures 3-6. That is, size-corrected power is lowest for Wald-SOLS, increases for Wald-SUR, and is highest for the Wald-FGLS test.
The simulation results for the KPSS-type of cointegration tests can be found in Table 4. The general conclusions are as follows.
- (g)
The empirical sizes of the and tests are similar. We see: very conservative results for low serial correlation, decent size for medium serial correlation, and strongly oversized tests at high levels of serial correlation. These findings are completely in line with the simulation results that are reported in table 3 of Choi and Saikkonen (2010). The same behaviour is observed for the test but the deviations from the 5% level are less extreme. 2. (h)
The power of the , , and tests behaves as expected: (1) power always increases with sample size, and (2) power increases when more unit roots, more misspecified equations, or more spurious relationships are incorporated in the DGP. The test has the lowest power among the three tests. This is caused by the fact that the filter can nearly difference the data and hence make it appear more stationary.
5 Empirical Application
The Environmental Kuznets Curve (EKC) conjectures an inverted U-shaped relation between environmental degradation and income per capita. That is, there is an initial decline in environmental quality with increasing economic activity, but beyond a certain turning point (caused by e.g. industrial transformation and increasing environmental awareness), economic growth goes hand in hand with environmental improvement. A more detailed description and historical overview of the EKC can be found in Stern (2004) and Stern (2017), respectively. The implications of further economic growth on pollution, e.g. the emission of greenhouse-gases, are also key in understanding the future of global warming (Nordhaus (2013)).
We builds upon and compare with Wagner et al. (2020). That is, we look at carbon dioxide emissions and GDP as proxies for environmental pollution and economic development (both per capita and in logarithms), respectively. The data is collected from the Maddison Project Database (MPD) and the homepage of the Carbon Dioxide Information Analysis Center (CDIAC).999The Maddison Project Database, Bolt et al. (2018), contains the data on population size and real GDP. The data on originates from Boden et al. (2017). We follow the official guidelines and multiply by and to convert the reported fossil-fuel emissions into total carbon dioxide emissions. As in Wagner et al. (2020), we consider Austria (AT), Belgium (BE), Finland (FI), the Netherlands (NL), Switzerland (CH) and the United Kingdom (UK). Our yearly data spans the period from 1870 to 2014. We refer to the latter paper for a discussion of the stationarity properties of all series as well as the motivation for this particular set of countries. Overall, the dataset consist of countries with time series observations each. Such a panel with small and large is ideally suited for our FMGLS approach since the multivariate banded inverse autocovariance matrix remains computable.
We estimate the quadratic model specification:
[TABLE]
where and are emissions and GDP, respectively. As the first step in our analysis we employ the multivariate stationarity tests of Section 3.4 to check this model specification (Table 5). All three tests reject the null of cointegration at a 5% level signalling inappropriateness of the quadratic formulation. Figure 7 shows the residuals on which these tests are based. What stands out in these graphs is the erratic behaviour of the series around the two world wars. Based on this fact, and to be able to compare to Wagner et al. (2020), we will continue the analysis using model (5.1) and the given collection of countries. Before doing so, it will be worthwhile to discuss the time series properties of these residuals.
We consider the series in the remainder of this section but the other residuals series will provide qualitatively similar outcomes. When fitting the VAR() models with to these residuals, the BIC information criterion selects a lag order of . The absolute eigenvalues of the estimated coefficient matrix are , and the estimate for the error correlation matrix is
[TABLE]
There is thus serial and cross-sectional correlation to be exploited by the FM-GLS estimator.
The FM-SOLS, FM-SUR and FM-GLS estimation results of Model (5.1) are reported in Table 6. An inspection of the coefficient estimates and their confidence intervals reveals that: (1) is positive for each country, (2) is negative for each country, and (3) all coefficients are significant at the 5% level. All these three facts are in line with the EKC hypothesis.101010This is non-surprising because Wagner et al. (2020) have selected the current set of countries because they display the EKC behaviour. Also, our estimation results are slightly different from those in Wagner et al. (2020) due to the additional data for 2014, possible data updates, and/or differences in the bandwidth selection of the long-run covariance matrices. Accordingly, there exists a turning point after which further per capita economic growth reduces per capita carbon dioxide emissions. The numerical values for the turning points are heterogeneous between countries.
The widths of the confidence intervals for and display a similar pattern. From shortest to longest, the ordering is always FM-SUR, FM-SOLS, and FM-GLS and we also see how widths vary substantially between methods. To uncover the origin of these findings we conduct one final simulation study with a parameter specification that closely mimics the properties of the dataset.111111The details of this simulation DGP are provided in Section S3 of the Supplement. A visualisation of the data and the model fit are also provided there. The average empirical coverage probabilities of asymptotic 95% confidence intervals are 78.0%, 66.5% and 89.0% for FM-SOLS, FM-SUR, and FM-GLS, respectively. In other words, the calculated confidence intervals are generally too short. By reverse engineering it turns out that the confidence intervals should be scaled by factors of 1.67, 2.15 and 1.24 to bring them back to the desired nominal level. Overall, the applied researcher should be careful when using the confidence intervals as indications for parameter uncertainty.
6 Conclusion
We proposed a framework to conduct inference on cointegrating polynomial regressions. Parameters are obtained using a Fully Modified GLS estimator and we studied a cointegration test that is based on filtered residuals. Monte Carlo simulations revealed the advantages and disadvantages of these methods. The empirical researcher should realize that all estimation approaches have a tendency to underestimate parameter uncertainty and thus provide confidence intervals that are too small. The FM-GLS estimator suffers the least from this problem. Several interesting questions are left for future research. From a theoretical viewpoint, it is interesting to study the behaviour of the modified Cholesky decomposition (and BIAM) when the series under consideration is nonstationary. This would give insights into the behaviour of: (1) the FM-GLS estimator while estimating spurious regressions, and (2) the power properties of the cointegration tests. From a practical viewpoint, there seems a need to obtain more acurate standard errors of the parameter estimators.
Acknowledgements
This paper has been presented at the 2018 CFE meeting in Pisa, the NESG 2019 conference in Amsterdam, and the RCEA Time Series Econometrics Workshop in Larnaca. We would like to thank conference participants, especially Peter Pedroni and Peter Phillips, for useful comments and suggestions. We extend our thanks to Eric Beutner, Dick van Dijk, Richard Paap, Franz Palm, and Stephan Smeekes for their valuable feedback on earlier versions of this manuscript. All remaining errors are our own.
Appendix A Proofs of Main Theorems
Lemma 1
Let denote the lag polynomial implied by the coefficient matrices in (2.3). By the Beveridge-Nelson (BN) decomposition, we also have where with . If Assumption 1 holds, then
- (a)
\bm{\mathcal{A}}_{q}(1)=\bm{\mathcal{A}}(1)+O\left(\sum_{j=q+1}^{\infty}j^{1/2}\big{\|}\bm{A}_{j}\big{\|}_{\mathcal{F}}\right)** 2. (b)
There exists a such that \sum_{j=1}^{q}\big{\|}\widetilde{\bm{A}}_{j}(q)\big{\|}_{\mathcal{F}}<\infty for all
- Proof
(a) By Cauchy-Schwartz, we have \big{\|}\bm{\mathcal{A}}_{q}(1)-\bm{\mathcal{A}}(1)\big{\|}_{\mathcal{F}}\leq\sum_{j=1}^{q}\big{\|}\bm{A}_{j}(q)-\bm{A}_{j}\big{\|}_{\mathcal{F}}+\sum_{j=q+1}^{\infty}\big{\|}\bm{A}_{j}\big{\|}_{\mathcal{F}}\leq\Big{(}q\sum_{j=1}^{q}\big{\|}\bm{A}_{j}(q)-\bm{A}_{j}\big{\|}_{\mathcal{F}}^{2}\Big{)}^{1/2}+\sum_{j=q+1}^{\infty}\big{\|}\bm{A}_{j}\big{\|}_{\mathcal{F}} and subsequently use Lemma S1 (see Supplement). (b) Using Baxter’s inequality in Theorem 6.6.12 in Hannan and Deistler (2012) (also see their Remark 3) for the final inequality, we derive \sum_{j=1}^{q}\big{\|}\widetilde{\bm{A}}_{j}(q)\big{\|}_{\mathcal{F}}\leq\sum_{j=1}^{q}j\big{\|}\bm{A}_{j}(q)\big{\|}_{\mathcal{F}}\leq\sum_{j=1}^{q}j\big{\|}\bm{A}_{j}(q)-\bm{A}_{j}\big{\|}_{\mathcal{F}}+\sum_{j=1}^{q}j\big{\|}\bm{A}_{j}\big{\|}_{\mathcal{F}}\leq C\sum_{j=1}^{q}j\big{\|}\bm{A}_{j}\big{\|}_{\mathcal{F}}\leq C. ∎
- Proof of Theorem 1
The premultiplication by applies a linear filter whereas implies weighting. Since the behaviour of the first elements does not affect the asymptotic results, we take for and for all , we apply the transformations implied by and .121212The same argumentation is used in Phillips and Park (1988). The modification to obtain a rigorous proof is straightforward. Consequently, we have
[TABLE]
Using the BN decomposition, Lemma 1, we first show
[TABLE]
Note that \left\|\sum_{j=1}^{q}\widetilde{\bm{A}}_{j}(q)\Delta\bm{Z}_{t-j+1}^{\prime}\bm{G}_{T}\right\|\leq\sum_{j=1}^{q}\big{\|}\widetilde{\bm{A}}_{j}(q)\big{\|}\leavevmode\nobreak\ \big{\|}\Delta\bm{Z}_{t-j+1}^{\prime}\bm{G}_{T}\big{\|} and that for any ,
[TABLE]
The vector typically contains elements T^{-(k+\frac{1}{2})}\big{[}t^{k}-(t-1)^{k}\big{]} where . By the inequality , for , , we obtain , and thus T^{-(k+\frac{1}{2})}\big{[}t^{k}-(t-1)^{k}\big{]}\leq q_{i}\leavevmode\nobreak\ T^{-3/2}\leq CT^{-3/2}. As a result, . The vector typically contains elements T^{-(k+1)/2}\big{(}x_{it}^{k}-x_{it-1}^{k}\big{)}, where . The binomial expansion implies x_{it}^{k}-x_{it-1}^{k}=\sum_{m=0}^{k-1}{k\choose m}x_{it-1}^{m}v_{it}^{k-m}=O_{p}\big{(}T^{(k-1)/2}\big{)}, and thus T^{-(k+1)/2}\big{(}x_{it}^{k}-x_{it-1}^{k}\big{)}=O_{p}\big{(}T^{-1}\big{)}. It further implies that \left\|\bm{G}_{\bm{s}_{i},T}\Delta\bm{s}_{it}\right\|^{2}=O_{p}\big{(}T^{-2}\big{)}. Combining \big{\|}\Delta\bm{Z}_{t}^{\prime}\bm{G}_{T}\big{\|}=O_{p}\big{(}T^{-1}\big{)} with Lemma 1(b) we establish the second equality in (A.2). Now (A.2) follows from Lemma 1(a) and \sum_{j=q+1}^{\infty}j^{1/2}\big{\|}\bm{A}_{j}\big{\|}_{\mathcal{F}}=o\big{(}q^{-1/2}\big{)}. Using (A.2) and \big{\|}\bm{S}(q)-\bm{\varSigma}_{\eta\eta}\big{\|}\rightarrow 0 (see (S1.17) in the supplementary material), we obtain
[TABLE]
We rewrite the second part in (A.1) as
[TABLE]
and we will repeatedly use the identity
[TABLE]
for any matrices , and . Using this identity, we have
[TABLE]
where is the entry of . For , we subsequently use the BN decomposition to obtain:
[TABLE]
By definition, we have , where the limiting distribution of each block follows from Proposition 1 of Wagner and Hong (2016). More specifically, the block will converge to a stochastic integral and a second order bias term which is proportional to (the element of ), , and thus
[TABLE]
where . ∎
As , we again consider the limiting distributions block-wise. Every element in the blocks will rely on one of the following three results. (1) As derived below (A.3), we have \big{|}T^{-(j+\frac{1}{2})}\sum_{t=1}^{T}\big{[}t^{j}-(t-1)^{j}\big{]}\eta_{it}\big{|}\leq CT^{-3/2}\sum_{t=1}^{T}|\eta_{it}|=o_{p}(1). (2) By Assumption 1, for any , and are Near Epoch Dependent in -norm on of size and arbitrary size, respectively. A small variation on Theorem 17.9 from Davidson (1994) shows that are -NED of size . The i.i.d. assumption on allows for a LLN for the sequence , see e.g. Theorem 20.21 of Davidson (1994), implying . (3) , where . The specific reason is as follows. By the binomial expansion (below (A.3)), we have
[TABLE]
where . To see this, we refer to de Jong (2002). The moment and NED conditions in his Assumption 1 are satisfied. Moreover, since is homogeneous of degree , his Assumption 2 holds as well. The desired result now follows from Theorem 1 in de Jong (2002). Combining these results, we obtain
[TABLE]
Finally, the last term in (A.8) is bounded by \sum_{j=2}^{q}\big{\|}\bm{A}_{j}(q)\big{\|}\leavevmode\nobreak\ \big{\|}\sum_{t=1}^{T}\bm{G}_{T}\Delta\bm{Z}_{t-j+1}\eta_{it}\big{\|}. Using similar arguments above, we conclude that (lags of will lead to \mathbb{E}\big{(}v_{kt-j}\eta_{it}\big{)}=0 for any ). Hence, . Combining (A.8), (A.9) and (A.10), we have
[TABLE]
Note that \operatorname{vec}\Big{(}\bm{\varSigma}_{\eta\eta}^{-1}\Big{)}=\left[\begin{smallmatrix}\operatorname{col}_{1}(\bm{\varSigma}_{\eta\eta}^{-1})\\ \vdots\\ \operatorname{col}_{n}(\bm{\varSigma}_{\eta\eta}^{-1})\end{smallmatrix}\right]. Inserting (A.11) into (A.7), we eventually have
[TABLE]
where the symmetry property of is used in the final step.
Now we consider the term in (A.5). If we define \bm{u}_{t}^{*}=\big{[}u_{1t}^{*},\dots,u_{nt}^{*}\big{]}:=\bm{\mathcal{A}}_{q}(L)\bm{u}_{t}-\bm{\eta}_{t}=-\sum_{j=1}^{\infty}\big{(}\bm{A}_{j}(q)-\bm{A}_{j}\big{)}\bm{u}_{t-j}, where for , and then apply (A.6), we have \@slowromancap ii@=\Big{[}\sum_{t=1}^{T}\big{(}\bm{\mathcal{A}}_{q}(L)\bm{Z}_{t}^{\prime}\bm{G}_{T}\big{)}^{\prime}u_{1t}^{*},\dots,\sum_{t=1}^{T}\big{(}\bm{\mathcal{A}}_{q}(L)\bm{Z}_{t}^{\prime}\bm{G}_{T}\big{)}^{\prime}u_{nt}^{*}\Big{]}\operatorname{vec}\Big{(}\bm{\varSigma}_{\eta\eta}^{-1}+o(1)\Big{)}. For any block , by the BN decomposition (A.2), we have
[TABLE]
It implies . The theorem now follows from (A.1), (A.4), and (A.12).
- Proof of Theorem 2
We start wih the estimation error . Repeated addition and subtraction yields
[TABLE]
We will only consider the terms \big{\|}\widehat{\bm{\mathcal{M}}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}(q)\big{\|} and \big{\|}\widehat{\bm{\mathcal{S}}}_{\bm{u}}^{-1}(q)-\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)\big{\|}. It is not hard to derive that the remaining terms are bounded in probability. Define and denote its () subblocks by , . This matrix is banded in such a way that there are at most nonzero block in the block-columns of . Using this observation and various norm properties, we find
[TABLE]
where the final step follows from Lemma S3.131313More specifically, for any matrix we have . Moreover, if is an matrix, then also . We conclude that \big{\|}\widehat{\bm{\mathcal{M}}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}(q)\big{\|}=O_{p}\left(\sqrt{q^{3}/T}\right). The difference forms a symmetric and block diagonal matrix, hence \big{\|}\widehat{\bm{\mathcal{S}}}_{\bm{u}}(q)-\bm{\mathcal{S}}_{\bm{u}}(q)\big{\|}=\max\left\{\big{\|}\widehat{\bm{S}}(0)-\bm{S}(0)\big{\|},\max_{1\leq\ell\leq q}\big{\|}\widehat{\bm{S}}(\ell)-\bm{S}(\ell)\big{\|}\right\}. By Assumption 2,
[TABLE]
Applying Lemma S3, we see \big{\|}\widehat{\bm{\mathcal{S}}}_{\bm{u}}^{-1}(q)-\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)\big{\|}\leq\big{\|}\widehat{\bm{\mathcal{S}}}_{\bm{u}}(q)-\bm{\mathcal{S}}_{\bm{u}}(q)\big{\|}\leavevmode\nobreak\ \big{\|}\widehat{\bm{\mathcal{S}}}_{\bm{u}}^{-1}(q)\big{\|}\leavevmode\nobreak\ \big{\|}\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)\big{\|}=O_{p}\left(q/\sqrt{T}\right). Overall, recalling (A.13), a bound on the estimation eror is .
The bound on the truncation error, , follows from a straightforward generalization of the results in Lemma 2 of Cheng et al. (2015) and Propositions 2.1-2.2 of Ing et al. (2016b). See Lemma S4 in the supplementary material for details. ∎
- Proof of Theorem 3
(a)-(b) See the proof of Proposition 1 in Wagner et al. (2020). (c) Since the residuals are obtained by first stage OLS, we get \|\widehat{\bm{u}}-\bm{u}\|^{2}\leq\|\bm{G}_{T}^{-1}\big{(}\widehat{\bm{\beta}}_{OLS}-\bm{\beta}\big{)}\|^{2}\leavevmode\nobreak\ \|\bm{G}_{T}\bm{Z}^{\prime}\bm{Z}\bm{G}_{T}\|=O_{p}(1). Assumption 2 is thus satisfied and we can rely on the results in Theorem 2. From (3.12), the definition of the FM-GLS estimator, we have
[TABLE]
Given Theorem 2, we have and it converges weakly to the expression in (A.4).
To continue, we define and its BN decomposition through with . and \sum_{j=1}^{q}\big{\|}\bm{A}_{j}^{*}(q)\big{\|}_{\mathcal{F}}\leq\sum_{j=1}^{q}\big{\|}\widetilde{\bm{A}}_{j}(q)\big{\|}_{\mathcal{F}}+o_{p}(q) are obtained from the following two results: (1) \big{\|}\widehat{\bm{\mathcal{A}}}_{q}(1)-\bm{\mathcal{A}}_{q}(1)\big{\|}_{\mathcal{F}}\leq\sum_{j=1}^{q}\big{\|}\widehat{\bm{A}}_{j}(q)-\bm{A}_{j}(q)\big{\|}_{\mathcal{F}}\leq C\sqrt{q}\big{\|}\widehat{\bm{A}}(q)-\bm{A}(q)\big{\|}=O_{p}\Big{(}\frac{q^{3/2}}{T^{1/2}}\Big{)}=o_{p}(1), where the last step follows from Lemma S3, and (2) \sum_{j=1}^{q}\big{\|}\bm{A}_{j}^{*}(q)-\widetilde{\bm{A}}_{j}(q)\big{\|}_{\mathcal{F}}\leq q\sum_{j=1}^{q}\big{\|}\widehat{\bm{A}}_{j}(q)-\bm{A}_{j}(q)\big{\|}_{\mathcal{F}}=o_{p}(q). Using the BN decomposition of and similar steps as those below (A.5), we have
[TABLE]
where given in Lemma S3. Using the identity (A.6), it is not hard to deduce
[TABLE]
Combining the results above leads to:
[TABLE]
where . By construction, we have . Altogether this implies the limiting distribution in the theorem. ∎
- Proof of Theorem 4
We first introduce the appropriate scaling into the test statistic, that is
[TABLE]
Since the matrices and commute and , we have under the null hypothesis. Conditional on \mathcal{F}_{v}=\sigma\big{(}\bm{B}_{v}(r),0\leq r\leq 1\big{)}, this quantity is asymptotically normally distributed by Theorem 3(c) with asymptotic covariance matrix
[TABLE]
The consistent estimation of all the quantities involved ensures that has the same limit. Therefore, the Wald statistics is conditionally chi-square distributed with degrees of freedom. Since this distribution does not depend on , we conclude that the unconditional distribution of is also . ∎
- Proof of Theorem 5
The results for and follow from a straightforward multivariate extensions of the proof of Proposition 6 in Wagner and Hong (2016). For , we first define the population counterparts of and . That is, let \bm{\varphi}_{j,b_{T}}:=\big{[}\bm{u}_{j}^{\prime},\sum_{s=j}^{j+1}\bm{u}_{s}^{\prime},\dots,\sum_{s=j}^{j+b_{T}-1}\bm{u}_{s}^{\prime}\big{]}^{\prime} and let denote the subblock matrix of formed by taking the elements with row and column indices belonging to the set . By rearrangement, we have
[TABLE]
where the remainder term is bounded as |R(q_{T},j,b_{T})|\leq\big{\|}b_{T}^{-1}\big{(}\bm{\varphi}_{j,b_{T}}(\{\hat{\bm{u}}_{FGLS}\})-\bm{\varphi}_{j,b_{T}}\big{)}\big{\|}^{2}\leavevmode\nobreak\ \big{\|}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T},b_{T})\big{\|}+2\big{\|}b_{T}^{-1}\big{(}\bm{\varphi}_{j,b_{T}}(\{\hat{\bm{u}}_{FGLS}\})-\bm{\varphi}_{j,b_{T}}\big{)}\big{\|}\leavevmode\nobreak\ \big{\|}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T},b_{T})\big{\|}\leavevmode\nobreak\ \big{\|}b_{T}^{-1}\bm{\varphi}_{j,b_{T}}\big{\|}. Poincaré’s separation theorem (e.g. page 347-348 of Abadir and Magnus (2005)) implies when is a principal submatrix of . By this inequality and Theorem 2, we conclude that \big{\|}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T},b_{T})-\bm{\varSigma}_{\bm{u}}^{-1}(q_{T},b_{T})\big{\|}\leq\big{\|}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T})-\bm{\varSigma}_{\bm{u}}^{-1}(q_{T})\big{\|}=o_{p}(1) and \big{\|}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T},b_{T})\big{\|}=O_{p}(1). Moreover,
[TABLE]
where \left\|b_{T}^{-1/2}\sum_{s=j}^{t}\left(\widehat{\bm{u}}_{s,FGLS}-\bm{u}_{s}\right)\right\|\leq\left\|b_{T}^{-1/2}\sum_{s=j}^{t}\bm{Z}_{s}^{\prime}\bm{G}_{T}\right\|\leavevmode\nobreak\ \big{\|}\bm{G}_{T}^{-1}\big{(}\widehat{\bm{\beta}}_{FGLS}^{+}-\bm{\beta}\big{)}\big{\|}=o_{p}(1) by Theorem 3 and the assumption as . Standard weak convergence arguments imply \big{\|}b_{T}^{-1}\bm{\varphi}_{j,b_{T}}\big{\|}=O_{p}(1). Combining these results, we have b_{T}^{-2}\bm{\varphi}_{j,b_{T}}^{\prime}\Big{(}\widehat{\bm{\varSigma}_{\bm{u}}^{-1}}(q_{T},b_{T})-\bm{\varSigma}_{\bm{u}}^{-1}(q_{T},b_{T})\Big{)}\bm{\varphi}_{j,b_{T}}=o_{p}(1) and . By (A.14), it remains to consider . Construct the selection matrix such that
[TABLE]
Then, by the MCD (2.6), we have
[TABLE]
As argued in the proof of Theorem 1, by the assumption as , we can treat the premultiplication of by as applying the filter block-wise. Under the same condition, implies a scaling . By the BN decomposition in Lemma 1, and similarly with ,
[TABLE]
For , a FCLT for i.i.d. sequences gives
[TABLE]
The partial sum process thus dominates the asymptotic distribution:
[TABLE]
An application of the continuous mapping theorem completes the proof. ∎
Supplemental Appendix to:
Efficient Estimation by Fully Modified GLS with an Application to the Environmental Kuznets Curve
Yicong Lin, and Hanno Reuvers
S1 Additional Proofs
Lemma S1
If satisfies Assumption 1, then for any , there exists a constant such that
[TABLE]
- Proof
In view of page 257 of Hannan and Deistler (2012), the summability condition of Assumption 1 implies that the spectral density matrix is bounded and bounded away from zero. The boundedness condition in Cheng and Pourahmadi (1993) is thus satisfied and (S1.1) follows from their Theorem 2.2. ∎
Lemma S2 (Implications of the First Moment Bound Theorem)
Let Assumption 1 hold, and define
[TABLE]
The following three inequalities are true:
- (a)
\mathbb{E}\Big{\|}\frac{1}{T-q}\sum_{t=q}^{T-1}\bm{u}_{t}(q)\bm{u}_{t}(q)^{\prime}-\mathbb{E}\left(\bm{u}_{t}(q)\bm{u}_{t}(q)^{\prime}\right)\Big{\|}^{2}\leq C\frac{q^{2}}{T-q}; 2. (b)
\mathbb{E}\left\|\frac{1}{T-\ell}\sum_{t=\ell}^{T-1}\big{(}\bm{\eta}_{t+1,\ell}-\bm{\eta}_{t+1}\big{)}\bm{u}_{t}(\ell)^{\prime}\right\|^{r}\leq C\left(\frac{\ell}{T-\ell}\right)^{r/2}\Big{(}\sum_{j=\ell+1}^{\infty}\left\|\bm{A}_{j}\right\|_{\mathcal{F}}^{2}\Big{)}^{r/2}, for some and any ; 3. (c)
\mathbb{E}\left\|\frac{1}{T-\ell}\sum_{t=\ell}^{T-1}\Big{(}\bm{\eta}_{t+1,\ell}\bm{\eta}_{t+1,\ell}^{\prime}\Big{)}-\mathbb{E}\Big{(}\bm{\eta}_{t+1,\ell}\bm{\eta}_{t+1,\ell}^{\prime}\Big{)}\right\|^{r}\leq C(T-\ell)^{-r/2}, for some and any .
- Proof
(a) Since , we obtain
[TABLE]
where denotes the element . As remarked in the main text, the lag polynomial is invertible by Assumption 1. Recall with , and . We observe that u_{i,t}=\sum_{j=0}^{\infty}\operatorname{row}_{i}\big{(}\bm{C}_{j}\big{)}\bm{\eta}_{t-j}. By Proposition 10.2(b) of Hamilton (1994), absolute summability of the coefficient matrices implies where . The conditions for the First Moment Bound Theorem (FMBT) in Findley and Wei (1993) are thus satisfied. Choosing if (the banding parameter) and otherwise,
[TABLE]
by the FMBT. This bound holds for general , , and , and (a) thereby follows from (S1.3).
(b) For and , we have
[TABLE]
by and the -inequality. By assumption, is uncorrelated with \big{[}\bm{u}_{t-\ell+1}^{\prime},\ldots,\bm{u}_{t}^{\prime}\big{]}^{\prime} implying that \mathbb{E}\big{[}\sum_{t=\ell}^{T-1}(\bm{\eta}_{t+1,\ell}-\bm{\eta}_{t+1})\bm{u}_{t-s}^{\prime}\big{]}=\mathbf{O}. The FMBT can thus be applied directly without having to express the quadratic form in deviations from the mean. However, some rewriting is needed to obtain expressions in scalar random sequences. To this end, use and to denot the element of and , respectively. Setting for , we have \bm{\eta}_{t+1,\ell}-\bm{\eta}_{t+1}=\sum_{j=1}^{\infty}\big{[}\bm{A}_{j}-\bm{A}_{j}(\ell)\big{]}\bm{u}_{t+1-j}, and hence
[TABLE]
with u_{t}^{*}=\sum_{j=1}^{\infty}\sum_{l=1}^{n}\big{(}A_{j,kl}-A_{j,kl}(\ell)\big{)}u_{l,t+1-j}, where we suppress the dependence on the index (also below) without confusion. To apply the FMBT, we define the autocovariances \gamma_{u^{*}}(t-h)=\mathbb{E}\big{(}u_{t}^{*}u_{h}^{*}\big{)}, the difference in lag polynomial coefficients and \bm{\varSigma}_{u_{l},\infty}=\big{[}\gamma_{u,l}(i-j),1\leq i,j<\infty\big{]}. By the Cauchy-Schwartz inequality, the -inequality, and boundedness of the maximum eigenvalue of , we obtain
[TABLE]
Applying the FMBT, we have
[TABLE]
using (S1.6) and the absolute summability of . Combining (S1.4), (S1.5) and (S1.7) leads to the desired inequality.
(c) The equality \bm{\eta}_{t+1,\ell}=\Big{(}\bm{I}_{n}-\sum_{j=1}^{\ell}\bm{A}_{j}(\ell)L^{j}\Big{)}\bm{\mathcal{C}}(L)\bm{\eta}_{t+1} shows that has a linear process representation in terms of . Theorem 6.6.12 of Hannan and Deistler (2012) implies that . By Propositions 10.2(b) and 10.3 of Hamilton (1994), both the coefficient matrices associated with \Big{(}\bm{I}_{n}-\sum_{j=1}^{\ell}\bm{A}_{j}(\ell)L^{j}\Big{)}\bm{\mathcal{C}}(L) and the autocovariances are absolutely summable, where is the entry of . The proof is completed using the -inequality and the FMBT, that is, for ,
[TABLE]
∎
Lemma S3
[TABLE]
- Proof
Recall the definition of and in (S1.2). Similarly, define
[TABLE]
We first prove \max_{1\leq\ell\leq q}\big{\|}\widehat{\bm{A}}(\ell)-\bm{A}(\ell)\big{\|}=O_{p}\left(q/\sqrt{T}\,\right). Since and (the first-order condition from (3.4)), we have
[TABLE]
If we can show that is asymptotically invertible, then must also be asymptotically invertible with probability 1, for any .141414If the matrix is invertible, then each leading principle submatrix of is invertible as well. By the triangular inequality, \left\|\frac{1}{T-q}\sum_{t=q}^{T-1}\widehat{\bm{u}}_{t}(q)\widehat{\bm{u}}_{t}(q)^{\prime}-\mathbb{E}\big{(}\bm{u}_{t}(q)\bm{u}_{t}(q)^{\prime}\big{)}\right\|\leq\@slowromancap i@_{a}+\@slowromancap i@_{b}, where
[TABLE]
by Chebyshev’s inequality and Lemma S2 , and
[TABLE]
since . We have by Assumption 2. Because by Markov’s inequality, we conclude . Overall, this gives
[TABLE]
Now observe that \mathbb{E}\big{(}\bm{u}_{t}(q)\bm{u}_{t}(q)^{\prime}\big{)} is a leading principal submatrix of (thus invertible, see footnote 14). As a result, is asymptotically invertible.
We subsequently bound the RHS of (S1.10) as follows: , where , , and
[TABLE]
We consider these terms separately starting from . Using the properties of Frobenius norm,
[TABLE]
Assumption 1 justifies the use of Lemma 2 in Wei (1987) which gives . By Chebyshev’s inequality, , there exists such that
[TABLE]
and thus . Furthermore, we deduce that by Lemma S2 and Chebyshev’s inequality. For , if we write \widehat{\bm{\eta}}_{t+1,\ell}-\bm{\eta}_{t+1,\ell}=\big{[}\bm{I}_{n},-\bm{A}(\ell)\big{]}\big{[}\widehat{\bm{u}}_{t+1}(\ell+1)-\bm{u}_{t+1}(\ell+1)\big{]}, then by Cauchy-Schwarz inequality and Baxter’s inequality (leads to ),
[TABLE]
where the last step follows from arguments similar to those preceding (S1.11). Similarly, and . Combining all results, we finally have
[TABLE]
By invertibility of , (S1.10) and (S1.12), \max_{1\leq\ell\leq q}\big{\|}\widehat{\bm{A}}(\ell)-\bm{A}(\ell)\big{\|}=O_{p}\left(q/\sqrt{T}\,\right) follows.
We continue with \max_{1\leq\ell\leq q}\big{\|}\widehat{\bm{S}}(\ell)-\bm{S}(\ell)\big{\|}=O_{p}\left(q/\sqrt{T}\,\right). Since (see (S1.9)), we can use (S1.10) and the invertibility of to write
[TABLE]
where the last step in (S1.13) follows from using Lemma S2 and (S1.12). By the inequality and similar arguments as for and above, the first term in (S1.13) is bounded by
[TABLE]
Overall, we obtain \max_{1\leq\ell\leq q}\big{\|}\widehat{\bm{S}}(\ell)-\bm{S}(\ell)\big{\|}=O_{p}\left(q/\sqrt{T}\,\right) as well. ∎
Lemma S4
Under Assumptions 1 and 3, we have
[TABLE]
- Proof
Consider . A rewriting as in (A.13) shows that \big{\|}\bm{\mathcal{M}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}\big{\|} and \big{\|}\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)-\bm{\mathcal{S}}_{\bm{u}}^{-1}\big{\|} are the two important terms to bound. Hölder’s inequality implies
[TABLE]
For the matrix -norm we are concerned with the maximum absolute column sum. For an arbitrary matrix partitioned (block) column-wise, i.e. , we have the bound . This implies \big{\|}\bm{\mathcal{M}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}\big{\|}_{1}\leq\sqrt{n}\max\left\{\@slowromancap i@_{a},\@slowromancap i@_{b}\right\} where
[TABLE]
We will bound the three summations that are encountered in the expressions for and . First, changing the summation index and using -inequality,
[TABLE]
For convenience, we define
[TABLE]
For any and , we have by the -Baxter’s inequality. The first term in the RHS above is thus bounded by . Moreover, by Cauchy-Schwartz inequality, the second term can be bounded as \sum_{s=q+1}^{\infty}\left\|\bm{A}_{s}\right\|_{\mathcal{F}}^{2}=\sum_{s=q+1}^{\infty}\big{(}s^{2}\left\|\bm{A}_{s}\right\|_{\mathcal{F}}^{2}\big{)}s^{-2}\leq\left(\sum_{s=q+1}^{\infty}s^{2}\left\|\bm{A}_{s}\right\|_{\mathcal{F}}^{2}\right)\left(\sum_{s=q+1}^{\infty}s^{-2}\right)\leq\mathcal{K}_{q}. Now the second summation in . We first consider the case , or , such that
[TABLE]
using arguments detailed before. This upper bound remains valid for . It is likewise straightforward to derive . Collecting all the results, we have , , and thus \big{\|}\bm{F}_{\bm{u}}(q)-\bm{F}_{\bm{u}}\big{\|}_{1}\leq C\sqrt{\mathcal{K}_{q}}.
For , we are bounding the maximum absolute row sums. For an arbitrary matrix partitioned as , we have , such that
[TABLE]
where and , for any , using the -Baxter’s inequality and the previous upper bound on . We conclude that . Together with our previous we result, we obtain \big{\|}\bm{\mathcal{M}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}\big{\|}\leq C\sqrt{\mathcal{K}_{q}} from (S1.15).
From \big{\|}\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)-\bm{\mathcal{S}}_{\bm{u}}^{-1}\big{\|}\leq\big{\|}\bm{\mathcal{S}}_{\bm{u}}(q)-\bm{\mathcal{S}}_{\bm{u}}\big{\|}\leavevmode\nobreak\ \big{\|}\bm{\mathcal{S}}_{\bm{u}}^{-1}\big{\|}\leavevmode\nobreak\ \big{\|}\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)\big{\|} we see that it suffices to inspect \big{\|}\bm{\mathcal{S}}_{\bm{u}}(q)-\bm{\mathcal{S}}_{\bm{u}}\big{\|} (the other norms are bounded). Exploiting the fact that both and are block-diagonal, we have \big{\|}\bm{\mathcal{S}}_{\bm{u}}(q)-\bm{\mathcal{S}}_{\bm{u}}\big{\|}=\max_{q+1\leq k\leq T-1}\left\|\bm{S}(q)-\bm{S}(k)\right\|\leq 2\max_{q\leq k\leq T-1}\left\|\bm{S}(k)-\bm{\varSigma}_{\eta\eta}\right\|. Let for , and recall the definition of in (S1.2). We find, for any ,
[TABLE]
We thereby obtain \big{\|}\bm{\mathcal{S}}_{\bm{u}}^{-1}(q)-\bm{\mathcal{S}}_{\bm{u}}^{-1}\big{\|}\leq C\mathcal{K}_{q}. Together with the bound on \big{\|}\bm{\mathcal{M}}_{\bm{u}}(q)-\bm{\mathcal{M}}_{\bm{u}}\big{\|}, we deduce
[TABLE]
The proof is complete. ∎
Theorem S1
Let denote an -dimensional standard Brownian motion. The cumulative density function (CDF) of is given by
[TABLE]
where , and .
- Proof
We follow the approach from Example 1 of Anderson and Darling (1952) or equivalently appendix B of Choi and Saikkonen (2010). Let denote the probability density function of and write and for the Laplace and inverse Laplace operator, respectively. From the equality , independence of the components of , and the known univariate result in Choi and Saikkonen (2010), we have
[TABLE]
According to equation (4.28) in Anderson and Darling (1952), the CDF is
[TABLE]
where we use (1) a with a positive real part, (2) linearity of the inverse Laplace operator, and (3) the binomial expansion of . The identity from Choi and Saikkonen (2010), , completes the proof. ∎
S2 Estimation of Quantities for Fully Modified Inference
The FM-GLS estimator relies on , , and (see Assumption 1). For convenience, we denote this matrix by . Please note the difference between and the large-dimensional matrix . In this section, we consider the estimation of these three quantities within the BIAM framework. For conenience, we recall and define . Similarly to the definition of , we used to denote the autocovariance matrix of . As a sample counterpart, we stack and in the -dimensional vector \widehat{\bm{\xi}}_{t}=\big{[}\widehat{\bm{u}}_{t}^{\prime},\bm{v}_{t}^{\prime}\big{]}^{\prime}. Using , the BIAM estimator for is now constructed as , where the matrices and are defined similarly to and , respectively. Since the BIAM estimator is fitting VAR processes up to order (see (3.4)), the coefficient estimates of the lag when a VAR is fitted, , are immediate byproducts of the BIAM procedure and can thus be used to construct our estimators. Finally, if , where \bm{F}_{j}=\operatorname{diag}\big{[}\bm{A}_{j},\bm{D}_{j}\big{]}, then holds.
Theorem S2
Recall the definitions , , and
[TABLE]
where is an block matrices of zeros of which the last blocks have been replaced by identity matrices. If Assumptions 1-3 and 5 hold, then
[TABLE]
- Proof
Note that and hence by Assumption 2. The conditions for Lemmas S3 – S4 and Theorem 2 are thus satisfied and we can use these results in subsequent proofs. (a) The result (S2.1) follows from the triangle inequality, Lemma S3 and (S1.17). (b) The second result (S2.2) is obtained by the definition , Lemma S3 and a straightforward modification of (A.13). (c) By \bm{\varDelta}=\sum_{h=r_{T}}^{\infty}\mathbb{E}\big{(}\bm{\xi}_{t}\bm{\xi}_{t+h}^{\prime}\big{)}+\bm{Q}_{r_{T}}^{\prime}\bm{\varSigma}_{\bm{\xi}}\bm{Q}_{1}, the LHS of (S2.3) can be bounded
[TABLE]
Since summability conditions on the coefficient matrices carry over to the autocovariances, we have \sum_{h=r_{T}}^{\infty}\left\|\mathbb{E}\big{(}\bm{\xi}_{t}\bm{\xi}_{t+h}^{\prime}\big{)}\right\|_{\mathcal{F}}\leq r_{T}^{-1}\sum_{h=r_{T}}^{\infty}h\left\|\mathbb{E}\big{(}\bm{\xi}_{t}\bm{\xi}_{t+h}^{\prime}\big{)}\right\|_{\mathcal{F}}=o(r_{T}^{-1}) by Assumption 1. Moreover, and \big{\|}\widehat{\bm{\varSigma}_{\bm{\xi}}^{-1}}(q_{T})-\bm{\varSigma}_{\bm{\xi}}^{-1}\big{\|} is discussed in Theorem 2. Finally, showing \big{\|}\widehat{\bm{\varSigma}_{\bm{\xi}}}(q_{T})\bm{Q}_{1}\big{\|}=O_{p}(1) will complete the proof after a straightforward comparison of the established stochastic orders. It suffices to prove \big{\|}\widehat{\bm{\varSigma}_{\bm{\xi}}}(q_{T})\big{\|}=O_{p}(1). Weyl’s inequality (e.g. pages 40 and 46 in Tao (2012)) and Theorem 2 imply
[TABLE]
By the uniform boundedness of \big{\|}\bm{\varSigma}_{\bm{\xi}}\big{\|}, for a sufficiently large , there exists a constant such that \big{\|}\widehat{\bm{\varSigma}_{\bm{\xi}}}(q_{T})\big{\|}^{-1}=\lambda_{min}\Big{(}\widehat{\bm{\varSigma}_{\bm{\xi}}^{-1}}(q_{T})\Big{)}\leq C and thus \big{\|}\widehat{\bm{\varSigma}_{\bm{\xi}}}(q_{T})\big{\|}\leq C^{-1} with arbitrarily high probability. ∎
S3 Additional information for the Empirical Application
S3.1 Model fit
S3.2 Simulation DGP
The following procedure was used to obtain a simulation DGP that closely mimics the data characteristics.
- (a)
Fit VAR() models () to the series and individually. The BIC criterion select the VAR(1) specification for both series (Table 7). Store the coefficient matrices and as well as the residual series and , respectively. 2. (b)
Stack and compute . 3. (c)
Denoting the estimated coefficients from the data by , we generate the new data according to the following equations:
[TABLE]
S4 Details on Implementation
The implementation of the BIAM estimator and the subsampling KPSS tests requires selecting the banding parameter and the block length . In our simulations and empirical application, we follow the subsampling and risk-minimization approach previously used by Bickel and Levina (2008), Wu and Pourahmadi (2009) and Ing et al. (2016b) to select . The steps are as follows:
- Step 1
Split the series of (first-step OLS) residuals, , into non-overlapping subsequences of length . These subsequences are for with . 2. Step 2
Select an integer , , and construct the sample autocovariance matrix which is an estimator of with , where \widehat{\bm{u}}_{t}(\ell)=\big{(}\widehat{\bm{u}}_{t}^{\prime},\cdots,\widehat{\bm{u}}_{t-\ell+1}^{\prime}\big{)}^{\prime}. Compute . 3. Step 3
For every subsequence of residuals , compute the BIAM estimate of repeatedly for all possible banding parameters , denoted as . 4. Step 4
Select the banding parameter that minimizes the feasible average risk, i.e.
[TABLE]
We take , and and obtain satisfactory results for all the settings we have explored. As mentioned in Bickel and Levina (2008), the use of another vector norm (e.g. ) does not lead to qualitatively different results.
When we implement the minimum volatility rule as mentioned in Section 3 to select , the values of tuning parameters are adopted from Wagner and Hong (2016), see their online supplementary material.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Abadir and Magnus (2005) Abadir, K. M. and J. R. Magnus (2005). Matrix Algebra . Cambridge University Press.
- 2Anderson and Darling (1952) Anderson, T. W. and D. A. Darling (1952). Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. The Annals of Mathematical Statistics 23 , 193–212.
- 3Andrews (1991) Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59 , 817–858.
- 4Berk (1974) Berk, K. N. (1974). Consistent autoregressive spectral estimates. The Annals of Statistics 2 , 489–502.
- 5Beutner et al. (2019) Beutner, E., Y. Lin, and S. Smeekes (2019). GLS estimation and confidence sets for the date of a single break in models with trends. Working Paper.
- 6Bickel and Levina (2008) Bickel, P. J. and E. Levina (2008). Regularized estimation of large covariance matrices. The Annals of Statistics 36 , 199–227.
- 7Boden et al. (2017) Boden, T., G. Marland, and R. Andres (2017). Global, regional, and national fossil-fuel CO 2 subscript CO 2 \text{CO}_{2} emissions. Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, Tenn., U.S.A. http://cdiac.ess-dive.lbl.gov/trends/emis/tre_coun.html .
- 8Bolt et al. (2018) Bolt, J., R. Inklaar, H. de Jong, and J. L. van Zanden (2018). Rebasing ”maddison”: New income comparisons and the shape of long-run economic development. https://www.rug.nl/ggdc/historicaldevelopment/maddison/releases/maddison-project-database-2018 .
