Asymptotic normality of the time-domain generalized least squares estimator for linear regression models
Hien D Nguyen

TL;DR
This paper proves that the time-domain generalized least squares estimator is asymptotically normal for a broad class of error dependence models in linear regression, extending previous results limited to specific structures.
Contribution
It establishes the asymptotic normality of the time-domain IGLS estimator under general dependence models, broadening the scope beyond prior frequency domain results.
Findings
Asymptotic normality of time-domain IGLS estimator proven for general dependence models
Extends previous results limited to specific error structures
Provides theoretical foundation for inference with misspecified GLS in time domain
Abstract
In linear models, the generalized least squares (GLS) estimator is applicable when the structure of the error dependence is known. When it is unknown, such structure must be approximated and estimated in a manner that may lead to misspecification. The large-sample analysis of incorrectly-specified GLS (IGLS) estimators requires careful asymptotic manipulations. When performing estimation in the frequency domain, the asymptotic normality of the IGLS estimator, under the so-called Grenander assumptions, has been proved for a broad class of error dependence models. Under the same assumptions, asymptotic normality results for the time-domain IGLS estimator are only available for a limited class of error structures. We prove that the time-domain IGLS estimator is asymptotically normal for a general class of dependence models.
Taxonomy
Topics: Statistical Methods and Inference · Statistical and numerical algorithms · Advanced Statistical Methods and Models
Hien D. Nguyen. Corresponding author email: [email protected]. Department of Mathematics and Statistics, La Trobe University, Bundoora, Melbourne 3086, Victoria, Australia.
Keywords: asymptotic normality; autoregressive models; generalized least squares; misspecification; time-series analysis
1 Introduction
Let be a non-stochastic sequence of vectors, such that , where , , and is the matrix transposition operator. Let be a sequence of random errors. We are interested in the sequence that is generated by the relationship
[TABLE]
where is a vector of non-stochastic regression coefficients. We can write relationship (1) in the matrix form:
[TABLE]
simultaneously for all , where has rows , , and .
Relationships that are described by (1) are generally referred to as multiple linear regression models and are ubiquitous in the study of engineering, natural science, and social science phenomena (see, e.g., Weisberg, 2005). For general treatments of the topic of linear regression modeling, we refer the interested reader to the manuscripts of Gross (2003), Seber & Lee (2003), and Yan & Su (2009).
In this article, we consider the scenario where the sequence of errors is a finite segment of the stationary sequence , such that
[TABLE]
for each , where is an independent sequence of random variables, such that and . If the sequence of coefficients is known, then one can write the covariance matrix of : and use it to construct the so-called generalized least squares (GLS) estimator
[TABLE]
which is known to be the best linear unbiased estimator (BLUE) of (cf. Amemiya, 1985, Sec. 6.1.3). Furthermore, Theorem 1 of Baltagi (2002, Ch. 9) states that under the condition that exists, we also have the fact that is asymptotically normal in the sense that
[TABLE]
where we denote convergence in law by and denotes a normal distribution with mean vector and covariance matrix .
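As a concrete illustration of GLS with a known error covariance, the following minimal sketch assumes a stationary AR(1) error process and a two-column design (an intercept and a linear trend). The function names `ar1_covariance` and `gls`, the parameter values, and the Gaussian errors are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def ar1_covariance(n, phi, sigma2=1.0):
    """Covariance matrix of n consecutive values of a stationary AR(1) process."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return sigma2 * phi ** lags / (1.0 - phi ** 2)

def gls(X, y, Sigma):
    """GLS estimate (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y, using linear solves."""
    Si_X = np.linalg.solve(Sigma, X)   # Sigma^{-1} X
    Si_y = np.linalg.solve(Sigma, y)   # Sigma^{-1} y
    return np.linalg.solve(X.T @ Si_X, X.T @ Si_y)

rng = np.random.default_rng(0)
n, phi = 400, 0.6                      # assumed sample size and AR(1) coefficient
beta = np.array([1.0, -2.0])           # assumed true regression coefficients
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
Sigma = ar1_covariance(n, phi)

# Draw errors with exactly this covariance through a Cholesky factor.
u = np.linalg.cholesky(Sigma) @ rng.standard_normal(n)
y = X @ beta + u

beta_gls = gls(X, y, Sigma)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("GLS:", beta_gls, "OLS:", beta_ols)
```

Linear solves are used instead of an explicit matrix inverse, which is the usual numerically safer choice for this computation.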
In applications, the coefficients are rarely if ever known. Thus, the data generating process (DGP) of the error sequence is also unknown. In order to make inferences, one generally assumes a hypothetical DGP for that is equivalent to that of some error sequence .
Let be a finite segment of , and let . Furthermore, write the covariance matrix as . Then, by replacing in (2) by , we obtain the so-called incorrectly-specified GLS (IGLS; Koreisha & Fang, 2001) estimator
[TABLE]
The finite-sample properties of (3) were studied comprehensively in Koreisha & Fang (2001) and Kariya & Kurata (2004). General asymptotic results for the IGLS estimator are more difficult to establish, and thus only a small number of results are available in the literature. For example, Rothenberg (1984) considered asymptotic normality of the IGLS estimator, when has first-order autoregressive (AR) form. Some consistency results regarding the IGLS estimator appear in Samarov (1987) and Koreisha & Fang (2001). To date, the most general set of asymptotic theorems regarding the IGLS estimator is that reported in Amemiya (1973).
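To make the effect of misspecification concrete, the sketch below computes the exact finite-sample covariance of a linear estimator built from a working covariance matrix when the true error covariance differs. It uses the standard sandwich identity for estimators of the form (X' G^{-1} X)^{-1} X' G^{-1} y. The MA(1) truth, the AR(1) working model, the design matrix, and the helper names are all assumptions chosen for illustration.

```python
import numpy as np

def toeplitz_from_autocov(acov, n):
    """Build the n x n Toeplitz matrix whose (s, t) entry is acov[|s - t|]."""
    padded = np.zeros(n)
    m = min(n, len(acov))
    padded[:m] = np.asarray(acov, dtype=float)[:m]
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return padded[lags]

def sandwich_covariance(X, Gamma, Sigma):
    """Exact covariance of (X' G^{-1} X)^{-1} X' G^{-1} y when Cov(y) = Sigma."""
    Gi_X = np.linalg.solve(Gamma, X)        # Gamma^{-1} X
    bread = np.linalg.inv(X.T @ Gi_X)       # (X' Gamma^{-1} X)^{-1}
    meat = Gi_X.T @ Sigma @ Gi_X            # X' Gamma^{-1} Sigma Gamma^{-1} X
    return bread @ meat @ bread

n, theta, phi = 200, 0.8, 0.5               # assumed illustrative values
t = np.arange(n)
X = np.column_stack([np.ones(n), np.cos(0.3 * t)])

Sigma = toeplitz_from_autocov([1.0 + theta ** 2, theta], n)     # true MA(1) errors
Gamma = toeplitz_from_autocov(phi ** t / (1.0 - phi ** 2), n)   # working AR(1) model

cov_ols = sandwich_covariance(X, np.eye(n), Sigma)    # working model: white noise
cov_igls = sandwich_covariance(X, Gamma, Sigma)       # misspecified working model
cov_gls = sandwich_covariance(X, Sigma, Sigma)        # correctly specified model

for name, c in [("OLS", cov_ols), ("IGLS", cov_igls), ("GLS", cov_gls)]:
    print(name, np.diag(c))
```

By the Gauss-Markov (Aitken) theorem, the correctly specified covariance is no larger than the other two in the positive semidefinite ordering, which the printed diagonals should reflect.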
Assume the so-called Grenander regularity conditions (Grenander, 1954):
- [Gren1]
(), where ,
- [Gren2]
(),
- [Gren3]
exists for each , and
- [Gren4]
the matrix is non-singular, where has element in the row and column.
Furthermore, make the following additional assumptions.
- [Amem1]
The elements of have the form
[TABLE]
where is a sequence of independent random variables such that and , and is such that and , where and .
- [Amem2]
The sequence is hypothesized to be equivalent to the stationary autoregressive (AR) process , with term of the form
[TABLE]
where is a sequence of independent random variables such that and , and are such that the roots of (with respect to ) are all outside of the unit circle.
Here denotes the imaginary unit. See Amemiya (1985, Ch. 5.2) for more details regarding AR processes.
Let be the diagonal matrix with element , for . Further, let and define to be a Hermitian matrix function with positive semidefinite increments such that . Under [Gren1]–[Gren4], [Amem1], and [Amem2], Amemiya, (1973) proved that
[TABLE]
where
[TABLE]
Furthermore, Amemiya, (1973) showed that if we denote the spectral density functions (SDFs) of the processes and by and , respectively, then we may write in the spectral form
[TABLE]
It is remarkable that the IGLS is a time-domain estimator that can be proved to have a covariance matrix with simple spectral form.
Since the SDF of can be written as
[TABLE]
where and for each , we can write
[TABLE]
Thus, since the auto-covariance is determined by (5), we may interchange the notation with . We shall refer to as the time-domain IGLS estimator in order to differentiate it from the frequency-domain IGLS estimator that is introduced in the sequel.
Let
[TABLE]
and
[TABLE]
be the periodogram of and the cross-spectra periodogram between and , respectively (see, e.g., Brockwell & Davis, 2006, Sec. 11.7). Using (7) and (8), Hannan (1973) proposed to estimate by the frequency-domain IGLS estimator
[TABLE]
Let be the σ-algebra that is generated by the sequence , and make the following assumptions.
- [Hann1]
The random sequence has the form , satisfying
[TABLE]
almost surely, and .
- [Hann2]
The random sequence has distribution , which satisfies
[TABLE]
- [Hann3]
The SDF is real, positive, continuous, and even over .
Under [Gren1]–[Gren4] and [Hann1]–[Hann3], Hannan (1973) proved that
[TABLE]
where has form (4) (see also Robinson & Velasco,, 1997). That is, (3) and (9) have the same asymptotic distributions when both [Amem1] and [Amem2], or [Hann1]–[Hann3] are satisfied, in addition to [Gren1]–[Gren4].
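A frequency-domain estimator of this general type can be sketched with the discrete Fourier transform. The version below is a discretized, weighted least squares form over the Fourier frequencies, with weights equal to the reciprocal of a working SDF; it is written under assumed notation and may differ in detail from the exact estimator (9). The function names, the AR(1) working SDF, and the simulated data are illustrative assumptions.

```python
import numpy as np

def frequency_domain_gls(X, y, working_sdf):
    """Weighted least squares over Fourier frequencies with weights 1 / working_sdf."""
    n = X.shape[0]
    omega = 2.0 * np.pi * np.arange(n) / n      # Fourier frequencies in [0, 2*pi)
    weights = 1.0 / working_sdf(omega)
    dX = np.fft.fft(X, axis=0)                  # DFT of each regressor column
    dy = np.fft.fft(y)                          # DFT of the response
    weighted = dX.conj().T * weights            # scale frequency j by weights[j]
    # For real data and an even, 2*pi-periodic SDF the sums are real; taking the
    # real part only discards rounding error.
    A = (weighted @ dX).real
    b = (weighted @ dy).real
    return np.linalg.solve(A, b)

def ar1_sdf(omega, phi=0.5, sigma2=1.0):
    """SDF of a stationary AR(1) process (assumed working model)."""
    return sigma2 / (2.0 * np.pi * (1.0 - 2.0 * phi * np.cos(omega) + phi ** 2))

rng = np.random.default_rng(3)
n, theta = 512, 0.8
beta = np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), np.cos(0.3 * np.arange(n))])
e = rng.standard_normal(n + 1)
y = X @ beta + e[1:] + theta * e[:-1]           # true errors follow an MA(1) model

beta_fd = frequency_domain_gls(X, y, ar1_sdf)
beta_flat = frequency_domain_gls(X, y, lambda om: np.ones_like(om))
print("working AR(1) SDF:", beta_fd, "flat SDF:", beta_flat)
```

With a flat working SDF the weights are constant, so by Parseval's identity the estimator reduces to ordinary least squares, which gives a simple sanity check.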
It is notable, however, that [Hann1]–[Hann3] are more general assumptions than [Amem1] and [Amem2]. Thus, an obvious question to ask is whether the equivalence in asymptotic distributions between the time-domain estimator (3) and frequency-domain estimator (9) remains when one replaces [Amem1] and [Amem2] by assumptions that are more general and closer in spirit to [Hann1]–[Hann3]. In this article, we shall provide an affirmative answer to this question. Before presenting our main result, we wish to provide a review of the relevant literature.
The asymptotic covariance form (9) was used by Engle (1974b) and Nicholls & Pagan (1977) to explore the efficiencies of the ordinary least squares (OLS) estimator, the BLUE, and the IGLS, under various choices of and , when . Some finite sample properties of the frequency-domain estimator were established in Engle (1974a) and Engle & Gardner (1976).
Under [Gren1]–[Gren4], the IGLS asymptotic covariance (4) was obtained via spectral methods in Kholevo (1969), Rozanov & Kozlov (1969), Kholevo (1971a), and Ibragimov & Rozanov (1978, Sec. 7.4), under general conditions (see Lemma 5 in the Appendix). In the cited papers, the IGLS was studied under the name of pseudo-best estimators. Unfortunately, no asymptotic normality result of the desired kind was established. It is notable that Kholevo (1971b) obtained an asymptotic normality result for the continuous-time least squares problem that is hypothesized to be transferable to the pseudo-best estimator case. However, no such result was provided, nor a result regarding the discrete time case.
Hybrid time and frequency-domain IGLS estimators have also been considered, as well as extensions upon the frequency-domain estimator theme. Examples of hybrid estimators include Samarov (1987) and Hambaba (1992).
Extensions of the results of Hannan (1973) to account for long-range dependence appear in Robinson & Hidalgo (1997) and Hidalgo & Robinson (2002). A non-linear frequency-domain estimator appears in Hannan (1971). A broad generalization of the frequency-domain estimation approach to semi-parametric and non-parametric modeling is considered by Robinson (1991).
Closely related to our article is the report of Aguero et al. (2010), which establishes the asymptotic equivalence between time and frequency-domain estimators for linear dynamic system identification problems. See Hannan & Deistler (2012) regarding linear dynamic systems.
Using the Cholesky covariance matrix factorization method of Wu & Pourahmadi (2003), Yang (2012) constructed an IGLS estimator that is asymptotically efficient. Furthermore, Yang (2012) obtained an asymptotic normality result, under [Gren1]–[Gren4], using a proof technique that is adapted from those of Anderson (1971, Thm. 10.2.7) and Fuller (1996, Thm. 9.1.2) (see Lemma 2 in the Appendix). A model averaging method akin to the construction of Yang (2012) was studied in Cheng et al. (2015), and a long memory GLS estimator of the same form was considered by Ing et al. (2016).
Also related to our article is the work of Kapetanios & Psaradakis (2016), which proposed to extend the results of Amemiya (1973) in a different direction. Here, [Gren1]–[Gren4] are replaced by various stochastic assumptions on the sequences and that make use of mixing and stochastic approximation concepts, and higher moment bounds (see Potscher & Prucha, 1997, Ch. 6 regarding mixing and approximation concepts). Compared to our work, the work of Kapetanios & Psaradakis (2016) can be seen as a complementary and parallel direction of generalization of the results of Amemiya (1973). Whereas we propose to relax [Amem1] and [Amem2], Kapetanios & Psaradakis (2016) replace [Gren1]–[Gren4] instead.
The remainder of the manuscript proceeds as follows. In Section 2, we state and prove our main result. Discussions and remarks are provided in Section 3. Here, we provide results regarding the practical case, where is both hypothesized and estimated from the data. Necessary lemmas and technical results are presented in the Appendix.
2 Main result
We retain all notation from the introduction. Furthermore for matrices , let
[TABLE]
denote the operator norm, and let
[TABLE]
denote the and induced norms, respectively. For vectors , we denote the Euclidean norm of by .
Make the following assumptions.
- [Main1]
The element of the error sequence has form
[TABLE]
and is an independent sequence, where
[TABLE]
- [Main2]
The random sequence has distribution , which satisfies
[TABLE]
- [Main3]
The SDF is real, positive, continuous, and even over .
- [Main4]
The covariance expansion (5) of satisfies
[TABLE]
Lemma 1.
Under [Gren1]–[Gren4] and [Main1]–[Main4], approaches
[TABLE]
as .
Proof.
Following Amemiya (1973), we write
[TABLE]
and let . Under [Gren1]–[Gren4] and [Main3],
[TABLE]
by Lemma 2. By Lemma 3, is invertible and thus, for any , exists. Thus, we have
[TABLE]
which has the limit, as ,
[TABLE]
Assumption [Main1] implies that is real and positive since is an absolutely summable linear filter of the independent finite variance sequence (cf. Theorems 2.11 and 2.12 of Fan & Yao,, 2003). Since is positive and continuous by [Main3] and is real, positive and continuous by [Main1], we can apply Lemma 5 to obtain
[TABLE]
Upon substitution of (14) into the left-hand side (LHS) of (15) and rearrangement, we obtain
[TABLE]
and have thus verified (12). ∎
Theorem 1.
Under [Gren1]–[Gren4] and [Main1]–[Main4],
[TABLE]
where has the form (12).
Proof.
It suffices to show that is asymptotically normal with mean and covariance matrix equal to the LHS of (16). First, write , where
[TABLE]
and is a positive and increasing integer function of , such that . Let , where
[TABLE]
is a matrix and
[TABLE]
is a vector.
To apply Lemma 6, we must show that for each , converges in law to some , as , where is asymptotically normal with mean and covariance matrix (16), as . Then, we must verify that
[TABLE]
for each .
For the purpose of applying the Cramer-Wold device, define , where . Let denote the column of , for . That is,
[TABLE]
Therefore,
[TABLE]
where
[TABLE]
By [Main1] is bounded, and by [Gren4], is bounded. Further, by [Main3] and [Main4], we have the boundedness of . Thus, we obtain the inequalities
[TABLE]
The last fact follows from an application of Lemma 3 and all of the bounds are independent of and .
Observe that is a sequence of independent random variables with expectation $\mathbb{E}\left(W_{t}\right)=0$ and . Let be the distribution function of , for each . Then, for any , we have the bound:
[TABLE]
where
[TABLE]
By the bound in (20) and [Main2], we must show that
[TABLE]
to prove that (21) converges to zero, as . Write the row and column element of and as and , respectively. Upon expansion we can obtain the following inequalities for the LHS of (22):
[TABLE]
where is some finite constant.
By [Main1], and are bounded, independently of and . Therefore, for any and , . Similarly, by Assumptions [Main3] and [Main4], we apply Corollary 1 to show that , independently of , and thus . Lastly,
[TABLE]
by [Gren1], [Gren2], and Anderson (1971, Lem. 2.6.1). Thus (22) is proved.
Next, (22) is sufficient to guarantee that (21) approaches zero as approaches infinity. We can apply the Lindeberg-Feller central limit theorem (DasGupta, 2008, Thm. 5.1) to obtain .
Via (19), is asymptotically equal in distribution to (as ), where, for any choice of , is normal with mean zero and variance
[TABLE]
Via the Cramér-Wold device (cf. DasGupta, 2008, Thm. 1.16), is asymptotically normal with mean vector and covariance
[TABLE]
using Lemma 5, where is the SDF corresponding to . In other words, is normally distributed with zero mean vector and covariance matrix (23).
By [Main1], and , via the power transfer formula (cf. Lemma 4). Furthermore, since and by preservation of uniform convergence under continuous composition (Bartle & Joichi,, 1961), converges uniformly to , as approaches infinity (cf. Gray,, 2006, Sec. 4.1). Via the Cramer-Wold device, converges in law to (), where has mean vector and covariance matrix
[TABLE]
which is equal to (16).
Finally, we must verify (18). We use Chebyshev’s inequality, which states that for any ,
[TABLE]
where we write the numerator of the right-hand side of (25) as
[TABLE]
which reduces to
[TABLE]
By (16), (23), and (24), we have
[TABLE]
Thus, by (25), condition (18) is verified. This completes the proof. ∎
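A small Monte Carlo experiment can illustrate, without proving anything, the kind of behavior that Theorem 1 describes. The sketch below assumes MA(1) errors driven by non-Gaussian (centred uniform) innovations, a misspecified AR(1) working covariance, and a simple two-column design; all of these choices, and every variable name, are illustrative assumptions. It compares the empirical covariance of the scaled estimation error with the exact finite-sample sandwich covariance and reports sample skewness and excess kurtosis as rough normality diagnostics.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, theta, phi = 100, 4000, 0.8, 0.5       # assumed illustrative values
t = np.arange(n)
X = np.column_stack([np.ones(n), np.cos(0.3 * t)])
lags = np.abs(np.subtract.outer(t, t))

Gamma = phi ** lags / (1.0 - phi ** 2)          # working AR(1) covariance
Sigma = (1.0 + theta ** 2) * (lags == 0) + theta * (lags == 1)   # true MA(1) covariance

# M maps the error vector u to the estimation error: beta_hat - beta = M u.
Gi_X = np.linalg.solve(Gamma, X)
M = np.linalg.solve(X.T @ Gi_X, Gi_X.T)         # shape (p, n)

# Unit-variance uniform innovations, then an MA(1) filter applied row by row.
e = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(reps, n + 1))
U = e[:, 1:] + theta * e[:, :-1]

scaled_err = np.sqrt(n) * U @ M.T               # sqrt(n) * (beta_hat - beta), one row per replicate
empirical = np.cov(scaled_err, rowvar=False)
theory = n * M @ Sigma @ M.T                    # exact finite-sample sandwich covariance
rel_err = np.linalg.norm(empirical - theory) / np.linalg.norm(theory)

z = (scaled_err[:, 0] - scaled_err[:, 0].mean()) / scaled_err[:, 0].std()
skewness = np.mean(z ** 3)
excess_kurtosis = np.mean(z ** 4) - 3.0
print(rel_err, skewness, excess_kurtosis)
```

Because the estimator is linear in the errors, all replicates are computed with a single matrix product rather than a loop over data sets.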
3 Discussions and remarks
3.1 Notes regarding the assumptions of Theorem 1
We can directly compare [Main1] to [Hann1]. It is notable that [Hann1] is more general than [Main1] since it allows to be a linear filter over a martingale sequence , satisfying . Further, the condition is necessitated so that Lemma 5 can be applied. It is remarked in Amemiya, (1973), however, that [Main1] and [Main2] are more general than [Amem1].
The additional assumption [Main4] is the key that facilitates the proof. This assumption is necessary for bounding and , which is required to prove (17). It must be remarked that [Main4] is a common condition in the literature, and has been used in similar proofs, such as those of Cheng et al. (2015). The assumption is not overly restrictive, since a broad class of short-memory processes satisfies [Main4]. For example, any stationary autoregressive moving average (ARMA) process will satisfy [Main4] (cf. Fan & Yao, 2003, Sec. 2.5). The assumption is also commonly used in the analysis of unit root processes (see, e.g., Hamilton, 1994, Sec. 17.5).
3.2 Feasible generalized least squares
Generally the SDF is unknown and must be estimated from data. Suppose that is an estimator of , which is indexed by the sample size . Denote the feasible GLS (FGLS) estimator of by . For the FGLS to be of use, we require that the FGLS has the same asymptotic distribution as . To this end, it is sufficient to show that
[TABLE]
where denotes convergence in probability. Denote the auto-covariance matrix corresponding to the estimator as . Then, we may write .
Under [Gren1]–[Gren4] and [Amem1], Amemiya (1973, Thm. 2) proved that if is the SDF of an AR process of order , that satisfies [Amem2], then (26) holds, when is obtained via the OLS estimator for the AR model coefficients (cf. Amemiya, 1985, Sec. 5.4). The argument from Amemiya (1973, Thm. 2) would hold whenever is obtained via any consistent estimator of the AR model coefficients.
The proof of the theorem also remains the same upon replacing [Amem1] by [Main1] and [Main2] and noting that [Amem2] is implied by [Main3] and [Main4]. Thus, under the hypothesis of Theorem 1, if is hypothesized to be the SDF of a stationary AR process of order (i.e., satisfying [Amem2]), then (26) holds, where is obtained via any consistent estimator of the AR coefficients.
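The two-step procedure described above (estimate AR coefficients from OLS residuals, then reweight) can be sketched as follows. This version applies the estimated AR whitening filter and drops the first p observations, which is a Cochrane-Orcutt-type approximation to using the inverse of the estimated auto-covariance matrix rather than the exact matrix form. The helper names, the AR(1) truth, and the design are assumptions made for illustration.

```python
import numpy as np

def simulate_ar1(n, phi, rng, burn=200):
    """Simulate an AR(1) series, discarding a burn-in so the output is near-stationary."""
    e = rng.standard_normal(n + burn)
    u = np.zeros(n + burn)
    for t in range(1, n + burn):
        u[t] = phi * u[t - 1] + e[t]
    return u[burn:]

def fit_ar_ols(resid, p):
    """OLS regression of resid_t on (resid_{t-1}, ..., resid_{t-p})."""
    Z = np.column_stack([resid[p - k: len(resid) - k] for k in range(1, p + 1)])
    return np.linalg.lstsq(Z, resid[p:], rcond=None)[0]

def ar_filter(v, a):
    """Return v_t - sum_k a_k v_{t-k} along axis 0, dropping the first len(a) rows."""
    p = len(a)
    out = v[p:].copy()
    for k in range(1, p + 1):
        out = out - a[k - 1] * v[p - k: len(v) - k]
    return out

def fgls_ar(X, y, p):
    """Step 1: OLS residuals. Step 2: AR(p) fit. Step 3: OLS on whitened data."""
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    a_hat = fit_ar_ols(y - X @ beta_ols, p)
    beta_fgls = np.linalg.lstsq(ar_filter(X, a_hat), ar_filter(y, a_hat), rcond=None)[0]
    return beta_fgls, a_hat

rng = np.random.default_rng(2)
n, phi = 2000, 0.6                               # assumed illustrative values
beta = np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), np.sin(0.1 * np.arange(n))])
y = X @ beta + simulate_ar1(n, phi, rng)

beta_fgls, a_hat = fgls_ar(X, y, p=1)
print("FGLS:", beta_fgls, "estimated AR coefficient:", a_hat)
```

Any other consistent estimator of the AR coefficients, such as a Yule-Walker fit, could be substituted for `fit_ar_ols` in this sketch.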
It is notable that proving that (26) holds under [Amem2] is made tractable by the fact that the inverse auto-covariance matrix has a banded Toeplitz form (cf. Verbyla, 1985). We conjecture that it is possible to obtain similar results using the same techniques as those from Amemiya (1973), when is any parametric family of SDFs with banded Toeplitz inverse auto-covariance matrices . However, the proof of such a result is beyond the scope of the current paper.
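The banded structure is easy to verify numerically in the simplest case. The sketch below assumes an AR(1) process with unit innovation variance and checks that the inverse of its covariance matrix is tridiagonal, with constant diagonals apart from the two corner entries. The function names and parameter values are illustrative.

```python
import numpy as np

def ar1_covariance(n, phi, sigma2=1.0):
    """Covariance matrix of n consecutive values of a stationary AR(1) process."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return sigma2 * phi ** lags / (1.0 - phi ** 2)

def ar1_precision(n, phi, sigma2=1.0):
    """Closed-form inverse of the AR(1) covariance matrix (tridiagonal)."""
    P = np.zeros((n, n))
    idx = np.arange(n)
    P[idx, idx] = 1.0 + phi ** 2
    P[0, 0] = P[-1, -1] = 1.0            # the corner entries differ from the interior
    P[idx[:-1], idx[1:]] = -phi
    P[idx[1:], idx[:-1]] = -phi
    return P / sigma2

n, phi = 8, 0.6
precision = np.linalg.inv(ar1_covariance(n, phi))
outside_band = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) > 1
max_outside = np.max(np.abs(precision[outside_band]))
print(np.round(precision, 3))
print("largest entry outside the band:", max_outside)
```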
3.3 Further comments regarding the frequency-domain IGLS estimator
We note that Hannan (1973) proved a more general result than that which we reported in Section 1. Consider the following conditions.
- [Hann4]
The sequence is stationary and with , where is the number and for some .
- [Hann5]
The sequence is a finite segment of , where , is strictly stationary, ergodic, and independent of .
From Hannan, (1973), it was proved that (10) could be obtained by assuming either [Gren1]–[Gren4] or [Hann5], and either [Hann4], or [Hann1] and [Hann2], together with [Hann3]. In the case where [Hann5] is assumed, the matrix in [Gren4] is replaced by , with elements , where is now stochastic.
Assumptions [Hann4] and [Hann5] are similar to the mixing and stochastic assumptions considered in the proofs of Kapetanios & Psaradakis (2016). A proof of the main result under these conditions could thus be adapted from the techniques of Kapetanios & Psaradakis (2016). We believe that this is an interesting direction of research. However, it falls outside the aim of our paper, which was to directly improve upon the IGLS results obtained in Amemiya (1973).
Appendix
The following results are required in our main proofs. Sources for all unproved results are provided at the end of the section.
Lemma 2.
Let have covariance matrix , where is a finite segment of the random sequence with SDF . Under [Gren1]–[Gren4], and [Main3],
[TABLE]
Lemma 3.
Let have covariance matrix , where is a finite segment of the random sequence with SDF . If for all , then is nonsingular and .
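Results of this type rest on the classical fact that the eigenvalues of a Toeplitz covariance matrix are bracketed by the extremes of the spectral density. The following sketch checks this numerically for an AR(1) process, assuming the normalization in which the auto-covariances are the Fourier coefficients of the SDF over [-π, π], so that the bounds are 2π times the minimum and maximum of the SDF. This normalization, the AR(1) example, and the variable names are assumptions and may not match the constants in the lemma.

```python
import numpy as np

n, phi = 50, 0.6                                    # assumed illustrative values
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Sigma = phi ** lags / (1.0 - phi ** 2)              # AR(1) covariance, unit innovation variance

# AR(1) spectral density on a fine grid over [-pi, pi].
omega = np.linspace(-np.pi, np.pi, 2001)
sdf = 1.0 / (2.0 * np.pi * (1.0 - 2.0 * phi * np.cos(omega) + phi ** 2))

eigenvalues = np.linalg.eigvalsh(Sigma)
lower = 2.0 * np.pi * sdf.min()                     # equals 1 / (1 + phi)^2 here
upper = 2.0 * np.pi * sdf.max()                     # equals 1 / (1 - phi)^2 here
print(lower, eigenvalues.min(), eigenvalues.max(), upper)
```

A strictly positive lower bound on the SDF therefore keeps the smallest eigenvalue away from zero for every sample size, which is what makes the covariance matrix nonsingular with a uniformly bounded inverse.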
Lemma 4.
Let the random sequence satisfy [Main1] and have SDF . Then is real, positive, and continuous.
Proof.
Form (11) implies that is a linear filter of . We may write the so-called transfer function of this filtering as . Since , we have the fact that is continuous (cf. Rudin,, 1987, Sec. 9.4). By the same fact, is also bounded in modulus (i.e. ). Next, using Theorem 2.12 of Fan & Yao, (2003), we may write
[TABLE]
for each , by the fact that is an independent sequence with and . Since the squared modulus is real and positive, by our assumptions, we have the desired result. ∎
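The power transfer formula used in this proof can be checked numerically for a finite filter. The sketch below assumes a moving average filter with hypothetical coefficients `a`, and compares the SDF computed from the squared modulus of the transfer function with the SDF computed as the Fourier series of the auto-covariances. Both functions use the convention with a 1/(2π) factor, which is an assumption about normalization.

```python
import numpy as np

def sdf_from_filter(a, omega, sigma2=1.0):
    """SDF via the transfer function: sigma2 / (2 pi) * |sum_j a_j exp(-i j omega)|^2."""
    j = np.arange(len(a))
    transfer = np.exp(-1j * np.outer(omega, j)) @ a
    return sigma2 * np.abs(transfer) ** 2 / (2.0 * np.pi)

def sdf_from_autocov(a, omega, sigma2=1.0):
    """SDF via auto-covariances gamma_k = sigma2 * sum_j a_j a_{j+k}."""
    q = len(a) - 1
    gamma = np.array([sigma2 * np.dot(a[: len(a) - k], a[k:]) for k in range(q + 1)])
    k = np.arange(1, q + 1)
    return (gamma[0] + 2.0 * np.cos(np.outer(omega, k)) @ gamma[1:]) / (2.0 * np.pi)

a = np.array([1.0, 0.8, -0.3])                  # hypothetical filter coefficients
omega = np.linspace(-np.pi, np.pi, 201)
f_filter = sdf_from_filter(a, omega)
f_autocov = sdf_from_autocov(a, omega)
print(np.max(np.abs(f_filter - f_autocov)))
```

The first form makes it immediate that the SDF is real and nonnegative, since it is a squared modulus scaled by a positive constant.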
Lemma 5.
Under [Gren1]–[Gren4], if the sequences and have positive and continuous SDFs and , respectively, and if is bounded, then
[TABLE]
Lemma 6.
For each , suppose that , as , and that , as . Furthermore, assume that
[TABLE]
for each and some . If each of the random variables involved have a common separable domain, and if is some appropriate norm on said domain, then , as .
Lemma 7.
Let be a real-valued, where (), (; ), and SDF . If is positive and , then, , where is a random process, with and ( and ). Furthermore, , , and
[TABLE]
where and .
Let be a covariance matrix that satisfies Lemma 7. Using the Cholesky decomposition of Akaike (1969), we may write the inverse of in the form , where
[TABLE]
and . Here, and () are obtained by solving the following problems: for each
[TABLE]
and is the minimal value of the problem. When , set . We have the following result.
Lemma 8.
Let be the time series from Lemma 7. Then, there exists a such that for any , , where is a finite constant that only depends on , and is the row and column element of the matrix .
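The factorization of the inverse covariance matrix that underlies Lemma 8 can be illustrated with the standard prediction-based (modified Cholesky) construction: each row of a unit lower triangular matrix holds the negated coefficients of the best linear predictor of one observation from its predecessors, and a diagonal matrix holds the prediction error variances. The sketch below is written under that standard construction, which may differ in parametrization from the one in the text, and uses an assumed AR(1) covariance as the test case.

```python
import numpy as np

def modified_cholesky(Gamma):
    """Return (T, d) with T unit lower triangular and T @ Gamma @ T.T = diag(d)."""
    n = Gamma.shape[0]
    T = np.eye(n)
    d = np.empty(n)
    d[0] = Gamma[0, 0]
    for k in range(1, n):
        # Coefficients of the best linear predictor of element k from elements 0..k-1.
        coef = np.linalg.solve(Gamma[:k, :k], Gamma[:k, k])
        T[k, :k] = -coef
        # Variance of the corresponding prediction error.
        d[k] = Gamma[k, k] - Gamma[:k, k] @ coef
    return T, d

n, phi = 10, 0.6                                    # assumed illustrative values
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Gamma = phi ** lags / (1.0 - phi ** 2)              # AR(1) covariance, unit innovation variance

T, d = modified_cholesky(Gamma)
Gamma_inv = T.T @ np.diag(1.0 / d) @ T              # inverse rebuilt from the factors
print(np.round(T[:4, :4], 3))
print(np.round(d[:4], 3))
```

For an AR(1) process each row of `T` has a single nonzero off-diagonal entry equal to minus the AR coefficient, which is one way to see why the entries of such factors decay away from the diagonal for short-memory processes.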
Corollary 1.
Let be the time series from Lemma 7. Then, , independently of .
Proof.
Write and apply Seber (2008, Thm. 4.6.5) to obtain
[TABLE]
Note that , , and . Thus we require only three computations.
Let and and write
[TABLE]
By the triangle inequality, we obtain
[TABLE]
which, for sufficiently large , allows for the application of Lemma 8 in order to obtain
[TABLE]
where is a finite constant that is independent of . Since , we have the fact that and , for any . Thus, is bounded, independently of .
Next, we write as
[TABLE]
Using Lemma 8, we can bound by , which converges, since . Similarly, we can use Lemma 8 to bound by
[TABLE]
This converges, since . Continuing the pattern, we arrive at the final expression
[TABLE]
which can be bounded by
[TABLE]
Applying Lemma 8, we bound (28) by
[TABLE]
The expression converges by Lemma 7. Furthermore, bounds every term of (27), and hence bounds the maximum term. Therefore, is bounded, independently of .
Lastly, must bound . Since is a diagonal matrix and each ,
[TABLE]
is bounded since each column contains a single finite and positive term. The proof is completed by combining the three bounds. ∎
Lemma 2 appears as Theorem 10.2.7 in Anderson (1971). Lemma 3 is implied by Theorem 5.2 in Gray (2006). Lemma 5 appears as Theorem 8 in Ibragimov & Rozanov (1978, Ch. VII). Lemma 6 appears as Theorem 4.2 in Billingsley (1968). Lemma 7 is obtained by taking the case (in the notation of the source material) of Brillinger (2001, Thm. 3.8.4). Lemma 8 can be derived from Lemma 4 of Berk (1974).
Acknowledgements
The author is indebted to Paul Kabaila, for his interest in this project and for his fruitful comments. The research is funded under Australian Research Council grants DE170101134 and DP180101192.
References
- Aguero, J. C., Yuz, J. I., Goodwin, G. C., & Delgado, R. A. (2010). On the equivalence of time and frequency domain maximum likelihood estimation. Automatica, 46, 260–270.
- Akaike, H. (1969). Power spectrum estimation through autoregressive model fitting. Annals of the Institute of Statistical Mathematics, 21, 407–419.
- Amemiya, T. (1973). Generalized least squares with an estimated autocovariance matrix. Econometrica, 41, 723–732.
- Amemiya, T. (1985). Advanced Econometrics. Cambridge: Harvard University Press.
- Anderson, T. W. (1971). The Statistical Analysis of Time Series. New York: Wiley.
- Baltagi, B. H. (2002). Econometrics. Berlin: Springer.
- Bartle, R. G. & Joichi, J. T. (1961). The preservation of convergence of measurable functions under composition. Proceedings of the American Mathematical Society, 12, 122–126.
- Berk, K. N. (1974). Consistent autoregressive spectral estimates. Annals of Statistics, 2, 489–502.
