This paper proves the convergence and normality of quasi maximum-likelihood estimators for dynamic panel data models, demonstrating their robustness and superior finite sample performance over GMM estimators through Monte Carlo simulations.
Contribution
It introduces a robust QML estimation method for dynamic panel data models, including an ECME algorithm and a comparison with GMM estimators.
Findings
01
QML estimators have smaller bias and root mean squared errors than GMM estimators.
02
The paper establishes the theoretical properties of QML estimators, including convergence and asymptotic normality.
03
Monte Carlo experiments show QML estimators outperform GMM in finite samples.
Abstract
This paper establishes the almost sure convergence and asymptotic normality of levels and differenced quasi maximum-likelihood (QML) estimators of dynamic panel data models. The QML estimators are robust with respect to initial conditions, conditional and time-series heteroskedasticity, and misspecification of the log-likelihood. The paper also provides an ECME algorithm for calculating levels QML estimates. Finally, it uses Monte Carlo experiments to compare the finite sample performance of levels and differenced QML estimators, the differenced GMM estimator, and the system GMM estimator. In these experiments the QML estimators usually have smaller --- typically substantially smaller --- bias and root mean squared errors than the panel data GMM estimators.
1 Introduction
Two prominent approaches to estimating a dynamic panel data model are generalized method of moments (GMM) and maximum likelihood (ML). Several authors have studied ML estimation of dynamic panel data models; see, for example, Alvarez and Arellano (2004),
Anderson and Hsiao (1981), Hsiao et al. (2002), and Moral-Benito (2013),
among others. As is well known, the consistency and asymptotic normality of an ML estimator follow from ML theory assuming the likelihood is correctly specified
and standard regularity conditions are met. On the other hand, strong distributional assumptions are not required to establish the sampling behavior of a GMM estimator. This fact would appear to make GMM more attractive than ML, but GMM has its drawbacks as well — for example, GMM estimators are known to often have severe finite sample bias. Furthermore, some papers have shown that the maximizer of a log-likelihood for a panel data model can be consistent and asymptotically normal under assumptions that do not require normality.
Binder et al. (2005), for example, considered quasi-ML (QML) estimation of vector panel
autoregressions. Kruiniger (2013), on the other hand, studied QML estimation
of a first-order autoregressive (AR(1)) panel data model. And Phillips
(2010, 2015) examined QML estimation of a pth-order dynamic panel data
model. These papers provide conditions under which the log-likelihood for a
dynamic panel data model can be misspecified, and the maximizer of the quasi
log-likelihood is nevertheless consistent and asymptotically normal.
This paper makes several contributions to the literature on QML estimation. Like Phillips (2010, 2015), the model studied in this paper
includes p lags of the dependent variable as well as other explanatory
variables. Phillips (2010, 2015), however, focused on QML estimation without
differencing the observations — i.e., levels QML — while assuming the
errors are unconditionally homoskedastic. The assumption of unconditional
homoskedasticity is more general than it might first appear, for it allows
for conditional heteroskedasticity. But it does not allow for time-series
heteroskedasticity. Allowing for more general forms of heteroskedasticity is
important, for QML estimation, although robust with respect to initial
conditions and misspecification of the log-likelihood, is not robust to
misspecification of the unconditional error variance-covariance matrix; see
also Alvarez and Arellano (2004). This paper, therefore, provides large N,
fixed T asymptotics under more general conditions than those considered in
Phillips (2010, 2015) — conditions that allow for time-series
heteroskedasticity. Indeed, the error variance-covariance matrix can be of a
general form.
Phillips (2010) provided a
straightforward iterative feasible generalized least squares algorithm for
calculating QML estimates when the errors in the dynamic regression model have an error-components structure. However, that procedure is not easily extended to
the case where the idiosyncratic errors are time-series heteroskedastic.
Furthermore, derivative-based algorithms can produce negative fitted variance
components when applied to error-components models if they are not substantially modified to avoid that outcome
(see also Meng and van Dyk 1998). This paper improves on these algorithms by providing an expectation conditional maximization either
(ECME) algorithm for calculating levels QML estimates that allows for
conditional and time-series heteroskedasticity. The ECME algorithm is
straightforward and guarantees non-negative estimated variance components.
The paper also examines QML estimation after differencing the observations
(differenced QML). It shows that the ML estimator examined by
Hsiao et al. (2002) is consistent and asymptotically normal under more
general conditions than the conditions considered by Hsiao et al. (2002).
For example, Hsiao et al. (2002) assumed normality. This paper shows the estimator can be consistent and
asymptotically normal even if the log-likelihood is misspecified. Moreover,
restrictive initial conditions are not required, and the errors can be
conditionally heteroskedastic.
Finally, using simulated data, the finite sample behavior of the levels and
differenced QML estimators is compared, both with each other and with that of
the differenced GMM (Arellano and Bond 1991) and system GMM (Blundell and
Bond 1998) estimators. The Monte Carlo results show that,
compared to GMM estimators, the QML estimators have negligible finite sample
bias, and consequently they have smaller — sometimes much smaller — root
mean squared errors.
2 QML via Regression Augmentation
Since Anderson and Hsiao (1981) it has been known that whether or not
application of ML estimation to a dynamic panel data model will yield a
consistent estimator as N→∞, with T fixed, depends on
initial conditions. However, Phillips (2010) showed that, when QML
estimation is based on observations in levels (henceforth levels QML), it
does not depend on initial condition restrictions if the regression is
augmented with a suitable control function. This section extends the results
in Phillips (2010) by establishing the almost sure convergence and
asymptotic normality of levels QML estimation under weaker conditions than
those used in Phillips (2010). For example, the results provided here allow
for more general specifications of the error variance-covariance matrix.
This generalization is important because QML estimation is inconsistent if
the error variance-covariance matrix is misspecified.
The model examined in this paper is the pth-order dynamic panel data model
$$y_i = Y_i\delta_0 + X_i\beta_0 + e_i. \tag{1}$$
In this expression yi=(yi1,…,yiT)′, Yi=(yi,−1,…,yi,−p), yi,−j=(yi,1−j,…,yi,T−j)′ (j=1,…,p), and Xi=(xi1,…,xiT)′, with xit a K×1 vector of
explanatory variables that vary with t (for at least some i). Moreover, ei=(ei1,…,eiT)′ is a
vector of regression errors. For notational convenience, the numbering of
observed variables begins with t=−p+1.
Straightforward ML estimation of the model in (1) will not
generally yield a consistent estimator. To see why, let yio=(yi0,…,yi,−p+1)′; let xi be a column vector consisting of all of the distinct
elements of xi1,…,xiT; and set zi=(xi′,yio′)′. Then, assuming ei∣zi∼IIN(0,Ω0∗), the log-likelihood is given by
[TABLE]
where ei(φ)=yi−Yiδ−Xiβ, and φ=(δ′,β′)′. If Ω0∗ were known,
then maximizing the log-likelihood in (2) yields the
generalized least squares (GLS) estimator based on Ω0∗,
and the consistency of that estimator requires E(Xi′Ω0∗−1ei)=0
and E(yi,−j′Ω0∗−1ei)=0 (j=1,…,p).
We have E(Xi′Ω0∗−1ei)=0 if the regressors in Xi are strictly exogenous with respect to the errors in ei. But the moment restrictions E(yi,−j′Ω0∗−1ei)=0 (j=1,…,p) depend on an even stronger assumption, which is summarized in Lemma 1.
Lemma 1. If E(eiyio′)=0, E(eixi′)=0, and E(eiei′)=Ω0∗, then E(yi,−j′Ω0∗−1ei)=0 (j=1,…,p).
Proof. See Appendix A.
According to Lemma 1, if the regressors in xit and the initial values of the dependent variable yi0,…,yi,−p+1 are uncorrelated with the errors ei1,…,eiT, then E(yi,−j′Ω0∗−1ei)=0 (j=1,…,p). However, assuming the initial
values of the dependent variable are uncorrelated with subsequent errors is
quite restrictive. For example, a commonly used model for the errors is the
error-components model
$$e_{it} = c_i + v_{it}. \tag{3}$$
If the vits are uncorrelated, we can take vit to be uncorrelated
with the elements of yio, for t≥1, but assuming
the elements of yio are also uncorrelated with ci
is a strong initial condition restriction.
Fortunately, we need make no such initial condition assumption if the model
in (1) is augmented with a suitable control function. Nor need we
assume the regressors in xit are strictly exogenous with
respect to the eits. The possible correlation between the elements in ei and the elements in zi can be
controlled for by the linear projection of eit on 1 and zi:
$$e_{it} = \mu_0 + z_i'\theta_0 + u_{it}, \tag{4}$$
where θ0=Var(zi)−1Cov(zi,eit) and
μ0=E(eit)−E(zi)′θ0.
The linear projection parameters μ0 and θ0 exist and depend on neither i nor t if E(eit) and the moments in Cov(zi,eit) depend on neither i nor t and the moments in Var(zi) and
E(zi) do not depend on i. The
restriction that the linear projection parameters are independent of t is
met if the errors have a one-way error-components structure given by (3) and vit is a mean zero random variable that is uncorrelated with
the elements of zi for t≥1. Then Cov(zi,eit)=Cov(zi,ci)
and E(eit)=E(ci) for t≥1. For this
case, the linear projection reduces to that considered in Phillips (2010,
2015). Specifically, we have
$$c_i = \mu_0 + z_i'\theta_0 + a_i \tag{5}$$
(cf. Phillips 2010, p. 411, Eq. (2)). [Footnote 1: See also Chamberlain (1982, 1984) and Kruiniger (2013), who uses a linear
projection of an individual effect on yi0. The linear projection
parameters used in Kruiniger (2013) are implicitly assumed to be independent
of i.] If the errors can be decomposed as in Eq. (3), then μ0+zi′θ0 controls for possible
correlation between time-invariant unobservables, captured by ci, and
the elements of zi.
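The projection parameters defined above have direct sample analogues. The following is a minimal sketch, assuming simulated data and a hypothetical helper name (`linear_projection`); it is not the estimator used in the paper, where the projection parameters are estimated jointly by QML. It only illustrates that the projection residual is uncorrelated with zi by construction.

```python
import numpy as np

def linear_projection(z, e):
    """Sample analogue of theta0 = Var(z)^{-1} Cov(z, e) and mu0 = E(e) - E(z)'theta0."""
    z = np.asarray(z, dtype=float)   # (N, q): one row of z_i per individual
    e = np.asarray(e, dtype=float)   # (N,): variable being projected (e_it or c_i)
    z_dev = z - z.mean(axis=0)
    e_dev = e - e.mean()
    var_z = z_dev.T @ z_dev / len(e)     # sample Var(z_i)
    cov_ze = z_dev.T @ e_dev / len(e)    # sample Cov(z_i, e)
    theta = np.linalg.solve(var_z, cov_ze)
    mu = e.mean() - z.mean(axis=0) @ theta
    return mu, theta

# Hypothetical data: c_i is correlated with the first two elements of z_i.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 3))
c = 1.0 + z @ np.array([0.5, -0.2, 0.0]) + rng.normal(size=500)

mu_hat, theta_hat = linear_projection(z, c)
a = c - mu_hat - z @ theta_hat   # projection residual a_i
```

In this sketch the residual `a` has sample mean zero and zero sample covariance with every column of `z`, which is the property the control function relies on.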
Another, albeit trivial, case in which the linear projection parameters
depend on neither i nor t is when there are no individual specific
effects and the eits are uncorrelated among themselves and with the
elements of zi, for t≥1. In this case, θ0=0, and the linear projection in (4) simplifies
to eit=μe+uit, where E(eit)=μe. This
example illustrates that the necessity of adding the control function μ0+zi′θ0 follows from the
presence of unobservable time-invariant omitted variables, which are
captured by ci.
Moreover, although it is obvious we must include xi in the
control function when the regressors in xit are correlated
with ci, it is also true that we typically must do so even when all of
the regressors in xit are uncorrelated with ci, as in the
random effects model. To see this, consider the linear projection of ci
on just 1 and yio:
$$c_i = \mu_{y0} + y_i^{o\prime}\theta_{y0} + a_{yi}, \tag{6}$$
where θy0=Var(yio)−1Cov(yio,ci) and μy0=E(ci)−E(yio′)θy0. If we augment the model in (1) with the control
function μy0+yio′θy0 rather than the control function μ0+zi′θ0, then the error term in the
augmented model is ayi+vit rather than ai+vit, and, in order
for QML estimation of the augmented model to be consistent, we must have not
just Cov(yio,ayi)=0, which the
linear projection in (6) ensures, but also Cov(xi,ayi)=0, which the linear projection
in (6) does not guarantee. Indeed, given Cov(xi,ci)=0, the result Cov(xi,ayi)=0 is not guaranteed unless Cov(xi,yio′θy0)=0, which will not be satisfied in general
assuming θy0≠0. [Footnote 2: This conclusion follows from Cov(xi,ayi)=Cov(xi,ci−μy0−yio′θy0)=−Cov(xi,yio′θy0) if Cov(xi,ci)=0.]
This last example illustrates that results obtained for the AR(1) panel data
model (see Kruiniger 2013) or the AR(p) panel data model (see Alvarez and
Arellano 2004) do not extend in a straightforward manner to models with
additional regressors even under the random effects assumption that the
elements of xit are uncorrelated with ci. For
example, in his treatment of the “random
effects” case of the AR(1) panel data model, Kruiniger
includes a linear projection of ci on the initial value yi0 in a
control function. However, such a control function will not suffice if there
are additional regressors even when these additional regressors are
uncorrelated with ci.
Equations (1) and (4) imply the augmented dynamic
panel data model
$$y_i = W_i\gamma_0 + u_i, \tag{7}$$
where Wi=(Yi,Zi), Zi=(Xi,ι,ιzi′), ι is a T×1 vector of ones, and γ0=(δ0′,β0′,μ0,θ0′)′. The errors in
this augmented model — ui=(ui1,…,uiT)′ — are now uncorrelated with the elements of Zi by construction. Thus, upon letting Ω0=E(uiui′), we have E(Zi′Ω0−1ui)=0. Moreover, because E(uiyio′)=0 and E(uixi′)=0, it follows from Lemma 1
that E(yi,−j′Ω0−1ui)=0 (j=1,…,p). The preceding shows E(Wi′Ω0−1ui)=0.
Now consider the quasi log-likelihood for the augmented model in (7): ∑i=1Nli(ψ), where
[TABLE]
ui(γ)=yi−Wiγ, γ=(δ′,β′,μ,θ′)′, ψ=(γ′,ω′)′, ω= vech(Ω), and Ω is a positive definite matrix. For known ω0=
vech(Ω0), the maximizer of this log-likelihood is
the GLS estimator γGLS=(∑i=1NWi′Ω0−1Wi)−1∑i=1NWi′Ω0−1yi, and this estimator is consistent because E(Wi′Ω0−1ui)=0.
Moreover, if Ω̂ is a consistent estimator of Ω0,
the feasible GLS (FGLS) estimator γ̂FGLS=(∑i=1NWi′Ω̂−1Wi)−1∑i=1NWi′Ω̂−1yi is also consistent.
However, the large N (fixed T) distribution of such a FGLS estimator
depends on the first-round estimator of γ0 used to
estimate Ω0 (see Phillips 2010). An alternative that does not
depend on a first-round estimator is to estimate ψ0=(γ0′,ω0′)′ by maximizing the quasi log-likelihood ∑i=1Nli(ψ).
Theorems 1 and 2 provide sufficient conditions for the almost sure
convergence of the QML estimator and its asymptotic normality (as N→∞, with T fixed). In order to state the theorems, set LN(ψ)=N−1∑i=1Nli(ψ) and HN(ψ)=∂2LN(ψ)/∂ψ∂ψ′; let xitk
denote the kth element of xit; and set Ψ={ψ=(γ′,ω′)′∈Rm:Ω is positive definite}.
Theorem 1. Assume the following conditions are satisfied:
C1:
E∣yit∣2+ϵ<M and E∣xitk∣2+ϵ<M for all i, t, and k and some ϵ>0 and M<∞;
C2:
Var(zi)=Ξzz for all i,
with Ξzz a positive definite matrix, E(zi)=μz for all i, and E(eit)=μe and E(zieit)=ϱze for all i and t≥1;
C3:
E(uiui′)=Ω0 for all i, with Ω0 a positive definite matrix;
C4:
the limits limN→∞N−1∑iE(yisyit), limN→∞N−1∑iE(yisxitk), and limN→∞N−1∑iE(xisjxitk) exist for all s, t, j,
and k; and
C5:
the vectors (z1′,y1′)′,…,(zN′,yN′)′ are independent for all N.
Then E[∂LN(ψ0)/∂ψ]=0 and the limit H(ψ)=limN→∞E[HN(ψ)] exists. Moreover, if H0=H(ψ0) is negative definite, then there is
a compact subset, say Ψ̄, of Ψ, with ψ0 in its interior, and there is a measurable maximizer, ψ̂, of LN(⋅) in Ψ̄
such that ψ̂→a.s.ψ0 (N→∞, T fixed).
Proof. See Appendix B.
Theorem 2. Assume Conditions C2–C5 are satisfied, H0 is negative definite, and the following conditions are
satisfied:
C1′:
E∣yit∣4+ϵ<M and E∣xitk∣4+ϵ<M for all i, t, and k
and some ϵ>0 and M<∞; and
C6:
the limit I0=limN→∞N−1∑iE[(∂li(ψ0)/∂ψ)(∂li(ψ0)/∂ψ)′] exists and is positive definite.
Then √N(ψ̂−ψ0)→dN(0,H0−1I0H0−1) (N→∞, T fixed).
Proof. See Appendix C.
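The limiting variance in Theorem 2 has the familiar sandwich form. The sketch below shows one way its sample analogue could be computed from per-individual scores and the average Hessian. The function name `sandwich_covariance` and the toy quasi-likelihood (a unit-variance Gaussian fitted to skewed data) are assumptions made purely for illustration; they are not taken from the paper.

```python
import numpy as np

def sandwich_covariance(scores, hessian):
    """Estimate H0^{-1} I0 H0^{-1}, the asymptotic variance of sqrt(N)(psi_hat - psi0).

    scores  : (N, m) array, row i is dl_i/dpsi evaluated at the estimate
    hessian : (m, m) array, the average Hessian H_N evaluated at the estimate
    """
    scores = np.asarray(scores, dtype=float)
    hessian = np.atleast_2d(np.asarray(hessian, dtype=float))
    I_hat = scores.T @ scores / scores.shape[0]   # average outer product of scores
    H_inv = np.linalg.inv(hessian)
    return H_inv @ I_hat @ H_inv

# Toy misspecified case: l_i(mu) = -(y_i - mu)^2 / 2 fitted to chi-square data.
rng = np.random.default_rng(1)
y = rng.chisquare(5, size=2000)
mu_hat = y.mean()                          # maximizer of the quasi log-likelihood
scores = (y - mu_hat).reshape(-1, 1)       # dl_i/dmu at mu_hat
hessian = np.array([[-1.0]])               # d^2 l_i / dmu^2
V = sandwich_covariance(scores, hessian)
```

In this toy case the sandwich estimate reduces to the sample variance of `y`, even though the working likelihood wrongly assumes unit variance and normality.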
In order for the QML estimator to be consistent and asymptotically normal, it must be the case that
the true parameter vector, ψ0, uniquely maximizes the expected log-likelihood, at least within a neighborhood of ψ0. Conditions C1 through C3 are mild, and they suffice to guarantee that ψ0 is indeed a stationary value of the expected log-likelihood. But the fact that ψ0 is a stationary value is necessary but not sufficient to ensure it is a unique maximizer of the expected log-likelihood. The matrix H0 must also be negative definite. If the log-likelihood ∑i=1Nli(ψ) is correctly specified, that is, if ui is normally distributed with mean vector 0
and variance-covariance matrix Ω0, conditionally on zi, then by well-known ML theory, we have H0=−I0, and H0 exists and is negative definite
by virtue of Condition C6. However, even when ∑i=1Nli(ψ) is misspecified, H0 can be
shown to be negative definite in particular cases. Phillips (2015), for example, provides an example in which H0 is negative definite under conditions that do not include
normality.
Moreover, Ω0 is the unconditional variance-covariance matrix of ui, and, although it does not depend on i, the
variance-covariance matrix of ui conditionally on zi may depend on i — for example, the errors may be
conditionally heteroskedastic (see also Phillips 2010, 2015). The errors can
also be unconditionally time-series heteroskedastic, for the diagonal elements of Ω0 can differ.
Furthermore, the conditions in Theorems 1 and 2 do not require the
random vectors (z1′,y1′)′,…,(zN′,yN′)′ be drawn from a common
distribution. On the other hand, Conditions C2 and C3 imply some homogeneity is required.
Estimators previously considered in the literature are covered by
Theorems 1 and 2. Blundell and Bond (1998) considered a conditional GLS
estimator of an AR(1) panel data model that relied on augmenting the
regression model with the initial observation on the dependent variable.
They argued that if the error components are homoskedastic across
individuals and time, then restrictions on the initial conditions can be
used to derive the GLS estimator. Theorems 1 and 2, however, show that these
conditions are unnecessarily restrictive. The errors can be conditionally
and time-series heteroskedastic. Moreover, initial condition restrictions
are not needed. All that is required is that the moments defining the
control function parameters exist and depend on neither i nor t.
Furthermore, the structured error variance-covariance matrices, such as
those considered by Phillips (2010, 2015) and Kruiniger (2013), are special
cases of Ω0, and, therefore, Theorems 1 and 2 cover those
cases.
3 Fixed-Effects QML
An alternative to first augmenting the regression model with a control
function and then applying QML estimation to the model in levels is to
instead first difference the observations and then apply QML estimation. In
the literature, ML or QML estimation based on first differencing the
observations has been referred to as fixed-effects ML estimation (e.g.,
Hsiao et al. 2002) or fixed-effects QML estimation (e.g., Kruiniger 2013).
This description, however, should not lead one to interpret levels QML
estimation as random-effects QML, for the results in Section 2
make clear that levels QML estimation is not restricted to random-effects
models with regressors that are exogenous with respect to ci.
Kruiniger (2013) studied differenced QML for an AR(1) panel data model.
Hsiao et al. (2002), on the other hand, studied ML estimation, after
differencing, and, like this paper, considered a model with additional
explanatory variables beyond a lagged dependent variable. This section shows
that likelihood-based methods using differences are consistent and
asymptotically normal under much weaker conditions than those assumed in
Hsiao et al. (2002).
Instead of augmenting the regression with a control function that
involves yio, differenced QML requires estimation of a
system of equations that includes a separate linear projection for each
initial difference Δyi,−p+2,…,Δyi1, where Δyit=yit−yi,t−1. Specifically, suppose Var(xi) is positive definite, and set θ0,p+1−j=Var(xi)−1Cov(xi,Δyi,−j+2) and μ0,p+1−j=E(Δyi,−j+2)−E(xi′)θ0,p+1−j (j=1,…,p). Then, system differenced QML relies
on estimating the linear projections
$$\Delta y_{i,-j+2} = \mu_{0,p+1-j} + x_i'\theta_{0,p+1-j} + r_{i,p+1-j} \qquad (j=1,\ldots,p). \tag{8}$$
Here ri,p+1−j is a linear projection residual, which is, by
construction, uncorrelated with all of the elements of xi.
Note that because the linear projection in (8) does not specify
how Δyi,−j+2 was generated it does not depend on initial
condition restrictions. In addition to the linear projection equations in (8) we also estimate the differenced equation:
$$\Delta y_i = \Delta Y_i\delta_0 + \Delta X_i\beta_0 + \Delta e_i, \tag{9}$$
where Δyi=(Δyi2,…,ΔyiT)′, ΔYi=(Δyi,−1,…,Δyi,−p), and Δyi,−j=(Δyi,−j+2,…Δyi,T−j)′ (j=1,…,p). Moreover, ΔXi=(Δxi2,…,ΔxiT)′, Δxit=xit−xi,t−1, and Δei=(Δei2,…,ΔeiT)′, with Δeit=eit−ei,t−1. For differenced QML, the equations in (8) and (9) are estimated as a system given by
[TABLE]
with yi=(Δyi,−p+2,…,Δyi1,Δyi′)′, ui=(ri1,…,rip,Δei′)′,
[TABLE]
and η0=(δ0′,β0′,μ01,θ01′,μ02,θ02′,…,μ0p,θ0p′)′.
If ui is multivariate normal with mean vector 0 and variance-covariance matrix Υ0 conditional on xi, then the log-likelihood for the system in (10) is ∑i=1Nli(λ), where
[TABLE]
ui(η)=yi−Wiη, η=(δ′,β′,μ1,θ1′,μ2,θ2′,…,μp,θp′)′, λ=(η′,υ′)′, and υ= vech(Υ). Also, set LN(λ)=N−1∑i=1Nli(λ), HN(λ)=∂2LN(λ)/∂λ∂λ′, and Λ={λ=(η′,υ′)′∈Rn:Υ is positive definite}.
The maximizer of ∑i=1Nli(⋅) is a ML estimator given normality, but even if the log-likelihood is
misspecified — that is, the errors are not normally distributed given xi, nor are they necessarily conditionally homoskedastic
— maximizing ∑i=1Nli(⋅) will
still yield a consistent and asymptotically normal estimator under suitable
conditions. Sufficient conditions are provided in Theorems 3 and 4.
Theorem 3. Suppose C1, C4, and C5 are satisfied. Further
assume:
C2′:
Var(xi)=Ξxx for
all i, with Ξxx positive definite, E(xi)=μx for all i, E(Δyi,−j+2)=μΔyj and E(xiΔyi,−j+2)=ϱxΔyj for all
i (j=1,…,p), and Cov(xi,Δei)=0; also,
C3′:
E(uiui′)=Υ0 for all i, with Υ0 a positive definite matrix.
Then E[∂LN(λ0)/∂λ]=0, where λ0=(η0′,υ0′)′ and υ0= vech(Υ0). Furthermore, the limit H(λ)=limN→∞HN(λ) exists. Moreover, if H0=H(λ0) is negative definite, there is a compact subset, say Λ̄, of Λ, with λ0 in its interior, and there is a
measurable maximizer, λ̂, of LN(⋅) in Λ̄ such that λ̂→a.s.λ0
(N→∞, T fixed).
Theorem 4. Suppose C1′–C3′, C4,
and C5 are satisfied and H0 is negative
definite. Further assume the following condition is met:
C6′:
the limit I0=limN→∞N−1∑iE[(∂li(λ0)/∂λ)(∂li(λ0)/∂λ)′]
exists and is positive definite.
Then √N(λ̂−λ0)→dN(0,H0−1I0H0−1) (N→∞, T fixed).
Proof. For proofs of Theorems 3 and 4, see Appendix D.
The linear projection of Δyi,−j+2 on 1 and xi
guarantees the residual in this linear projection is uncorrelated with the
elements of ΔXi. This is a critical condition for
consistent differenced QML estimation. But this condition is also met if we
instead used the linear projection of Δyi,−j+2 on 1 and Δxi, where Δxi is a vector
consisting of the distinct elements of ΔXi. The
latter approach generalizes an estimator studied by Hsiao et al. (2002).
Hsiao et al. (2002) studied differenced ML estimation of a dynamic panel
data model while assuming p=1, individual specific effects, and
uncorrelated and conditionally homoskedastic vits. Moreover, Hsiao et
al. (2002) also imposed restrictions on how the regressors are generated.
Furthermore, Hsiao et al. (2002) noted that the likelihood satisfies
standard regularity conditions, and therefore the ML estimator is consistent
and asymptotically normal. However, that conclusion follows from ML theory
assuming the log-likelihood is correctly specified. The analysis in this
section provides weaker conditions that imply the differenced ML estimator
proposed by Hsiao et al. (2002) is consistent and asymptotically normal (for
N→∞, T fixed). Specifically, the log-likelihood can be
misspecified and the vits can be conditionally heteroskedastic. Moreover, all that is required of the
elements of xit is that they be uncorrelated with the vits and that the linear projection of Δyi1 on 1 and Δxi does not depend on i.
4 Computation
If the error variance-covariance matrix is unrestricted, QML estimates can
be easily computed using iterated feasible generalized least squares. Consider, for example, calculating QML estimates of the elements of
Ω0 and γ0. These estimates can be
calculated by iterating back and forth between fitting Ω0 and
fitting γ0. Specifically, LN(⋅) is maximized with respect to the elements of Ω, conditional on the
current fit of the regression parameters, say γc, by
the fit Ω+=∑i=1Nui(γc)ui(γc)′/N. And, after Ω+ is obtained, LN(⋅) is then maximized with respect to γ,
conditional on Ω=Ω+, which gives the feasible generalized
least squares (FGLS) fit:
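$$\gamma^+ = \Big(\sum_{i=1}^N W_i'(\Omega^+)^{-1}W_i\Big)^{-1}\sum_{i=1}^N W_i'(\Omega^+)^{-1}y_i. \tag{11}$$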
This fit is then made the current fit, γc, and new
fits Ω+ and γ+ are calculated again, and
so on, until the sequence of fitted values converges. Calculating QML
estimates of λ0 and Υ0, based on
differenced observations, is similar when Υ0 is unrestricted.
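The back-and-forth iteration just described can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the paper's code: the function name `iterated_fgls`, the pooled least squares starting value, the stopping rule, and the simulated data (which use exogenous regressors only, not the augmented dynamic model) are all choices made for this example.

```python
import numpy as np

def iterated_fgls(W, y, tol=1e-10, max_iter=500):
    """Alternate between the fit Omega+ = sum_i u_i u_i' / N and the FGLS fit of gamma.

    W : (N, T, m) regressor array, y : (N, T) dependent variable.
    """
    N, T, m = W.shape
    # Starting value (an assumption): pooled least squares.
    gamma = np.linalg.lstsq(W.reshape(N * T, m), y.reshape(N * T), rcond=None)[0]
    for _ in range(max_iter):
        u = y - W @ gamma                      # residuals u_i(gamma^c), shape (N, T)
        Omega = u.T @ u / N                    # unrestricted fit of Omega
        Omega_inv = np.linalg.inv(Omega)
        A = np.einsum("itk,ts,isl->kl", W, Omega_inv, W)   # sum_i W_i' Omega^{-1} W_i
        b = np.einsum("itk,ts,is->k", W, Omega_inv, y)     # sum_i W_i' Omega^{-1} y_i
        gamma_new = np.linalg.solve(A, b)
        converged = np.max(np.abs(gamma_new - gamma)) < tol
        gamma = gamma_new
        if converged:
            break
    u = y - W @ gamma
    return gamma, u.T @ u / N

# Hypothetical data with an individual effect and time-series heteroskedasticity.
rng = np.random.default_rng(2)
N, T, m = 400, 4, 2
gamma_true = np.array([1.0, -0.5])
W = rng.normal(size=(N, T, m))
errors = rng.normal(size=(N, 1)) + rng.normal(size=(N, T)) * np.array([0.5, 1.0, 1.5, 2.0])
y = W @ gamma_true + errors
gamma_hat, Omega_hat = iterated_fgls(W, y)
```

The unrestricted fit of Ω requires N to exceed T for the matrix to be invertible, which is one practical reason the number of free variance parameters matters.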
Although it is easy to calculate estimates by iterating back and forth
between fitting Ω0 and fitting γ0, or
between fitting λ0 and Υ0, this
approach implies that the number of free parameters being fitted in either Ω0 or Υ0 increases at the rate T² as T
increases. This fact, in turn, suggests that, if T is not quite small, the
sampling performance of a QML estimator that does not impose valid
restrictions on Ω0 or Υ0 will be poor compared to
that of a QML estimator that does rely on valid restrictions.
Unfortunately, maximizing the likelihood for differenced observations when
restrictions on Υ0 are imposed is tractable only for a highly
specialized case. Specifically, we must assume p=1, eit is given by
the error-components model in (3), the vits are uncorrelated and
unconditionally homoskedastic, and the regressors in xit
are strictly exogenous with respect to the vits. Further assume Δyi1 is generated by the same process generating Δyit for t≥2. Then it is easy to show that the error variance-covariance matrix
is Υ0=σ02Φ0,
[TABLE]
(cf. Hsiao et al. 2002, p. 110, Eq. (3.2)). Moreover, the determinant of σ02Φ0 equals σ0^{2T}[1+T(ϕ0−1)] (see, e.g., Hsiao et al. 2002,
p. 111, Eq. (3.7)). From this determinant we see that, in order to ensure a
positive definite fitted value for σ02Φ0, we must
search over values of ϕ satisfying ϕ>1−1/T. This restriction is
guaranteed if we set ϖ=ln(ϕ−1+1/T) and maximize
the log-likelihood
[TABLE]
with respect to η, σ2, and ϖ. Here Φ has exp(ϖ)+1−1/T in its first row, first
column and everywhere else is the same as Φ0 in (12).
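The effect of the reparameterization can be checked numerically. The sketch below assumes that Φ has the tridiagonal structure typical of first-differenced uncorrelated errors (2 on the diagonal, −1 on the adjacent off-diagonals), with only the first-row, first-column entry replaced by exp(ϖ)+1−1/T. That structure is an assumption made here because the matrix in (12) is not reproduced in this text, but it does yield the determinant 1+T(ϕ−1) quoted above. The function name `phi_matrix` is hypothetical.

```python
import numpy as np

def phi_matrix(varpi, T):
    """T x T matrix Phi under the assumed tridiagonal structure.

    The (1,1) entry is phi = exp(varpi) + 1 - 1/T, so phi > 1 - 1/T for any real varpi.
    """
    Phi = 2.0 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1)
    Phi[0, 0] = np.exp(varpi) + 1.0 - 1.0 / T
    return Phi

T = 5
checks = []
for varpi in (-3.0, 0.0, 2.0):
    Phi = phi_matrix(varpi, T)
    phi = Phi[0, 0]
    det_numeric = np.linalg.det(Phi)
    det_formula = 1.0 + T * (phi - 1.0)        # equals T * exp(varpi), always positive
    min_eigenvalue = np.linalg.eigvalsh(Phi).min()
    checks.append((det_numeric, det_formula, min_eigenvalue))
```

Because ϖ is unrestricted, an unconstrained optimizer can search over it freely while the implied Φ stays positive definite in every case checked here.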
Maximizing the log-likelihood for differenced QML estimation becomes much
more complicated if the vits are time-series heteroskedastic or p>1.
On the other hand, the ease with which levels QML estimates can be
calculated is not affected by the size of p nor by whether or not the vits are time-series heteroskedastic. The remainder of this section is
devoted to describing an ECME algorithm that can be applied to calculate
levels QML estimates for arbitrary p and for an error variance-covariance
matrix given by Ω0=σa02ιι′+Σ0, with Σ0= diag(σ012,…,σ0T2).
The ECME algorithm relies on conditional or constrained maximization (CM) of
either an imputed log-likelihood, based on augmented data, or the
log-likelihood based on the observed data. In the present application, the
observed data are y=(y1′,…,yN′)′, while the augmented data
consist of y and a=(a1,…,aN)′. [Footnote 3: For the purposes of deriving the imputed log-likelihood and the actual
log-likelihood, the variables in z=(z1′,…,zN′)′ are
treated as fixed.] The imputed log-likelihood is built during the
expectation (E) step by taking the conditional expectation of the
log-likelihood for the augmented data given the observed data, while
treating the current fit of the parameters ψc as the
parameters of the conditional distribution. [Footnote 4: Liu and Rubin (1994) describe the properties of the ECME algorithm. For
applications of it to panel data see Phillips (2004, 2012).]
Applying the ECME algorithm to an error-components model for which Ω0=σa02ιι′+Σ0, with Σ0= diag(σ012,…,σ0T2), leads to the following E and CM steps:
E-step: Let (σa2)c, γc, and Ωc=(σa2)cιι′+Σc, with Σc= diag((σ12)c,…,(σT2)c), denote the current fits of σa02, γ0, and Ω0. Compute the conditional
mean and variance of ai given yi evaluated at the
current fit of the parameters. These are aic=(σa2)cι′(Ωc)−1ui(γc) and υac=(σa2)c[1−(σa2)cι′(Ωc)−1ι], respectively (see, e.g., Greene 2012,
Theorem B.7, pp. 1041-1042). Then the imputed log-likelihood is
[TABLE]
CM-step 1: Maximize Q(⋅;ψc) with respect to ω=(σa2,σ12,…,σT2)′ subject
to the constraint γ=γc. This step
yields (σa2)+=υac+∑i=1N(aic)2/N and
[TABLE]
CM-step 2: Maximize the actual log-likelihood ∑i=1Nli(⋅) with respect to γ subject to the constraint ω=ω+, where ω+=((σa2)+,(σ12)+,…,(σT2)+)′. This step gives the FGLS fit in Eq.
(11) with Ω+=(σa2)+ιι′+Σ+ and Σ+= diag((σ12)+,…,(σT2)+).
After the new fits of the parameters are obtained, they become the current
fits, and the preceding steps are repeated, until convergence.
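The E and CM steps above can be sketched as follows. This is a hedged illustration rather than the paper's implementation. In particular, the update used for the period-specific variances, (σt2)+=υac+∑i(uit(γc)−aic)²/N, is an assumption: it is the standard EM update for this error structure and agrees with the special case the text reports for (σa2)c=0, but Eq. (13) itself is not reproduced here. The function name, starting values, stopping rule, and simulated data are likewise hypothetical.

```python
import numpy as np

def ecme_levels_qml(W, y, tol=1e-8, max_iter=5000):
    """ECME sketch for Omega = sig_a2 * iota iota' + diag(sig_t2). W: (N,T,m), y: (N,T)."""
    N, T, m = W.shape
    iota = np.ones(T)
    gamma = np.linalg.lstsq(W.reshape(N * T, m), y.reshape(N * T), rcond=None)[0]
    total_var = np.var(y - W @ gamma)
    sig_a2, sig_t2 = 0.5 * total_var, np.full(T, 0.5 * total_var)  # assumed start
    for _ in range(max_iter):
        u = y - W @ gamma
        Omega = sig_a2 * np.outer(iota, iota) + np.diag(sig_t2)
        Oinv_iota = np.linalg.solve(Omega, iota)
        # E-step: conditional mean and variance of a_i given y_i at the current fit.
        a = sig_a2 * (u @ Oinv_iota)
        v_a = sig_a2 * (1.0 - sig_a2 * (iota @ Oinv_iota))
        # CM-step 1: update variance components holding gamma fixed.
        sig_a2_new = v_a + np.mean(a ** 2)
        sig_t2_new = v_a + np.mean((u - a[:, None]) ** 2, axis=0)  # assumed form of Eq. (13)
        # CM-step 2: FGLS fit of gamma holding the new variance components fixed.
        Omega_inv = np.linalg.inv(sig_a2_new * np.outer(iota, iota) + np.diag(sig_t2_new))
        A = np.einsum("itk,ts,isl->kl", W, Omega_inv, W)
        b = np.einsum("itk,ts,is->k", W, Omega_inv, y)
        gamma_new = np.linalg.solve(A, b)
        old = np.concatenate([gamma, [sig_a2], sig_t2])
        new = np.concatenate([gamma_new, [sig_a2_new], sig_t2_new])
        gamma, sig_a2, sig_t2 = gamma_new, sig_a2_new, sig_t2_new
        if np.max(np.abs(new - old)) < tol:
            break
    return gamma, sig_a2, sig_t2

# Hypothetical data: sigma_a^2 = 1 and period variances 0.5, 1.0, 1.5, 2.0.
rng = np.random.default_rng(3)
N, T, m = 1000, 4, 2
gamma_true = np.array([1.0, -0.5])
W = rng.normal(size=(N, T, m))
a_true = rng.normal(size=(N, 1))
v = rng.normal(size=(N, T)) * np.sqrt([0.5, 1.0, 1.5, 2.0])
y = W @ gamma_true + a_true + v
gamma_hat, sig_a2_hat, sig_t2_hat = ecme_levels_qml(W, y)
```

Each variance update is a sum of a conditional variance and an average of squares, so the fitted variance components cannot become negative, which is the property emphasized in the text.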
Unlike some other algorithms, the ECME fitted values for the error variance
components are guaranteed to be non-negative. But this advantage can lead to
another complication. Specifically, EM-like algorithms — including the
ECME algorithm — can be excruciatingly slow to converge, and, when
calculating estimates of error-components models, the rate of convergence
can slow when the sequence of the fitted variance of the individual-specific
effect gets close to zero (see Meng and van Dyk 1998). Moreover, there is
always the possibility that the error-components model in (3) is
inappropriate; specifically, there may be no individual-specific effects. In
this case, we have σc02=0, where σc02=var(ci), and σa02=0, and consequently the sequence of
fitted values for σa02 can approach zero. Furthermore, even if
σc02 is positive and large, σa02 can be small,
for the control function μ0+zi′θ0 is the best linear predictor of ci based on zi, and if that predictor is accurate, then σa02 can be
near zero. If so, the sequence of fitted values for σa02 can
get close to zero.
As a practical matter, however, given Ω0=σa02ιι′+Σ0, with Σ0= diag(σ012,…,σ0T2), then, when the
fitted value for σa02 is near zero, the fitted value γ+ in (11) differs little from the weighted
least squares fit (∑i=1NWi′(Σ+)−1Wi)−1∑i=1NWi′(Σ+)−1yi, which is obtained by setting (σa2)+=0.
Furthermore, once (σa2)+ is set to zero, all
subsequent fitted values for σa02 will be zero. Also, when (σa2)c=0, Eq. (13) simplifies to (σt2)+=∑i=1Nuit(γc)2/N. Thus, if (σa2)+
is set to zero, convergence is rapid. Consequently, the ECME algorithm for
computing level QML estimates will generally converge at a robust rate if,
as part of the convergence criterion, the size of the fitted value for σa02 is evaluated and (σa2)+ is
set to zero should it become sufficiently small.
Footnote 5: For example, the fitted value of σa2 might be set to zero when
the fitted value for the average correlation coefficient, say ρ, is small, where $\rho = 2\sum_{s=1}^{T-1}\sum_{t>s}^{T}\rho_{st}/[T(T-1)]$, with $\rho_{st} = \sigma_{a0}^{2}/[(\sigma_{a0}^{2}+\sigma_{0s}^{2})(\sigma_{a0}^{2}+\sigma_{0t}^{2})]^{1/2}$. This criterion was
used to obtain the results for the levels QML estimator provided in Section 5.3. In particular, the fitted value of σa2 was set
to zero when the fitted value of ρ fell below 0.01.
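The truncation rule described in the footnote can be sketched as follows. This is only an illustration under stated assumptions: the function names `average_correlation` and `truncate_sigma_a2` are hypothetical, and the inputs are taken to be the current fitted values of σa2 and of the T time-specific variances.

```python
from math import sqrt

def average_correlation(sigma_a2, sigma_t2):
    """Average of rho_st over all pairs s < t, where
    rho_st = sigma_a2 / sqrt((sigma_a2 + sigma_s2) * (sigma_a2 + sigma_t2)).
    sigma_t2 is a list holding the T fitted time-specific variances (T >= 2)."""
    T = len(sigma_t2)
    total = 0.0
    for s in range(T - 1):
        for t in range(s + 1, T):
            total += sigma_a2 / sqrt((sigma_a2 + sigma_t2[s]) * (sigma_a2 + sigma_t2[t]))
    return 2.0 * total / (T * (T - 1))

def truncate_sigma_a2(sigma_a2, sigma_t2, tol=0.01):
    """Return 0.0 when the fitted average correlation falls below tol
    (0.01 is the threshold reported in the text); otherwise keep sigma_a2."""
    if average_correlation(sigma_a2, sigma_t2) < tol:
        return 0.0
    return sigma_a2
```

In an ECME loop, a call such as `truncate_sigma_a2(sigma_a2_fit, sigma_t2_fit)` would be made once per iteration, after the variance parameters are updated.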
5 Monte Carlo Experiments
5.1 Design
In order to assess the finite sample properties of the QML estimators
described in Section 4, Monte Carlo experiments were
conducted. For all of the experiments, observations on the dependent
variable yit were generated according to the model
[TABLE]
with yi,−t0=0. The values for δ0 considered were 0, 0.2,
0.4, 0.6, 0.8, and 0.9. Moreover, the xits were generated according to
the autoregressive process
[TABLE]
The starting value xi,−t0 was set equal to 5+10ξi,−t0 and
the ξits were generated as independent uniform random variates with
mean zero and variance one. Furthermore, two values for t0 were
considered: t0=1 and t0=50. For t0=50, the time series for xit and yit were essentially stationary, whereas for t0=1 they
were nonstationary.
As for the vits, they were generated as vit=xit(ϵit−5)/√10, with ϵit a chi-square random
variate with five degrees of freedom. The variate (ϵit−5)/√10 has an asymmetric distribution about zero with a
variance of one. Moreover, because the ϵits were generated
independently of one another and of the xits, the vits were
uncorrelated but conditionally heteroskedastic. However, the vits were unconditionally homoskedastic for t≥1 when t0 was
set to 50, for in this case the xits were essentially stationary by the time t=1. On the other hand, for t0=1, the xits had insufficient time to become approximately stationary by the time t=1. Hence, in this case, the vits were not only conditionally heteroskedastic, they were also
unconditionally time-series heteroskedastic for t≥1.
The heterogeneity component, ci, was generated as $c_i = \sum_{t=0}^{T}\ln|x_{it}|/(T+1) + \sigma_{\zeta}(\zeta_i-5)/\sqrt{10}$, with ζi a chi-square random
variate with five degrees of freedom. Furthermore, the parameter σζ was set to
either one or four. This specification for ci induced correlation
between ci and the xits. Moreover, both ci and vit,
conditional on the xits, had non-normal asymmetric distributions,
implying that, conditional on the xits, the error eit=ci+vit
came from a non-normal asymmetric distribution.
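The error design above can be sketched in a few lines. The sketch makes two assumptions that should be checked against the original equations: the centered chi-square variates are standardized by √10 (so that they have unit variance, as the text states for the vits), and the regressor path for one unit is supplied by the caller, because the autoregressive process for xit is not reproduced here. The names `chi_square_5` and `draw_errors` are hypothetical.

```python
import math
import random

def chi_square_5(rng):
    """Chi-square variate with five degrees of freedom (gamma with shape 5/2, scale 2)."""
    return rng.gammavariate(2.5, 2.0)

def draw_errors(x, sigma_zeta, rng):
    """x holds x_i0, ..., x_iT for one cross-sectional unit (all nonzero).
    Returns (c_i, [v_i0, ..., v_iT])."""
    root10 = math.sqrt(10.0)
    # Idiosyncratic errors: conditionally heteroskedastic through x_it.
    v = [x_t * (chi_square_5(rng) - 5.0) / root10 for x_t in x]
    # Heterogeneity: the mean of ln|x_it| induces correlation with the regressors.
    mean_log_x = sum(math.log(abs(x_t)) for x_t in x) / len(x)
    c = mean_log_x + sigma_zeta * (chi_square_5(rng) - 5.0) / root10
    return c, v
```

A seeded generator, for example `draw_errors(x_path, 1.0, random.Random(0))`, makes a replication reproducible.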
After a sample was generated, the start-up observations were discarded so
that QML estimation was based on (xi1,yi1),…,(xiT,yiT) and yi0 (i=1,…,N), while GMM
estimation was based on (xi0,yi0),…,(xiT,yiT). Furthermore, T was set to ten, and N was set to
200. Finally, for each combination of parameters, 5,000 independent samples were
generated.
5.2 Estimators
The finite sample properties of levels and differenced QML estimators
were compared to each other and to two well-known GMM estimators. The GMM
estimators considered were the differenced GMM estimator proposed by
Arellano and Bond (1991) (denoted DGMM) and the system GMM estimator
suggested by Blundell and Bond (1998) (SGMM).
Three QML estimators were considered. Results
are provided for levels QML (LQML) while relying on the
structured variance-covariance matrix Ω0=σa02ιι′+Σ0 with Σ0= diag(σ012,…,σ0T2). For this case,
estimates were calculated with the ECME algorithm. Differenced QML estimates
were also calculated. As noted in Section 4, computing
differenced QML estimates via gradient methods is complicated if we model
the vits as time-series heteroskedastic. For this reason, results are
only provided for differenced QML estimates that restrict the vits to
be uncorrelated and unconditionally homoskedastic. Because we can use either
a linear projection of Δyi1 on 1 and Δxi
or a linear projection of Δyi1 on 1 and xi,
results for both choices are reported and are denoted by DQMLΔx and DQMLx.
5.3 Results
5.3.1 Stationary Designs
This section provides results for designs for which the generated variables
were approximately stationary (t0=50). Table 1 provides estimates of
finite sample bias and root mean squared error for the panel data GMM and QML estimators for stationary designs with σζ=1 and σζ=4.
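For reference, the two summary statistics reported in the tables can be computed from the Monte Carlo replications as sketched below. The function name `bias_and_rmse` is hypothetical, and `estimates` is assumed to hold one estimate of δ0 per replication.

```python
from math import sqrt

def bias_and_rmse(estimates, true_value):
    """Finite sample bias (mean estimate minus the truth) and root mean
    squared error across Monte Carlo replications."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    rmse = sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, rmse
```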
The evidence in Table 1 shows that the QML estimators (LQML, DQMLx, and DQMLΔx) generally have
negligible finite sample bias, and, consequently, their root mean squared
errors are substantially smaller than those of the GMM estimators, which have
non-negligible finite sample bias. Moreover, for most designs, whether one
uses DQMLx or DQMLΔx does not
matter much; they have similar finite sample bias and root mean squared
error. The exception is when δ0=0.9. For highly persistent designs,
DQMLx outperforms DQMLΔx. But
among the QML estimators, the levels QML estimator (LQML) is — in terms of
root mean squared error — best.
The system GMM estimator was introduced as a response to the poor sampling
performance of the differenced GMM estimator when δ0 is near one.
Blundell and Bond (1998) showed that the system GMM estimator will perform
better than the differenced GMM estimator in this case, and it does indeed
have smaller bias and root mean squared error than the differenced GMM
estimator for δ0 near one and σζ=1. However, surprisingly, its sampling
performance is worse — often much worse — than that of the differenced
GMM estimator for δ0 not near one. Furthermore, when σζ=4, the system GMM estimator has substantial bias even when δ0 is near one. Bun and Windmeijer (2010) provide an explanation for this result. They note the system GMM estimator may
suffer from a weak instrument problem when the variance of the
individual-specific effect is large relative to the variance of the
idiosyncratic error. The sampling performance of the QML estimators, on the
other hand, is unaffected by the relative size of the individual-specific
effect variance versus the idiosyncratic error variance.
Table 1: Finite sample characteristics of estimators of δ0 for t0=50.
[TABLE]
5.3.2 Nonstationary Designs
Table 2 provides finite sample bias and root mean squared error
estimates for nonstationary designs. For these designs t0=1, and,
therefore, for each cross section, the time series began in the immediate
past.
Table 2: Finite sample characteristics of estimators of δ0 for t0=1.
[TABLE]
In order for the system GMM estimator to be consistent (as N→∞) the stochastic process for each individual has to have had
sufficient time to converge to its steady state by time t=1 (see, e.g.,
Roodman 2009). However, given t0=1, convergence to a steady state at
time t=1 has clearly not occurred. The effect of the failure of this
initial condition restriction is most striking when σζ=4. In this case, for many designs, the absolute bias and root mean squared error of the system GMM estimator are much larger than those of the other estimators.
Except for the condition that yi0 must be uncorrelated with vit
for t≥1, the QML estimators are unaffected by initial conditions.
However, the consistency (as N→∞) of the differenced QML
estimators — DQMLx and DQMLΔx
— depends on the vits being unconditionally homoskedastic, and, when t0=1, the vits are time-series heteroskedastic. Consequently, in
Table 2, the differenced QML estimators no longer dominate the
differenced GMM estimator in terms of finite sample bias. On the other hand,
the levels QML estimator is robust with respect to time-series
heteroskedasticity, and therefore its finite sample bias is still negligible
for t0=1.
6 Conclusions
This paper established the almost sure convergence and asymptotic normality
of levels and differenced QML estimators of the parameters of a pth-order
dynamic panel data model. The almost sure convergence and asymptotic
normality of the estimators do not depend on initial conditions, like those
required by the system GMM estimator. Moreover, the log-likelihood can be
misspecified, and the errors can be conditionally and time-series
heteroskedastic. However, only levels QML estimates can be easily calculated
when the errors are time-series heteroskedastic. The paper provided an
ECME algorithm for this case. Furthermore, the levels QML estimator
dominated all of the other estimators in terms of having the smallest root
mean squared errors.
Appendix A: Lemma 1 Proof
In order to establish E(yi,−j′Ω0∗−1ei)=0, I first use an analysis
similar to that in Hamilton (1994, pp. 7-9). Let ξit=(yit,yi,t−1,…,yi,t−p+1)′, ςit=(xit′β0+eit,0,…,0)′, and
[TABLE]
where δ0=(δ01,…,δ0p)′. Then ξit=Fξi,t−1+ςit. Hence, ξi1=Fξi0+ςi1, and, for t>1, by
repeated substitutions we get ξit=Ftξi0+Ft−1ςi1+Ft−2ςi2+⋯+Fςi,t−1+ςit. Writing this last
expression out in full, we have
[TABLE]
Next let frs(t) denote the (r,s)th
element of Ft. Then yi1=f11(1)yi0+f12(1)yi,−1+⋯+f1p(1)yi,−p+1+xi1′β0+ei1,
and, for t>1, from the first equation in (40) we see that
[TABLE]
Using the expression for yit in Eq. (41), we can write yi,−j in terms of yio, Xi, and ei. To that end, let Aj and Bj be T×p and T×T matrices given by
[TABLE]
[TABLE]
Given these definitions, we have yi,−j=Ajyio+Bj(Xiβ0+ei).
Therefore, E(yi,−j′Ω0∗−1ei)=E(yio′Aj′Ω0∗−1ei)+E(β0′Xi′Bj′Ω0∗−1ei)+E(ei′Bj′Ω0∗−1ei). Note that E(ei′Bj′Ω0∗−1ei)=E[tr(Ω0∗−1eiei′Bj′)]= tr[Ω0∗−1E(eiei′)Bj′]= tr(Bj′)=0, where the last equality follows from the fact
that Bj is a square matrix with zeros down the main
diagonal. Moreover, if E(eixi′)=0, then E(β0′Xi′Bj′Ω0∗−1ei)=0. And E(yio′Aj′Ω0∗−1ei)=
tr[E(eiyio′)Aj′Ω0∗−1]=0 given E(eiyio′)=0. The
preceding proves E(yi,−j′Ω0∗−1ei)=0.
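The repeated-substitution step used in this proof can be checked numerically. The sketch below assumes F is the usual companion matrix of Hamilton (1994), with δ0′ in the first row and an identity block beneath it; the displayed definition of F is not reproduced here, so this form is an assumption. The function names are hypothetical, and `w[t-1]` stands for xit′β0+eit.

```python
def companion(delta):
    """p x p companion matrix: first row delta, shifted identity below it."""
    p = len(delta)
    shift = [[1.0 if c == r else 0.0 for c in range(p)] for r in range(p - 1)]
    return [list(delta)] + shift

def matvec(F, v):
    return [sum(f * x for f, x in zip(row, v)) for row in F]

def iterate_companion(delta, xi0, w):
    """Iterate xi_t = F xi_{t-1} + (w_t, 0, ..., 0)' and return y_1, ..., y_T.
    xi0 = (y_0, y_{-1}, ..., y_{-p+1})."""
    F = companion(delta)
    xi, path = list(xi0), []
    for w_t in w:
        xi = matvec(F, xi)
        xi[0] += w_t
        path.append(xi[0])
    return path

def iterate_direct(delta, xi0, w):
    """Same path from the scalar recursion y_t = sum_k delta_k y_{t-k} + w_t."""
    lags, path = list(xi0), []
    for w_t in w:
        y_t = sum(d * y for d, y in zip(delta, lags)) + w_t
        lags = [y_t] + lags[:-1]
        path.append(y_t)
    return path
```

Both functions should return the same path for any `delta`, `xi0`, and `w`, which is the content of the first equation obtained by repeated substitution.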
Appendix B: Theorem 1 Proof
The proof of Theorem 1 relies on verifying several preliminary
results, which are provided as Lemmas B.1 through B.3. Throughout,
convergence is with respect to N→∞, with T fixed.
Moreover, in the sequel, M denotes a sufficiently large finite number.
Lemma B.1. Suppose E(xitk2)<∞
and E(yit2)<∞, for each i, t, and k, and
Conditions C2 and C4 are satisfied. Then the linear projection in (4)
exists. Furthermore, the limits L(ψ)=limN→∞E[LN(ψ)] and H(ψ)=limN→∞E[HN(ψ)] exist, and L(ψ) and
the elements of H(ψ) are
continuous functions of ψ.
Proof. The conditions E(xitk2)<∞ and E(yit2)<∞, for each i, t, and k, and
C2 imply the existence of the linear projection in (4) (see, e.g.,
Wooldridge, 2010, pp. 25-26).
Also, E[LN(ψ)] is
finite if E[ui(γ)′Ω−1ui(γ)] is finite, and the latter is finite if xitk and yit have
finite second-order moments, for all i, t, and k.
The matrix E[HN(ψ)] has finite elements as well. To see this, first let Wi⋅j denote the jth column of Wi, and let S⋅j denote the jth column of ∂vec(Ω)/∂ω′, where recall that ω= vech(Ω). Then, ∂2li(ψ)/∂γj∂γk=−Wi⋅j′Ω−1Wi⋅k, ∂2li(ψ)/∂γj∂ωk=−Wi⋅j′Ω−1(∂Ω/∂ωk)Ω−1ui(γ), and
[TABLE]
where sijk(1)(ψ)=S⋅j′(Ω−1⊗Ω−1ui(γ)ui(γ)′Ω−1)S⋅k and sijk(2)(ψ)=S⋅j′(Ω−1ui(γ)ui(γ)′Ω−1⊗Ω−1)S⋅k (see Ruud 2000, p. 930). From the
preceding second-order partial derivatives we see that the condition E(xitk2)<∞ and E(yit2)<∞, for each i, t, and k, implies E[HN(ψ)] has finite elements.
Inspection of E[LN(ψ)] and
the elements of E[HN(ψ)] reveals E[LN(ψ)]
and the elements of E[HN(ψ)] are functions of ψ and terms of the
form N−1∑iE(yisyit), N−1∑iE(yisxitk), and N−1∑iE(xisjxitk).
Therefore, if the limits of these averages exist (as N→∞), then the limits L(ψ)=limN→∞E[LN(ψ)] and H(ψ)=limN→∞E[HN(ψ)] exist,
where L(ψ) and the elements of H(ψ) are functions of ψ
and terms involving limits of the form limN→∞N−1∑iE(yisyit), limN→∞N−1∑iE(yisxitk), and limN→∞N−1∑iE(xisjxitk). And, inspection of L(ψ) and the elements of H(ψ) reveals L(ψ) and the elements of H(ψ) are continuous functions of ψ.
Lemma B.2. Let Ψ denote a compact subset
of the parameter space. Suppose C1, C4, and C5 are satisfied. Then LN(⋅)→a.s.L(⋅) uniformly on Ψ.
Proof. Let ωst denote the (s,t)th element of Ω−1; let γk denote the kth element of γ; recall that Wi⋅j is the jth
column of Wi; and let Witj denote the tth element
of Wi⋅j. Also, let Sysyt,N=N−1∑i[yisyit−E(yisyit)], SysWtj,N=N−1∑i[yisWitj−E(yisWitj)], and SWsjWtk,N=N−1∑i[WisjWitk−E(WisjWitk)]. Then LN(ψ)−E[LN(ψ)]=−∑s∑tωstSysyt,N/2+∑s∑tωst∑jγjSysWtj,N−∑s∑tωst∑j∑kγjγkSWsjWtk,N/2. Therefore, by an obvious inequality,
we have ∣LN(ψ)−E[LN(ψ)]∣≤∑s∑t∣ωst∣∣Sysyt,N∣/2+∑s∑t∣ωst∣∑k∣γk∣∣SysWtk,N∣+∑s∑t∣ωst∣∑j∑k∣γjγk∣∣SWsjWtk,N∣/2. Given ωst and γk are bounded for ψ∈Ψ, it follows that
[TABLE]
Hence, LN(⋅)−E[LN(⋅)]→a.s.0 uniformly on Ψ if Sysyt,N→a.s.0, SysWtk,N→a.s.0, and SWsjWtk,N→a.s.0
for each s, t, j, and k.
To see that Sysyt,N→a.s.0, note that, by
the Cauchy-Schwarz inequality and C1, we get $E|y_{is}y_{it}|^{1+\epsilon/2} \le (E|y_{is}|^{2+\epsilon}E|y_{it}|^{2+\epsilon})^{1/2} < M$ for some ϵ>0 and all i, s, and t. This
conclusion and C5 imply Sysyt,N→a.s.0 (see
White 2001, p. 35, Corollary 3.9). By similar arguments, we also have SysWtk,N→a.s.0 and SWsjWtk,N→a.s.0. Hence, LN(⋅)−E[LN(⋅)]→a.s.0 uniformly
on Ψ.
Given C4, the following expressions are defined: Aysyt,N=N−1∑iE(yisyit)−limN→∞N−1∑iE(yisyit), AysWtj,N=N−1∑iE(yisWitj)−limN→∞N−1∑iE(yisWitj), and AWsjWtk,N=N−1∑iE(WisjWitk)−limN→∞N−1∑iE(WisjWitk).
And, by arguments analogous to those leading to the inequality in (45), one can show supψ∈Ψ∣E[LN(ψ)]−L(ψ)∣≤M∑s∑t∣Aysyt,N∣+M∑s∑t∑j∣AysWtj,N∣+M∑s∑t∑j∑k∣AWsjWtk,N∣. Because Aysyt,N, AysWtj,N, and AWsjWtk,N all converge to zero, we have E[LN(⋅)]→L(⋅)
uniformly on Ψ.
The conclusions of the last two paragraphs imply LN(⋅)→a.s.L(⋅) uniformly on Ψ.
Lemma B.3. If C1–C3 are satisfied, then E[∂LN(ψ0)/∂ψ]=0. If, in addition, C4 and C5 are satisfied and H0 is negative definite, then there is a compact subset Ψ of the parameter space, with ψ0 in its interior,
such that L(ψ)<L(ψ0) if ψ∈Ψ and ψ≠ψ0.
Proof. First
[TABLE]
is established. By well known results, ∂li(ψ)/∂γ=Wi′Ω−1ui(γ) and
[TABLE]
(see, e.g., Ruud, 2000, pp. 928-930). To see that E[∂li(ψ0)/∂γ]=0, first note that E(Zi′Ω0−1ui)=0 because all of the
elements of ui are uncorrelated with all of the elements
of Zi by construction. Moreover, C1–C3 imply E(uiyio′)=0
and E(uixi′)=0, and E(uiui′)=Ω0. Thus, the conditions of Lemma 1 hold for the augmented
regression in (7). Hence, by Lemma 1, we have E(yi,−j′Ω0−1ui)=0
(j=1,…,p). This proves E[∂li(ψ0)/∂γ]=0.
Furthermore, from Eq. (47), it is clear that, because E(uiui′)=Ω0, we have
E[∂li(ψ0)/∂ω]=0. Hence, E[∂LN(ψ0)/∂ψ]=0.
Next, a Taylor series expansion gives
[TABLE]
where gN(ψ0)=∂LN(ψ0)/∂ψ, and
ψ∗ satisfies ∥ψ−ψ∗∥≤∥ψ−ψ0∥. Given Eq. (46) and Lemma
B.1, taking the expectation of the left and right-hand sides of (48) and then letting N→∞ gives L(ψ)=L(ψ0)+(ψ−ψ0)′H(ψ∗)(ψ−ψ0)/2.
Let hjk(ψ) denote the (j,k)th element of H(ψ), and define
determinants
[TABLE]
By assumption, H0=H(ψ0) is
negative definite, and thus d1(ψ0)<0, d2(ψ0)>0, d3(ψ0)<0,… (see Rao 1973, p. 37). Moreover, the determinant dj(⋅) is continuous in h11(⋅),h12(⋅),…, which are, in turn, continuous in ψ (see Lemma B.1). Hence, dj(⋅) is continuous in ψ. It follows that there is an r>0 such that for the
closed ball in Rm, centered at ψ0, with radius r, we have d1(ψ)<0, d2(ψ)>0, d3(ψ)<0,… for ψ in the ball. Let Ψ denote the ball (a
compact subset of Rm). Then H(ψ) is negative
definite for ψ∈Ψ. Therefore, for ψ≠ψ0 and ψ∈Ψ, we must have (ψ−ψ0)′H(ψ∗)(ψ−ψ0)<0,
because ψ∈Ψ implies ψ∗∈Ψ and, therefore, H(ψ∗) is negative definite. Hence, L(ψ)<L(ψ0) if ψ∈Ψ and ψ≠ψ0.
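The sign pattern of the leading principal minors used in this proof (d1<0, d2>0, d3<0, and so on) is easy to check for a given symmetric matrix. The following sketch uses only the standard library; the names `det` and `is_negative_definite` are hypothetical, and the input is assumed to be a symmetric matrix stored as a list of rows.

```python
def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, d = len(a), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def is_negative_definite(h):
    """True when the j-th leading principal minor has sign (-1)^j for every j."""
    for j in range(1, len(h) + 1):
        d_j = det([row[:j] for row in h[:j]])
        if (-1) ** j * d_j <= 0:
            return False
    return True
```

For example, `is_negative_definite([[-2.0, 1.0], [1.0, -2.0]])` evaluates the minors −2 and 3 and returns `True`.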
Proof of Theorem 1: The conclusions of Lemmas B.2 and
B.3 imply there is a measurable maximizer, ψ̂,
in Ψ and ψ̂→a.s.ψ0 (see, e.g., Amemiya, 1985, Theorem
4.1.1, and his footnote 1 on p. 107).
Appendix C: Theorem 2 Proof
Theorem 2 is proven by establishing several lemmas. The first
result is an elementary inequality, which is applied repeatedly in the
sequel.
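Lemma C.1. For $r>0$, $\left|\sum_{j=1}^{m}a_j\right|^{r} \le b_r\sum_{j=1}^{m}|a_j|^{r}$, where $b_r = 1$ or $2^{(r-1)(m-1)}$ according as $r\le 1$ or $r\ge 1$.
Proof. By repeated application of the inequality $|a+b|^{r} \le c_r|a|^{r} + c_r|b|^{r}$, $r>0$, where $c_r = 1$ or $2^{r-1}$
according as $r\le 1$ or $r\ge 1$ (see Loève 1977, p. 157), we have $\left|\sum_{j=1}^{m}a_j\right|^{r} \le c_r|a_1|^{r} + c_r\left|\sum_{j=2}^{m}a_j\right|^{r} \le c_r|a_1|^{r} + c_r^{2}|a_2|^{r} + c_r^{2}\left|\sum_{j=3}^{m}a_j\right|^{r} \le \cdots \le \sum_{j=1}^{m-1}c_r^{j}|a_j|^{r} + c_r^{m-1}|a_m|^{r}$. Also, $\sum_{j=1}^{m-1}c_r^{j}|a_j|^{r} + c_r^{m-1}|a_m|^{r} \le b_r\sum_{j=1}^{m}|a_j|^{r}$ for $b_r = c_r^{m-1}$.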
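As a sanity check on the constant in Lemma C.1, the two sides of the inequality can be compared on random inputs. This is only a numerical illustration, not part of the proof; the function names `lemma_c1_sides` and `check_lemma_c1` are hypothetical.

```python
import random

def lemma_c1_sides(a, r):
    """Return (|sum a_j|^r, b_r * sum |a_j|^r) with b_r = 1 if r <= 1
    and b_r = 2^((r-1)(m-1)) otherwise, where m = len(a)."""
    m = len(a)
    b_r = 1.0 if r <= 1 else 2.0 ** ((r - 1) * (m - 1))
    lhs = abs(sum(a)) ** r
    rhs = b_r * sum(abs(x) ** r for x in a)
    return lhs, rhs

def check_lemma_c1(trials, seed):
    """Check the inequality on random vectors and exponents; True if it never fails."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.uniform(-3.0, 3.0) for _ in range(rng.randint(1, 6))]
        r = rng.uniform(0.1, 4.0)
        lhs, rhs = lemma_c1_sides(a, r)
        if lhs > rhs * (1.0 + 1e-12) + 1e-12:  # small slack for rounding error
            return False
    return True
```

With m=2, r=2, and a=(1,1), both sides equal 4, so the bound is attained in that case.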
Lemma C.2. Suppose C1′, C2, C3, C5, and C6 are
satisfied. Then √N gN(ψ0)→dN(0,I0).
Proof. Let μ be an m×1 vector of
constants such that μ≠0. We have μ′√N gN(ψ0)=N−1/2∑iZi for Zi=μ′(∂li(ψ0)/∂ψ). And √N gN(ψ0)→dN(0,I0) if N−1/2∑iZi→dN(0,μ′I0μ) (see Amemiya 1985, Theorem 3.3.8).
To verify N−1/2∑iZi→dN(0,μ′I0μ), let νi2=var(Zi)=μ′E[(∂li(ψ0)/∂ψ)(∂li(ψ0)/∂ψ)′]μ, and νN2=N−1∑iνi2. Because limN→∞νN2=μ′I0μ (by C6), we have N−1/2∑iZi→dN(0,μ′I0μ) if N−1/2∑iZi/νN→dN(0,1). Moreover, N−1/2∑iZi/νN→dN(0,1) if E(Zi)=0, νN2>ϵ′>0 for all N sufficiently
large, and E∣Zi∣2+ϵ/2<M for
all i and some ϵ/2>0 (see White 2001, Theorem 5.10). Therefore,
Lemma C.2 is proven upon proving E(Zi)=0, νN2>ϵ′>0 for all N sufficiently
large, and E∣Zi∣2+ϵ/2<M for
all i and some ϵ/2>0.
We can verify E(Zi)=0 and νN2>ϵ′>0 for all N sufficiently large easily. In
particular, Eq. (46) implies E(Zi)=0. Moreover, given C6, we have limN→∞νN2=μ′I0μ, and,
because I0 is positive definite, we can find an ϵ′>0 such that νN2>ϵ′ for
all N sufficiently large.
To verify E∣Zi∣2+ϵ/2<M
for all i and some ϵ/2>0, first let μj and ψj
denote the jth elements of μ and ψ.
Then Zi=∑jμj∂li(ψ0)/∂ψj. Hence, by Lemma C.1, we have E∣Zi∣2+ϵ/2<M for all i if E∣∂li(ψ0)/∂ψj∣2+ϵ/2<M for all i and j.
Next, recall ∂li(ψ0)/∂γ=Wi′Ω0−1ui while ∂li(ψ0)/∂ω=−vech(Ω0−1−Ω0−1uiui′Ω0−1)/2.
Moreover, upon letting ω0st denote the (s,t)th
element of Ω0−1 and recalling Wisj denotes the (s,j)th element of Wi, the elements of Wi′Ω0−1ui are of the form ∑s∑tω0stWisjuit while the elements of vech(Ω0−1−Ω0−1uiui′Ω0−1) are of the form ω0jk−∑s∑tω0jsω0ktuisuit.
These observations and another application of Lemma C.1 imply E∣∂li(ψ0)/∂ψj∣2+ϵ/2<M for all i and j if E∣Wisjuit∣2+ϵ/2<M and E∣uisuit∣2+ϵ/2<M for all i, j, s, and t.
But E∣Wisjuit∣2+ϵ/2≤(E∣Wisj∣4+ϵE∣uit∣4+ϵ)1/2 by the Cauchy-Schwarz inequality. Moreover,
for a suitable choice of ϵ>0, we have E∣Wisj∣4+ϵ<M for all i, s, and j by C1′. Condition C1′ also implies E∣uit∣4+ϵ<M for all i and t. Hence, E∣Wisjuit∣2+ϵ/2<M for all i, j, s, and t.
Similar arguments give E∣uisuit∣2+ϵ/2<M for all i, s, and t. It follows that E∣Zi∣2+ϵ/2<M for all i and some ϵ/2>0.
Lemma C.3. Let Ψ be a compact subset of the parameter space. Suppose C1, C4, and C5 are satisfied. Then HN(⋅)→a.s.H(⋅) uniformly on Ψ.
Proof. Let hγjγk(ψ)=limN→∞E[∂2LN(ψ)/∂γj∂γk]. Then ∣∂2LN(ψ)/∂γj∂γk−hγjγk(ψ)∣=∣∑s∑tωst(SWsjWtk,N+AWsjWtk,N)∣≤∑s∑t∣ωst∣(∣SWsjWtk,N∣+∣AWsjWtk,N∣). (For the definitions of SWsjWtk,N and AWsjWtk,N, see the proof of Lemma B.2.) Given ωst is
bounded for ψ∈Ψ, we have supψ∈Ψ∣∂2LN(ψ)/∂γj∂γk−hγjγk(ψ)∣≤M∑s∑t(∣SWsjWtk,N∣+∣AWsjWtk,N∣). Recall that SWsjWtk,N→a.s.0 (see the proof of Lemma
B.2), and AWsjWtk,N→0. Therefore, ∂2LN(⋅)/∂γj∂γk→a.s.hγjγk(⋅)
uniformly on Ψ.
Let hγjωk(ψ)=limN→∞E[∂2LN(ψ)/∂γj∂ωk]. Also, let ϑk,st denote the (s,t)th element of Ω−1(∂Ω/∂ωk)Ω−1. Then
∂2LN(ψ)/∂γj∂ωk−hγjωk(ψ)=−∑s∑tϑk,st[SysWtj,N+AysWtj,N]+∑s∑t∑lϑk,stγl[SWsjWtl,N+AWsjWtl,N]. (For the definitions of SysWtj,N and AysWtj,N, see the proof of Lemma B.2.) Because ϑk,st is a continuous function on Ψ, and, therefore,
bounded on Ψ, and γl is bounded for ψ∈Ψ, we have supψ∈Ψ∣∂2LN(ψ)/∂γj∂ωk−hγjωk(ψ)∣≤M∑s∑t(∣SysWtj,N∣+∣AysWtj,N∣)+M∑s∑t∑l(∣SWsjWtl,N∣+∣AWsjWtl,N∣). Given SysWtj,N→a.s.0, SWsjWtl,N→a.s.0, AysWtj,N→0, and AWsjWtl,N→0, we
have ∂2LN(⋅)/∂γj∂ωk→a.s.hγjωk(⋅) uniformly on Ψ.
Finally, from (44), we see that ∂2LN(ψ)/∂ωj∂ωk−E(∂2LN(ψ)/∂ωj∂ωk)=−(2N)−1∑i{sijk(1)(ψ)−E[sijk(1)(ψ)]}−(2N)−1∑i{sijk(2)(ψ)−E[sijk(2)(ψ)]}. Note that
[TABLE]
where UN(γ)=N−1∑i{ui(γ)ui(γ)′−E[ui(γ)ui(γ)′]}. Because
S⋅j is a vector of zeros and ones, we see that the
right-hand side of (49) is a sum of the elements of Ω−1⊗Ω−1UN(γ)Ω−1. Therefore, if each element of this matrix converges
almost surely to zero uniformly on Ψ, then N−1∑i{sijk(1)(⋅)−E[sijk(1)(⋅)]}→a.s.0 uniformly on Ψ. Similar arguments can be
used to show N−1∑i{sijk(2)(⋅)−E[sijk(2)(⋅)]}→a.s.0 uniformly on Ψ.
To see that each element of Ω−1⊗Ω−1UN(γ)Ω−1 converges almost surely
to zero uniformly, note that the matrix Ω−1⊗Ω−1UN(γ)Ω−1 can be
partitioned into T×T sub-matrices of the form ωlmΩ−1UN(γ)Ω−1 (l=1,…,T, m=1,…,T). Furthermore, the (j,k)th
element of ωlmΩ−1UN(γ)Ω−1 is ωlm∑s∑tωjsωktN−1∑i{uis(γ)uit(γ)−E[uis(γ)uit(γ)]}. And, by familiar arguments, we can show that the absolute value
of this element is no greater than M∑s∑t∣N−1∑i{uis(γ)uit(γ)−E[uis(γ)uit(γ)]}∣ for ψ∈Ψ. Moreover, N−1∑i{uis(γ)uit(γ)−E[uis(γ)uit(γ)]}=Sysyt,N−∑qγq(SysWtq,N+SytWsq,N)+∑q∑rγqγrSWsqWtr,N, and, given γ is bounded for ψ∈Ψ, we have
[TABLE]
Because the right-hand side of (50) converges almost surely to zero (see the proof of Lemma B.2), we have N−1∑i{sijk(1)(⋅)−E[sijk(1)(⋅)]}→a.s.0 uniformly on Ψ. Similar arguments establish N−1∑i{sijk(2)(⋅)−E[sijk(2)(⋅)]}→a.s.0 uniformly on Ψ. It follows
that ∂2LN(⋅)/∂ωj∂ωk−E[∂2LN(⋅)/∂ωj∂ωk]→a.s.0 uniformly on Ψ.
Let hωjωk(ψ)=limN→∞E[∂2LN(ψ)/∂ωj∂ωk]. We can
establish E[∂2LN(⋅)/∂ωj∂ωk]→hωjωk(⋅) uniformly on Ψ by arguments paralleling
those in the last two paragraphs. (For example, in the foregoing
derivations, replace N−1∑iE[uis(γ)uit(γ)] with limN→∞N−1∑iE[uis(γ)uit(γ)] and N−1∑iuis(γ)uit(γ) with N−1∑iE[uis(γ)uit(γ)]. Also, replace Sysyt,N, SysWtq,N, SytWsq,N, and SWsqWtr,N with Aysyt,N, AysWtq,N, AytWsq,N, and AWsqWtr,N.)
From the foregoing, we have ∂2LN(⋅)/∂ωj∂ωk→a.s.hωjωk(⋅) uniformly on Ψ.
Proof of Theorem 2: The conclusions of Lemmas C.2 and
C.3, the consistency of ψ̂, the continuity of H(⋅) at ψ0, and the
nonsingularity of H0=H(ψ0) imply √N(ψ̂−ψ0)→dN(0,H0−1I0H0−1) (see
Newey and McFadden 1994, Theorem 3.1).
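The sandwich form of the limiting covariance matrix in Theorem 2 can be computed as sketched below, assuming estimates of H0 and I0 are already available as square matrices stored as lists of rows. How those estimates are obtained is not specified here, and the function names `inverse`, `matmul`, and `sandwich` are hypothetical.

```python
def inverse(a):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if aug[piv][c] == 0.0:
            raise ValueError("matrix is singular")
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def sandwich(h, info):
    """Return H^{-1} I H^{-1}; dividing by N gives an approximate covariance
    matrix for the estimator itself."""
    h_inv = inverse(h)
    return matmul(matmul(h_inv, info), h_inv)
```

When the information matrix equality holds (I0 = −H0), the sandwich collapses to −H0−1, which gives a quick check on an implementation.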
Appendix D: Proof of Theorems 3 and 4
The proofs of Theorems 3 and 4 are similar to the proofs of
Theorems 1 and 2. For example, Conditions C1 and C2′ ensure the
linear projection parameters in (8) exist and do not depend on i and the errors in ui are uncorrelated with
the regressors in xi. Furthermore, the quasi
log-likelihood ∑i=1Nl(λ0) is similar to the quasi log-likelihood ∑i=1Nl(ψ0), and, therefore, most of the technical
details are the same as in Appendices B and C and need not be repeated.
However, the conclusions of Theorems 3 and 4 depend on E(Wi′Υ0−1ui)=0 being true, and the proof of this result, though
similar to the proof of Lemma 1, differs in some details. Therefore, the
proof of E(Wi′Υ0−1ui)=0 is provided in this
appendix.
Lemma D.1. Suppose E(xitk2)<∞
and E(yit2)<∞, for each i, t, and k, and
Conditions C2′ and C3′ are satisfied. Then E(Wi′Υ0−1ui)=0.
Proof. Let
[TABLE]
Given this definition, showing E(Wi′Υ0−1ui)=0 consists of showing E(Zi′Υ0−1ui)=0 and E[(0,Δyi,−j′)Υ0−1ui]=0 (j=1,…,p). Under the conditions of the lemma, the
elements of Zi are uncorrelated with the
elements of ui; hence, E(Zi′Υ0−1ui)=0. It remains to show E[(0,Δyi,−j′)Υ0−1ui]=0.
This result can be established by arguments similar to those used in the
proof of Lemma 1. Specifically, let Δξit=(Δyit,Δyi,t−1,…,Δyi,t−p+1)′,Δςit=(Δxit′β0+Δeit,0,…,0)′ and let F be defined as in (14). Then we
get Δξi2=FΔξi1+Δςi2; and, for t>2, we have Δξit=Ft−1Δξi1+Ft−2Δςi2+⋯+FΔςi,t−1+Δςit. Let frs(t) denote the (r,s)th element of Ft. Then, the preceding implies Δyi2=f11(1)Δyi1+f12(1)Δyi0+⋯+f1p(1)Δyi,−p+2+Δxi2′β0+Δei2; and,
for t>2, we have Δyit=f11(t−1)Δyi1+f12(t−1)Δyi0+⋯+f1p(t−1)Δyi,−p+2+f11(t−2)(Δxi2′β0+Δei2)+⋯+f11(1)(Δxi,t−1′β0+Δei,t−1)+Δxit′β0+Δeit (see the
proof of Lemma 1).
Using these equations we can write Δyi,−j as Δyi,−j=AjΔξi1+Bj(ΔXiβ0+Δei), where Aj is a (T−1)×p matrix
consisting of the first T−1 rows of Aj (see Eq. (42)) and Bj is a (T−1)×(T−1) matrix consisting of the first T−1 rows and first T−1
columns of Bj (see Eq. (43)). Recall (Δyi,−p+2,…,Δyi1)′=[Ip⊗(1,xi′)]π0+ri for π0=(μ01,θ01′,μ02,θ02′,…,μ0,p,θ0,p′)′ and ri=(ri1,…,rip)′ (see Eq. (10)). Moreover, note that
Δξi1=I∗(Δyi,−p+2,…,Δyi1)′ for p×p matrix
[TABLE]
Let
[TABLE]
Then some straightforward calculations give
[TABLE]
Because the elements of ui are uncorrelated
with the elements of Zi, we have E[(β0′,π0′)Zi′Dj′Υ0−1ui]=0. Also, E(ui′Dj′Υ0−1ui)= tr[Υ0−1E(uiui′)Dj′]= tr(Dj′). But tr(Dj′)=0, because the upper left-hand submatrix 0
in Dj is square with zeros down its main diagonal and Bj is a square matrix with zeros down its main
diagonal, and, therefore, Dj has zeros down its main
diagonal. These observations and Eq. (51) prove E[(0,Δyi,−j′)Υ0−1ui]=0.
References:
Alvarez, J., Arellano, M. (2004). Robust likelihood estimation of
dynamic panel data models. CEMFI Working Paper 0421.
Amemiya, T. (1985). Advanced Econometrics. Cambridge, MA:
Harvard University Press.
Anderson, T. W., Hsiao, C. (1981). Estimation of dynamic models with
error components. Journal of the American Statistical Association
76, 598-606.
Arellano, M., Bond, S. (1991). Some tests of specification for panel
data: Monte Carlo evidence and an application to employment equations. The Review of Economic Studies 58, 277-297.
Binder, M., Hsiao, C., Pesaran, M. H. (2005). Estimation and inference
in short panel vector autoregressions with unit roots and cointegration.
Econometric Theory 21, 795-837.
Blundell, R., Bond, S. (1998). Initial conditions and moment
restrictions in dynamic panel data models. Journal of Econometrics
87, 115-143.
Bun, M. J. G., Windmeijer, F. (2010). The weak instrument problem of
the system GMM estimator in dynamic panel data models. The
Econometrics Journal 13, 95-126.
Chamberlain, G. (1982). Multivariate regression models for panel data.
Journal of Econometrics 18, 5-46.
Chamberlain, G. (1984). Panel data. In: Griliches, Z., Intriligator,
M. D. (eds.), Handbook of Econometrics, Vol. 2. Amsterdam: North
Holland, pp. 1247–1318.
Hamilton, J. D. (1994). Time Series Analysis. Princeton, NJ:
Princeton University Press.
Hsiao, C., Pesaran, M. H., Tahmiscioglu, A. K. (2002). Maximum
likelihood estimation of fixed effects dynamic panel data models covering
short time periods. Journal of Econometrics 109, 107-150.
Kruiniger, H. (2013). Quasi ML estimation of the panel AR(1) model
with arbitrary initial conditions. Journal of Econometrics 173,
175-188.
Loève, M. (1977). Probability Theory I, 4th ed. New York,
NY: Springer-Verlag.
Meng, X.-L., van Dyk, D. (1998). Fast EM-Type Implementations for
Mixed Effects Models. Journal of the Royal Statistical Society,
Series B (Statistical Methodology) 60, 559-578.
Moral-Benito, E. (2013). Likelihood-based estimation of dynamic panels
with predetermined regressors. Journal of Business & Economic
Statistics 31, 451-472.
Newey, W. K., McFadden, D. (1994). Large sample estimation and
hypothesis testing. In: Engle, R. F., McFadden, D. L. (eds.), Handbook of Econometrics, Vol. 4. Amsterdam: North Holland, pp. 2111-2245.
Phillips, R. F. (2004). Estimation of a generalized random-effects
model: Some ECME algorithms and Monte Carlo evidence. Journal of
Economic Dynamics & Control 28, 1801-1824.
Phillips, R. F. (2010). Iterated feasible generalized least-squares
estimation of augmented dynamic panel data models. Journal of
Business & Economic Statistics 28, 410-422.
Phillips, R. F. (2012). On computing maximum-likelihood estimates of
the unbalanced two-way random-effects model. Communications in
Statistics–Simulation and Computation 41, 1921-1927.
Phillips, R. F. (2015). On quasi maximum-likelihood estimation of
dynamic panel data models. Economics Letters 137, 91-94.
Rao, C. R. (1973). Linear Statistical Inference and its
Applications. New York, NY: Wiley & Sons.
Roodman, D. (2009). A note on the theme of too many instruments.
Oxford Bulletin of Economics and Statistics 71, 135-158.
Ruud, P. A. (2000). An Introduction to Classical Econometric
Theory. New York, NY: Oxford University Press.
White, H. (2001). Asymptotic Theory for Econometricians. New
York, NY: Academic Press.
Wooldridge, J. M. (2010). Econometric Analysis of Cross
Section and Panel Data, 2nd ed. Cambridge, MA: MIT Press.