Preliminary Test Estimation in ULAN Models
Davy Paindaveine^{a,b}, Joséa Rasoafaraniaina^{a} and Thomas Verdebout^{a}
^{a} Université libre de Bruxelles (ULB)
^{b} Toulouse School of Economics (TSE)
Abstract: Preliminary test estimation, which is a natural procedure when it is suspected a priori that the parameter to be estimated might take its value in a submodel of the model at hand, is a classical topic in estimation theory. In the present paper, we establish general results on the asymptotic behavior of preliminary test estimators. More precisely, we show that, in uniformly locally asymptotically normal (ULAN) models, a general asymptotic theory can be derived for preliminary test estimators based on estimators admitting generic Bahadur-type representations. This allows for a detailed comparison between classical estimators and preliminary test estimators in ULAN models. Our results, which are shown to reduce to classical results in standard linear regression models, are also illustrated in more modern and involved setups, such as the multisample one where m covariance matrices Σ1,…,Σm are to be estimated when it is suspected that these matrices might be equal, might be proportional, or might share a common “scale”. Simulation results confirm our theoretical findings.
Key words and phrases:
LAN models, Le Cam’s asymptotic theory, Multisample covariance matrix estimation, Preliminary test estimation.
1. Introduction
Preliminary test estimation is a widely studied topic in Statistics and Econometrics, that can be traced back to the seminal paper by Bancroft (1944). Preliminary test estimators are typically useful when one has to perform statistical inference with some “uncertain prior information”. More formally, assume that one is interested in estimating a parameter θ that belongs to some parameter space Θ⊂Rp, with the “uncertain prior information” that θ belongs to a given subset Θ0 of Θ (throughout, we assume that Θ is an open subset of Rp). Then, roughly speaking, the statistician may hesitate between (i) an unconstrained estimator θ^U with values in Θ or
(ii) a constrained estimator θ^C with values in Θ0 only. The idea underpinning preliminary test estimation is relatively simple: if a suitable test ϕn for H0:θ∈Θ0 against H1:θ∉Θ0 did not reject the null hypothesis, then θ^C should be used; on the contrary, if ϕn provided evidence against H0, then the unconstrained estimator θ^U should be favoured. A preliminary test estimator based on the estimators θ^U and θ^C and on the test ϕn is therefore
\[
\hat\theta_{\rm PTE} := I[\phi_n=1]\,\hat\theta_{\rm U} + I[\phi_n=0]\,\hat\theta_{\rm C},
\]
where I[A] stands for the indicator function associated with A and where ϕn=1 (resp., ϕn=0) indicates rejection (resp., non-rejection) of H0 by ϕn.
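As a minimal illustration of this selection rule, the following Python sketch returns the unconstrained or the constrained estimate depending on the outcome of the test. The function name and its arguments are hypothetical; the two estimates and the test decision are assumed to be computed elsewhere.

```python
from typing import List, Sequence


def preliminary_test_estimate(theta_u: Sequence[float],
                              theta_c: Sequence[float],
                              reject_h0: bool) -> List[float]:
    """Return theta_u if the preliminary test rejects H0, theta_c otherwise.

    Written as I[phi_n = 1] * theta_u + I[phi_n = 0] * theta_c to mirror
    the definition of the preliminary test estimator.
    """
    indicator = 1.0 if reject_h0 else 0.0
    return [indicator * u + (1.0 - indicator) * c
            for u, c in zip(theta_u, theta_c)]
```

The estimator is thus a data-driven switch between two estimators rather than a smooth combination of them, which is what makes its asymptotic distribution non-Gaussian in general.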
Since Bancroft (1944), preliminary test estimation has been an active research topic. Sen and Saleh (1979), Sen and Saleh (2006), Wan, Zou and Ohtani (2006) and Kibria and Saleh (2014) considered preliminary test estimation in regression models. Giles, Lieberman and Giles (1992) tackled the problem of selecting the size of the test ϕn when conducting preliminary test estimation in a misspecified regression model. Ohtani and Toyoda (1980) considered estimation of regression coefficients after a preliminary test for homoscedasticity. Preliminary test estimation in elliptical models has been considered in Arashi et al. (2014) and by Paindaveine, Rasoafaraniaina and Verdebout (2017) in a principal component analysis context. Preliminary test estimation has also been widely considered in time series models; see, e.g., Ahmed and Basu (2000), Maeyama, Tamaki and Taniguchi (2011), and the references therein. For a general overview of the topic, we refer to Giles and Giles (1993) and Saleh (2006).
Despite the many works on the topic, there does not seem to exist a general theory describing the asymptotic behavior of preliminary test estimators. The main objective of the present paper is therefore to derive such a general theory and to do so in a broad class of models (that will include in particular all models mentioned above). Assuming that the underlying model is regular in the sense that it is uniformly locally asymptotically normal (ULAN), we will derive the asymptotic behavior of a general preliminary test estimator; more precisely, we will consider preliminary test estimators based on estimators θ^U and θ^C that admit Bahadur-type representations. Our asymptotic results do cover many of the existing results in the literature but also allow us to consider more modern and involved models.
As expected, the asymptotic behavior of preliminary test estimators will depend on the true value of the parameter θ. We first show that when this true value is fixed outside Θ0, then, provided that the test ϕn is consistent, a preliminary test estimator is asymptotically equivalent in probability to the unconstrained estimator θ^U. Second, we show that when the true value of θ asymptotically belongs to contiguous regions of Θ0 (in a sense that is related to the asymptotic concept of contiguity, as we will make precise below), a preliminary test estimator exhibits an asymptotic behavior achieving a nice compromise between θ^U and θ^C.
The paper is organized as follows. In Section 2, we describe the assumptions that will be considered in the sequel. In Section 3, we state our asymptotic results and derive explicit forms for the asymptotic mean square error of preliminary test estimators based on asymptotically efficient estimators. In Section 4, we illustrate these general results in two particular setups. First, we show that, in a simple linear regression context, our results allow us to recover the classical results from Saleh (2006). Then, we consider preliminary test estimation of m covariance matrices in a multisample Gaussian setup. Preliminary test estimators associated with the constraints of covariance homogeneity, shape homogeneity and scale homogeneity are studied. Monte Carlo simulations confirm our theoretical results. Finally, an appendix collects the proofs.
2. ULAN models and Preliminary Test Estimators
As mentioned in the introduction, our objective is to derive the asymptotic behavior of preliminary test estimators (PTEs) in a very general context.
We will throughout assume that the underlying parametric model {Pθ(n):θ∈Θ⊂Rp} under investigation is uniformly locally and asymptotically normal (ULAN) in the following sense (throughout, all convergences are as n→∞).
Assumption A. There exists a sequence (νn) of full-rank non-random p×p matrices that is o(1) and a sequence (θn) in Θ with νn−1(θn−θ)=O(1) for some θ∈Θ such that for every sequence (τn) that is O(1) and satisfies θn+νnτn∈Θ for any n,
[TABLE]
under Pθ(n), where the random p-vector Δθ(n), still under Pθ(n), is asymptotically normal with mean vector 0 and covariance matrix Γθ.
Many models satisfy Assumption A. The list includes hidden Markov models (Bickel and Ritov, 1996), quantum mechanics models (Kahn and Guta, 2009, Guta and Kiukas, 2015), time series models (Drost, Klaassen and Werker, 1997, Hallin et al., 1999, Francq and Zakoian, 2013), elliptical models (Hallin and Paindaveine, 2006, Hallin, Paindaveine and Verdebout, 2010), multisample elliptical models (Hallin and Paindaveine, 2008, Hallin, Paindaveine and Verdebout, 2013, Hallin, Paindaveine and Verdebout, 2014), models for directional data (Ley et al., 2013, Garcia-Portugues, Paindaveine and Verdebout, 2019), and many more.
As explained in the introduction, the construction of a PTE involves an unconstrained estimator θ^U taking values in Θ, a constrained estimator θ^C taking values in Θ0, and a test ϕn for H0:θ∈Θ0 against H1:θ∉Θ0. Throughout, we will assume that Θ0 is a linear subspace of Rp of the form
\[
\Theta_0 := (\theta_0 + \mathcal{M}(\Upsilon)) \cap \Theta,
\]
where θ0 is a fixed p-vector and M(Υ) denotes the vector subspace of Rp that is spanned by the columns of the p×r full-rank matrix Υ (r<p). We will restrict to the case θ0=0, which is without loss of generality (a reparametrization of the model always allows us to reduce to this case). Throughout, we will consider PTEs of the form
\[
\hat\theta_{\rm PTE} = I[\phi_n=1]\,\hat\theta_{\rm U} + I[\phi_n=0]\,\hat\theta_{\rm C}
\]
that involve estimators θ^U, θ^C and a test ϕn satisfying the following assumption.
Assumption B. With νn and Δθ(n) as in Assumption A, there exists a random p-vector Sθ(n) for which (Sθ(n)′,Δθ(n)′)′ is asymptotically normal with mean vector 0 and covariance matrix
\[
\begin{pmatrix} \Sigma_\theta & \Omega_\theta \\ \Omega_\theta' & \Gamma_\theta \end{pmatrix}
\]
under Pθ(n) and such that, for some p×p matrix Aθ and r×p matrix Bθ,
(i) νn−1(θ^U−θ)=AθSθ(n)+oP(1) under Pθ(n), θ∈Θ,
(ii) νn−1(θ^C−θ)=ΥBθSθ(n)+oP(1) under Pθ(n), θ∈Θ0, and
(iii) ϕn rejects H0:θ∈Θ0 at asymptotic level α when Q(n):=∥D(n)∥2>χp−r,1−α2, where χℓ,β2 denotes the quantile of order β of the χℓ2 distribution and where D(n) is such that D(n)=CθSθ(n)+oP(1) under Pθ(n), θ∈Θ0, for some p×p matrix Cθ satisfying ΣθCθ′CθΣθCθ′CθΣθ=ΣθCθ′CθΣθ and tr[Cθ′CθΣθ]=p−r. Furthermore, P[Q(n)>χp−r,1−α2] converges to one under Pθ(n), θ∉Θ0.
Although it looks quite complex, Assumption B is extremely mild: provided that the underlying model is ULAN as in Assumption A, it essentially only requires that the unconstrained estimator θ^U admits a Bahadur-type representation. To show this, restrict to the usual contiguity rate νn=n−1/2Ip (the extension to a general νn is direct) and assume that, under Pθ(n), θ∈Θ,
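\[
\sqrt{n}\,(\hat\theta_{\rm U}-\theta) = \frac{1}{\sqrt{n}} \sum_{i=1}^n T_i^{(n)} + o_{\rm P}(1),
\tag{2.2}
\]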
where the random p-vectors Ti(n)=Ti(n)(θ), i=1,…,n are mutually independent and share a common distribution that has mean zero and has finite second-order moments (this ensures that Assumption B(i) holds, with Aθ:=Ip and Sθ(n):=n−1/2∑i=1nTi(n), say). Under very mild assumptions (needed to check the Lévy-Lindeberg condition), the CLT for triangular arrays will then ensure that (Sθ(n)′,Δθ(n)′)′ is asymptotically normal under Pθ(n), as required in Assumption B.
Letting PΥ:=Υ(Υ′Υ)−1Υ′ be the matrix of the projection onto the constraint Θ0=M(Υ)∩Θ, the constrained estimator θ^C:=PΥθ^U readily satisfies
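\[
\sqrt{n}\,(\hat\theta_{\rm C}-\theta) = P_\Upsilon \sqrt{n}\,(\hat\theta_{\rm U}-\theta) = \Upsilon(\Upsilon'\Upsilon)^{-1}\Upsilon' S_\theta^{(n)} + o_{\rm P}(1)
\]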
under Pθ(n), θ∈Θ0, so that Assumption B(ii) is fulfilled, too (with Bθ:=(Υ′Υ)−1Υ′).
Finally, note that Assumption B(iii) will be satisfied by Wald tests for H0:θ∈Θ0 against H1:θ∉Θ0 constructed in the usual way from (2.2). Wrapping up, the only key point in Assumption B is its part (i), which itself holds as soon as the unconstrained estimator θ^U, like, e.g., most M-, R-, and S-estimators, admits a Bahadur-type representation.
Now, in the ULAN framework of Assumption A, it should be noted that an asymptotically efficient (unconstrained) estimator θ^U, that is, an estimator that, under Pθ(n), θ∈Θ, is such that
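\[
\nu_n^{-1}(\hat\theta_{\rm U}-\theta) = \Gamma_\theta^{-1}\Delta_\theta^{(n)} + o_{\rm P}(1)
\tag{2.3}
\]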
(see, e.g., Chapter 3 of Taniguchi and Kakizawa, 2000) also satisfies Assumption B(i), with Aθ=Γθ−1 and Sθ(n)=Δθ(n) (which provides Σθ=Γθ).
An asymptotically efficient constrained estimator θ^C, that is such that
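\[
\nu_n^{-1}(\hat\theta_{\rm C}-\theta) = \Upsilon(\Upsilon'\Gamma_\theta\Upsilon)^{-1}\Upsilon'\Delta_\theta^{(n)} + o_{\rm P}(1)
\tag{2.4}
\]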
under Pθ(n), θ∈Θ0,
satisfies Assumption B(ii), with Bθ=(Υ′ΓθΥ)−1Υ′ and Sθ(n)=Δθ(n). For testing H0:θ∈Θ0 against H1:θ∉Θ0, the locally asymptotically most stringent test rejects H0 at asymptotic level α when
[TABLE]
see, e.g., Chapter 5 of Ley and Verdebout (2017). Under Assumption A, it is easy to check that, under Pθ(n), θ∈Θ0,
[TABLE]
so that Assumption B(iii) then holds, still with Sθ(n)=Δθ(n), Σθ=Γθ, and with
Cθ=(Ip−Γθ1/2Υ(Υ′ΓθΥ)−1Υ′Γθ1/2)Γθ−1/2=(Ip−Γθ1/2ΥBθΓθ1/2)Γθ−1/2
(one can indeed check that Cθ′CθΓθCθ′Cθ=Cθ′Cθ and that
tr[Cθ′CθΓθ]=tr[Ip]−tr[(Υ′ΓθΥ)−1(Υ′ΓθΥ)]=p−r).
To summarize, Assumptions A and B cover many existing models and estimators. In the next section, our objective is to derive asymptotic results for PTEs in the general framework covered by these assumptions.
3. Asymptotic results
In this section, we derive the asymptotic behavior of a PTE of the form
\[
\hat\theta_{\rm PTE} = I[\phi_n=1]\,\hat\theta_{\rm U} + I[\phi_n=0]\,\hat\theta_{\rm C},
\tag{3.6}
\]
where the estimators θ^U, θ^C and the tests ϕn are such that Assumption B holds, under a parametric model {Pθ(n):θ∈Θ⊂Rp} that satisfies Assumption A. Letting λ(v):=I[v≤χp−r,1−α2], the estimator θ^PTE in (3.6) rewrites
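\[
\hat\theta_{\rm PTE} = \lambda(Q^{(n)})\,\hat\theta_{\rm C} + (1-\lambda(Q^{(n)}))\,\hat\theta_{\rm U}.
\]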
When deriving the asymptotic behavior of θ^PTE under Pθ(n), θ∈Θ, we will discriminate between three cases: (i) θ is fixed in the constraint Θ0, (ii) θ=θn belongs to the νn-vicinity of the constraint (that is, θn=θ+νnτn, with θ∈Θ0 and (τn)=O(1)), and (iii) θ is fixed outside the constraint Θ0; see Figure 1.
Our first result shows that, in case (iii), θ^PTE is asymptotically equivalent to the unconstrained estimator θ^U (see the appendix for a proof).
Theorem 1.
Let Assumptions A and B hold. Fix θ∉Θ0 and assume that θ^C=OP(1) under Pθ(n). Then,
νn−1(θ^PTE−θ)=νn−1(θ^U−θ)+oP(1)
under Pθ(n).
We now move to cases (i)–(ii), where we will actually consider parameter sequences of the form θn=θ+νnτn∈Θ, with θ∈Θ0 and (τn)→τ (note that case (i) is obtained for τn≡0). We have the following result (see the appendix for a proof).
Theorem 2.
Let Assumptions A and B hold and consider sequences of the form θn=θ+νnτn∈Θ, with θ∈Θ0 and (τn)→τ. Conditional on D(n)=D, νn−1(θ^PTE−θn) is, under Pθn(n), asymptotically normal with mean vector
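\[
\mu_{\rm PTE}^{\rm Vic} := (1-\lambda(\|D\|^2)) \big\{ (A_\theta\Omega_\theta - I_p)\tau + A_\theta\Sigma_\theta C_\theta'(C_\theta\Sigma_\theta C_\theta')^{-}(D - C_\theta\Omega_\theta\tau) \big\}
+ \lambda(\|D\|^2) \big\{ (\Upsilon B_\theta\Omega_\theta - I_p)\tau + \Upsilon B_\theta\Sigma_\theta C_\theta'(C_\theta\Sigma_\theta C_\theta')^{-}(D - C_\theta\Omega_\theta\tau) \big\}
\]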
and covariance matrix
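\[
\Gamma_{\rm PTE}^{\rm Vic} := (1-\lambda(\|D\|^2))\, A_\theta(\Sigma_\theta - L_\theta)A_\theta' + \lambda(\|D\|^2)\, \Upsilon B_\theta(\Sigma_\theta - L_\theta)B_\theta'\Upsilon',
\]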
where we denoted as A− the Moore-Penrose inverse of A and where we let Lθ:=ΣθCθ′(CθΣθCθ′)−CθΣθ.
Theorem 2 allows us to obtain an expression for the unconditional asymptotic distribution of νn−1(θ^PTE−θn): in the framework of Assumption B, Le Cam’s third lemma implies that D(n) is asymptotically normal with mean vector μD:=CθΩθτ and covariance matrix ΣD:=CθΣθCθ′ under Pθn(n), so that, under the same sequence of hypotheses, νn−1(θ^PTE−θn) converges weakly to a random p-vector Z with probability density function (pdf)
[TABLE]
where ϕμ,Σ stands for the pdf of the p-variate normal distribution with mean vector μ and covariance matrix Σ. Since the pdf (3.9) does not allow for a simple comparison between θ^PTE, θ^U and θ^C, we will base such a comparison on the asymptotic mean square errors (MSEs) of these estimators.
A general expression for the asymptotic MSEs can be obtained by computing E[μPTEVic], Var[μPTEVic] and E[ΓPTEVic], recalling that, under Pθn(n), the random vector D(n) has asymptotic mean μD and covariance matrix ΣD. We now derive these limiting MSEs when PTEs are based on Q(n) in (2.5) and on asymptotically efficient estimators satisfying (2.3)–(2.4) (limiting MSEs of PTEs based on other estimators can be obtained in the same way). For such estimators and tests, Theorem 2 yields that, conditional on D(n)=D, νn−1(θ^PTE−θn) is, under Pθn(n), asymptotically normal with mean vector
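\[
\mu_{\rm PTE,eff}^{\rm Vic} := \Gamma_\theta^{-1/2} P_{\Upsilon,{\rm eff}}^{\perp} \big\{ (1-\lambda(\|D\|^2))D - \Gamma_\theta^{1/2}\tau \big\}
\tag{3.10}
\]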
and covariance matrix
\[
\Gamma_{\rm PTE,eff}^{\rm Vic} := \Gamma_\theta^{-1/2} P_{\Upsilon,{\rm eff}} \Gamma_\theta^{-1/2},
\]
with PΥ,eff:=Γθ1/2Υ(Υ′ΓθΥ)−1Υ′Γθ1/2 and PΥ,eff⊥:=Ip−PΥ,eff.
We then have the following result (see the appendix for a proof).
Proposition 1.
If μPTE,effVic in (3.10) is based on a random p-vector D that is normal with mean vector PΥ,eff⊥Γθ1/2τ and covariance matrix PΥ,eff⊥, then
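\[
{\rm E}[\mu_{\rm PTE,eff}^{\rm Vic}] = -\gamma_2\, \Gamma_\theta^{-1/2} P_{\Upsilon,{\rm eff}}^{\perp} \Gamma_\theta^{1/2}\tau
\]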
and
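\[
{\rm Var}[\mu_{\rm PTE,eff}^{\rm Vic}] = \Gamma_\theta^{-1/2} \big\{ (1-\gamma_2) P_{\Upsilon,{\rm eff}}^{\perp} + (2\gamma_2 - \gamma_4 - \gamma_2^2)\, P_{\Upsilon,{\rm eff}}^{\perp}\Gamma_\theta^{1/2}\tau\tau'\Gamma_\theta^{1/2}P_{\Upsilon,{\rm eff}}^{\perp} \big\} \Gamma_\theta^{-1/2},
\]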
where we let γj:=P[Vj≤χp−r,1−α2], with Vj∼χp−r+j2(τ′Γθ1/2PΥ,eff⊥Γθ1/2τ) (throughout, χℓ2(η) will stand for the non-central chi-square distribution with ℓ degrees of freedom and with non-centrality parameter η).
We define the limiting MSE of θ^PTE under Pθn(n) as
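\[
{\rm AMSE}_{\theta,\tau}(\hat\theta_{\rm PTE}) := {\rm E}[ZZ'] = {\rm Var}[Z] + {\rm E}[Z]({\rm E}[Z])',
\]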
where Z is the weak limit of νn−1(θ^PTE−θn) under Pθn(n).
Now, since
E[Z]=E[E[Z∣D]]=E[μPTE,effVic]
and
Var[Z]=E[Var[Z∣D]]+Var[E[Z∣D]]=ΓPTE,effVic+Var[μPTE,effVic]
(note that Var[Z∣D]=ΓPTE,effVic is non-random), Proposition 1 yields
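\[
{\rm AMSE}_{\theta,\tau}(\hat\theta_{\rm PTE}) = \Gamma_\theta^{-1} - \gamma_2\, \Gamma_\theta^{-1/2} P_{\Upsilon,{\rm eff}}^{\perp} \Gamma_\theta^{-1/2} + (2\gamma_2 - \gamma_4)\, \Gamma_\theta^{-1/2} P_{\Upsilon,{\rm eff}}^{\perp}\Gamma_\theta^{1/2}\tau\tau'\Gamma_\theta^{1/2}P_{\Upsilon,{\rm eff}}^{\perp} \Gamma_\theta^{-1/2}.
\tag{3.12}
\]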
To enable proper comparison with the unconstrained and constrained antecedents of θ^PTE (namely, the estimators θ^U and θ^C satisfying (2.3) and (2.4), respectively), the following result provides explicit expressions for the limiting MSEs of these estimators (see the appendix for a proof).
Proposition 2.
Let Assumptions A and B hold. Then, under Pθn(n),
\[
{\rm AMSE}_{\theta,\tau}(\hat\theta_{\rm U}) = \Gamma_\theta^{-1}
\]
and
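\[
{\rm AMSE}_{\theta,\tau}(\hat\theta_{\rm C}) = \Gamma_\theta^{-1/2} P_{\Upsilon,{\rm eff}} \Gamma_\theta^{-1/2} + \Gamma_\theta^{-1/2} P_{\Upsilon,{\rm eff}}^{\perp}\Gamma_\theta^{1/2}\tau\tau'\Gamma_\theta^{1/2}P_{\Upsilon,{\rm eff}}^{\perp} \Gamma_\theta^{-1/2},
\]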
where θ^U and θ^C are estimators satisfying (2.3) and (2.4), respectively.
It is worthwhile to consider some boundary cases.
For α=1, we have γ2=γ4=0, so that AMSEθ,τ(θ^PTE)=AMSEθ,τ(θ^U), which is compatible with the fact that θ^PTE=θ^U almost surely when the test ϕn is performed at asymptotic level α=1.
At the other extreme, for α=0, we rather have γ2=γ4=1, which provides
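\[
{\rm AMSE}_{\theta,\tau}(\hat\theta_{\rm PTE}) = \Gamma_\theta^{-1/2} P_{\Upsilon,{\rm eff}} \Gamma_\theta^{-1/2} + \Gamma_\theta^{-1/2} P_{\Upsilon,{\rm eff}}^{\perp}\Gamma_\theta^{1/2}\tau\tau'\Gamma_\theta^{1/2}P_{\Upsilon,{\rm eff}}^{\perp} \Gamma_\theta^{-1/2} = {\rm AMSE}_{\theta,\tau}(\hat\theta_{\rm C}),
\]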
in agreement with the fact that θ^PTE=θ^C almost surely when the test ϕn is performed at asymptotic level α=0.
To conclude this section, we offer a comparison between AMSEθ,τ(θ^PTE), AMSEθ,τ(θ^U), and AMSEθ,τ(θ^C). Since these limiting MSEs are matrix-valued, the comparison must be based on a scalar summary, such as their trace. In the present case, where the unconstrained estimator satisfies AMSEθ,τ(θ^U)=Γθ−1, it is natural to measure the asymptotic performance of an estimator θ^ through the equivalent scalar quantity
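\[
{\rm AMSE}_{\theta,\tau}^{s}(\hat\theta) := {\rm tr}\big[ \Gamma_\theta\, {\rm AMSE}_{\theta,\tau}(\hat\theta) \big],
\]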
which, for θ^U, will provide the “normalized” performance AMSEθ,τs(θ^U)=p (see Proposition 2), which does not depend on the value of θ at which the contiguous alternatives θn=θ+νnτn are localized.
Proposition 2 also entails that
\[
{\rm AMSE}_{\theta,\tau}^{s}(\hat\theta_{\rm C}) = r + \|\delta\|^2,
\]
with δ:=PΥ,eff⊥Γθ1/2τ. Note that, at τ=0, this shows that AMSEθ,τs(θ^C)=r<p=AMSEθ,τs(θ^U), which confirms the intuition that θ^C dominates θ^U when the true parameter value belongs to Θ0. Now, it easily follows from (3.12) that
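\[
{\rm AMSE}_{\theta,\tau}^{s}(\hat\theta_{\rm PTE}) = p - \gamma_2 (p-r) + (2\gamma_2 - \gamma_4) \|\delta\|^2,
\]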
where γj=P[Vj≤χp−r,1−α2], with Vj∼χp−r+j2(∥δ∥2). Figure 2 plots, for p=10, r=1 and α=.05, the quantities AMSEθ,τs(θ^U), AMSEθ,τs(θ^C) and AMSEθ,τs(θ^PTE) as functions of ∥δ∥2. The figure reveals that, under Pθ(n) with θ∈Θ0 (which corresponds to δ=0), the constrained estimator θ^C has the best performance, as expected. The PTE performs better than θ^U in the vicinity of the constraint (∥δ∥ small to moderate) and it is asymptotically equivalent to θ^U far from the constraint (∥δ∥ large).
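The three curves discussed above can be evaluated numerically. The following Python sketch is a hedged illustration that assumes the closed forms implied by Propositions 1 and 2 in the efficient case, namely AMSEs(θ^U)=p, AMSEs(θ^C)=r+∥δ∥2 and AMSEs(θ^PTE)=p−γ2(p−r)+(2γ2−γ4)∥δ∥2. All function names are hypothetical, and the chi-square routines are simple series and bisection implementations written for self-containedness, intended for moderate non-centrality values only.

```python
import math


def chi2_cdf(x, df):
    """CDF of the central chi-square law, via the power series of the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def ncx2_cdf(x, df, nc):
    """CDF of the non-central chi-square law, written as a Poisson(nc/2)
    mixture of central chi-square CDFs with df + 2j degrees of freedom."""
    if nc <= 0.0:
        return chi2_cdf(x, df)
    half = nc / 2.0
    weight = math.exp(-half)          # Poisson probability of j = 0
    total = cum = 0.0
    j = 0
    while cum < 1.0 - 1e-13 and j < 2000:
        total += weight * chi2_cdf(x, df + 2 * j)
        cum += weight
        j += 1
        weight *= half / j            # Poisson probability of the next j
    return total


def chi2_quantile(prob, df):
    """Quantile of order prob (0 < prob < 1) of the chi-square law, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < prob:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scalar_amse(delta_sq, p, r, alpha):
    """Scalar limiting MSEs of the unconstrained, constrained and
    preliminary test estimators as functions of the squared norm of delta."""
    crit = chi2_quantile(1.0 - alpha, p - r)
    g2 = ncx2_cdf(crit, p - r + 2, delta_sq)
    g4 = ncx2_cdf(crit, p - r + 4, delta_sq)
    return {"U": float(p),
            "C": r + delta_sq,
            "PTE": p - g2 * (p - r) + (2.0 * g2 - g4) * delta_sq}


if __name__ == "__main__":
    for d in (0.0, 1.0, 5.0, 10.0, 20.0, 40.0):
        print(d, scalar_amse(d, p=10, r=1, alpha=0.05))
```

Calling `scalar_amse` on a grid of ∥δ∥2 values with p=10, r=1 and α=.05 gives the numerical values behind a plot of the type described above; the pure-Python chi-square routines could be replaced by a statistical library if one is available.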
4. Two specific applications
In this section, we illustrate the general results obtained above on two particular cases. First, we consider preliminary test estimation in the simple linear regression model and show that we recover for this model and for the considered estimation problem the classical results of Saleh (2006) (Section 4.1). Then, we consider the joint estimation of m covariance matrices Σ1,…,Σm in a context where it is suspected that these covariance matrices might be equal, might be proportional, or might share a common “scale” (Section 4.2).
4.1 Simple linear regression
Consider the simple linear regression model
[TABLE]
where Y=(Y1,…,Yn)′ is a response vector, x=(x1,…,xn)′ is a vector of non-random covariates, and where the error vector ϵ=(ϵ1,…,ϵn)′ is multinormal with mean zero and covariance matrix σ2In, for some σ2>0. This is the classical simple linear model with intercept ρ, slope β, and Gaussian homoscedastic errors with variance σ2. Throughout, we consider the parameter θ:=(ρ,β)′, as we will assume that σ2 is known (this is actually no restriction, since the block-diagonality of the Fisher information matrix in this model entails that replacing σ2 with a root-n consistent estimator will have no asymptotic cost, so that all results we obtain below extend to the case where σ2 would remain an unspecified nuisance). One can easily show that this model is ULAN, with a central sequence Δθ(n) that, under Pθ(n), is asymptotically normal with mean zero and covariance matrix
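\[
\Gamma_\theta = \frac{1}{\sigma^2} \begin{pmatrix} 1 & \bar{x}_0 \\ \bar{x}_0 & s_0 + \bar{x}_0^2 \end{pmatrix},
\]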
where xˉ0:=limn→∞n−1∑i=1nxi and s0:=limn→∞sx(n) with sx(n):=n−1x′x−n−2(1n′x)2; of course, we tacitly assume that these limits exist and are finite. We consider here preliminary test estimation of θ when it is suspected that β=β0 for some given β0. In the context, the classical, unconstrained, estimator of θ is the maximum likelihood estimator
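\[
\hat\theta_{\rm U} := (\hat\rho, \hat\beta)', \quad \textrm{with} \quad \hat\beta := \frac{n^{-1}x'Y - n^{-2}(1_n'x)(1_n'Y)}{s_x^{(n)}} \quad \textrm{and} \quad \hat\rho := n^{-1}(1_n'Y - \hat\beta\, 1_n'x),
\]
whereas the natural constrained estimator would be θ^C:=(ρ~,β0)′, with ρ~:=n−1(1n′Y−β01n′x). Since the locally asymptotically optimal test for H0:β=β0 against H1:β≠β0 rejects the null hypothesis at asymptotic level α when
\[
Q^{(n)} := \frac{n\, s_x^{(n)} (\hat\beta - \beta_0)^2}{\sigma^2} > \chi^2_{1,1-\alpha},
\]
the resulting PTE is given by
\[
\hat\theta_{\rm PTE} = I[Q^{(n)} > \chi^2_{1,1-\alpha}]\, \hat\theta_{\rm U} + I[Q^{(n)} \leq \chi^2_{1,1-\alpha}]\, \hat\theta_{\rm C}.
\]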
Letting θ0=(ρ,β0)′ be an arbitrary value of the parameter of interest corresponding to the constraint, the null hypothesis can be written as H0:θ∈θ0+M(Υ), with Υ:=(1,0)′. Since
[TABLE]
it follows from (3.12) that, under Pθ0+n−1/2τ(n), with τ=(0,δ)′, the MSE quantity AMSEθ,τ(θ^PTE) is here given by
[TABLE]
where the γj’s are computed with p=2 and r=1. This is in perfect agreement with the result in Theorem 4, pp. 94–96 in Saleh (2006).
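To make the construction of this section concrete, the following Python sketch computes the unconstrained, constrained and preliminary test estimates of θ=(ρ,β)′ from data. It is a sketch under stated assumptions: σ2 is treated as known, the preliminary test is assumed to take the Wald form n sx(n)(β^−β0)2/σ2 compared with the 0.95-quantile of the χ12 distribution, and the function and variable names are hypothetical.

```python
CHI2_1_095 = 3.841458820694124  # 0.95-quantile of the chi-square(1) law


def regression_pte(x, y, beta0, sigma2, crit=CHI2_1_095):
    """Preliminary test estimate of (intercept, slope) under the suspicion
    that the slope equals beta0, with known error variance sigma2.

    Returns (estimate, q_stat), where q_stat is the assumed Wald statistic.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Empirical variance of the covariates and empirical covariance with y.
    s_x = sum(v * v for v in x) / n - x_bar ** 2
    s_xy = sum(a * b for a, b in zip(x, y)) / n - x_bar * y_bar

    beta_hat = s_xy / s_x                 # unconstrained slope (MLE)
    rho_hat = y_bar - beta_hat * x_bar    # unconstrained intercept (MLE)
    rho_tilde = y_bar - beta0 * x_bar     # constrained intercept

    q_stat = n * s_x * (beta_hat - beta0) ** 2 / sigma2
    theta_u = (rho_hat, beta_hat)
    theta_c = (rho_tilde, beta0)
    return (theta_u if q_stat > crit else theta_c), q_stat
```

For instance, with x=(0,1,…,9) and the noiseless responses Yi=1+2xi, the call `regression_pte(x, y, beta0=2.1, sigma2=1.0)` does not reject and returns the constrained estimate, whereas `beta0=0.0` leads to rejection and to the unconstrained estimate.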
4.2 Multisample estimation of covariance matrices
Consider m(≥2) mutually independent samples of random k-vectors Xi1,…,Xini, i=1,…,m, with respective sample sizes n1,…,nm, such that, for any i, the Xij’s form a random sample from the multinormal distribution with mean vector 0 and (invertible) covariance matrix Σi (all results below extend to the case where observations in the ith sample would have a common, unspecified, mean μi, i=1,…,m, due to the block-diagonality of the Fisher information matrix for location and scatter in elliptical models; see, e.g., Hallin and Paindaveine, 2006). In the sequel, we decompose the covariance matrices into Σi=σi2Vi, where σi2:=(detΣi)1/k is their “scale” and Vi:=Σi/(detΣi)1/k is their “shape”. Under the only assumption that λi:=λi(n):=ni/n:=ni/(∑ℓ=1mnℓ)→λi∈(0,1) (to make the notation lighter, we will not stress the dependence in n in many quantities below), it follows from Hallin and Paindaveine (2009) that the sequence of Gaussian models indexed by
\[
\theta := \big( \sigma_1^2, \ldots, \sigma_m^2, (\overset{\circ}{\rm vech}\, V_1)', \ldots, (\overset{\circ}{\rm vech}\, V_m)' \big)',
\]
where vech°V (∈Rdk, with dk:=k(k+1)/2−1) stands for the vector obtained by depriving vechV of its first entry V11, is ULAN in the sense of Assumption A. To describe the corresponding central sequence and Fisher information matrix, we need the following notation.
Denoting as er the rth vector of the canonical basis of Rk, we let Kk:=∑r,s=1k(eres′)⊗(eser′) be the k2×k2 commutation matrix, put Jk:=(vecIk)(vecIk)′, and define Mk(V) as the (dk×k2) matrix such that (Mk(V))′(vech°v)=vecv for any symmetric k×k matrix v such that tr[V−1v]=0. We further put
[TABLE]
Then, letting Si:=ni−1∑j=1niXijXij′ be the empirical covariance matrix in sample i, the central sequence is Δθ=(ΔθI,1,…,ΔθI,m,ΔθII,1,…,ΔθII,m), where, for i=1,…,m, we wrote
[TABLE]
whereas the (full-rank and block-diagonal) information matrix takes the form
Γθ:=diag(ΓθI,ΓθII):=diag(2kσ−4,Hk(V)),
with σ:=diag(σ1,…,σm) and Hk(V):=diag(Hk(V1),…,Hk(Vm)).
The corresponding contiguity rate νn in Assumption A is given by
νn=n−1/2rn, where
[TABLE]
We consider here estimation of Σ1,…,Σm or, equivalently, estimation of θ. An advantage of the θ-parametrization is that it allows the construction of various PTEs: one may suspect, e.g., scale homogeneity H0scale:σ12=…=σm2, shape homogeneity H0shape:V1=…=Vm, or full covariance homogeneity H0cov:σ12V1=…=σm2Vm, that is, H0cov:Σ1=…=Σm. An asymptotically efficient unconstrained estimator in this Gaussian model is given by
[TABLE]
whereas, writing S:=n−1∑i=1m∑j=1niXijXij′ for the pooled covariance matrix estimator, asymptotically efficient constrained estimators, for the three constraints H0scale, H0shape and H0cov above, are given by
[TABLE]
[TABLE]
and
[TABLE]
respectively. The three hypotheses H0scale, H0shape and H0cov impose linear restrictions on θ, hence can be written as
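\[
H_0^{\rm scale}: \theta \in \mathcal{M}(\Upsilon_{\rm scale}), \qquad H_0^{\rm shape}: \theta \in \mathcal{M}(\Upsilon_{\rm shape}) \qquad \textrm{and} \qquad H_0^{\rm cov}: \theta \in \mathcal{M}(\Upsilon_{\rm cov})
\]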
(more specifically, Υscale:=diag(1m,Imdk), Υshape:=diag(Im,1m⊗Idk) and Υcov:=diag(1m,1m⊗Idk)). Now, if the p×r matrix Υ stands for either of Υscale, Υshape or Υcov (of course, each constraint matrix has its own r), the locally asymptotically most stringent test ϕΥ(n) for H0:θ∈M(Υ) rejects the null hypothesis at asymptotic level α when
[TABLE]
This allows us to consider the PTEs
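\[
\hat\theta_{\rm PTE}^{\rm scale} := I[\phi^{(n)}_{\Upsilon_{\rm scale}}=1]\, \hat\theta_{\rm U} + I[\phi^{(n)}_{\Upsilon_{\rm scale}}=0]\, \hat\theta_{\rm C}^{\rm scale},
\]
\[
\hat\theta_{\rm PTE}^{\rm shape} := I[\phi^{(n)}_{\Upsilon_{\rm shape}}=1]\, \hat\theta_{\rm U} + I[\phi^{(n)}_{\Upsilon_{\rm shape}}=0]\, \hat\theta_{\rm C}^{\rm shape}
\]
and
\[
\hat\theta_{\rm PTE}^{\rm cov} := I[\phi^{(n)}_{\Upsilon_{\rm cov}}=1]\, \hat\theta_{\rm U} + I[\phi^{(n)}_{\Upsilon_{\rm cov}}=0]\, \hat\theta_{\rm C}^{\rm cov}.
\]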
To compare these PTEs with their unconstrained and constrained antecedents, we performed the following Monte Carlo exercise, which focuses on the case m=2, k=2 and n1=n2=20,000. We generated independently M=10,000 samples of mutually independent observations (X1,…,Xn1,Y1(ℓ),…,Yn2(ℓ)), ℓ=0,…,9, where the Xi’s are N(0,Ik) and the Yi(ℓ)’s are N(0,Σℓ), with
[TABLE]
For ℓ=0, the samples X1,…,Xn1 and Y1(ℓ),…,Yn2(ℓ) share the same underlying covariance matrix Ik, hence also the same scales and shapes, whereas ℓ=1,…,9 provide increasingly distinct scales and shapes. In other words, the constraints above are met for ℓ=0 and are increasingly violated for ℓ=1,…,9. For every considered estimator θ^ of the resulting true parameter value θ, we measure the performance of θ^ through
[TABLE]
where θ^(m) is an estimator computed in the mth replication (m=1,…,M), or rather, parallel to what we did in Section 3, through the scalar quantity AMSE^θs(θ^):=tr[ΓθAMSE^θ(θ^)]. Figure 3 then plots AMSE^s(θ^) for the PTEs θ^PTEscale, θ^PTEshape and θ^PTEcov (the corresponding tests are all performed at asymptotic level α=.05), as well as their constrained and unconstrained antecedents θ^Cscale, θ^Cshape, θ^Ccov and θ^U. To match what was done in Figure 2, these quantities are not plotted as functions of ℓ, but rather as functions of the induced ∥δ∥2. The figure also provides the corresponding asymptotic performance measures AMSEθ,τs(θ^) resulting from the general expression obtained in Section 3. Clearly, the results show that these empirical and theoretical performance measures are in perfect agreement.
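The scale and shape decomposition Σi=σi2Vi that underlies the θ-parametrization of this section can be sketched as follows. This is only an illustration of the definitions σi2:=(detΣi)1/k and Vi:=Σi/(detΣi)1/k; the function names are hypothetical and the determinant is computed by plain Gaussian elimination to keep the example self-contained.

```python
def determinant(matrix):
    """Determinant of a square matrix (list of rows), computed by
    Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    k = len(a)
    det = 1.0
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                     # a row swap flips the sign
        det *= a[col][col]
        for i in range(col + 1, k):
            factor = a[i][col] / a[col][col]
            for j in range(col, k):
                a[i][j] -= factor * a[col][j]
    return det


def scale_and_shape(sigma):
    """Split a k x k covariance matrix into its scale det(sigma)^(1/k)
    and its shape sigma / det(sigma)^(1/k), which has determinant one."""
    k = len(sigma)
    det = determinant(sigma)
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    scale = det ** (1.0 / k)
    shape = [[entry / scale for entry in row] for row in sigma]
    return scale, shape
```

Applied to each empirical covariance matrix, such a decomposition gives the scale and shape components on which hypotheses of scale, shape or full covariance homogeneity are formulated.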
Acknowledgements
Davy Paindaveine’s research is supported by a research fellowship from the Francqui Foundation and by the Program of Concerted Research Actions (ARC) of the Université libre de Bruxelles. Thomas Verdebout’s research is supported by the ARC Program of the Université libre de Bruxelles and by the Crédit de Recherche J.0134.18 of the FNRS (Fonds National pour la Recherche Scientifique), Communauté Française de Belgique.
Appendix: Proofs
In this appendix, we collect the proofs of the various results.
Proof of Theorem 1. First note that
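\[
\nu_n^{-1}(\hat\theta_{\rm PTE}-\theta) = \nu_n^{-1}(\hat\theta_{\rm U}-\theta) + \lambda(Q^{(n)})\, \nu_n^{-1}(\hat\theta_{\rm C}-\hat\theta_{\rm U}).
\tag{E.19}
\]
For any ε>0, Assumption B(iii) ensures that
\[
{\rm P}\big[ \|\lambda(Q^{(n)})\, \nu_n^{-1}\| > \varepsilon \big] \leq {\rm P}\big[ \lambda(Q^{(n)}) = 1 \big] = {\rm P}\big[ Q^{(n)} \leq \chi^2_{p-r,1-\alpha} \big] \to 0,
\]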
so that λ(Q(n))νn−1=oP(1) under Pθ(n). Since by assumption, θ^C−θ^U=θ^C−θ+oP(1)=OP(1) under Pθ(n), the result follows from (E.19).
□
Proof of Theorem 2.
Writing νn−1(θ^U−θn)=νn−1(θ^U−θ)−νn−1(θn−θ)
and νn−1(θ^C−θn)=νn−1(θ^C−θ)−νn−1(θn−θ),
Assumption B entails that
[TABLE]
under Pθ(n), θ∈Θ0. Using Assumption B again, we have
[TABLE]
under Pθ(n), θ∈Θ0, with
[TABLE]
Thus, the third Le Cam Lemma (jointly with the fact that (E.20) also holds under Pθn(n), from contiguity) directly yields that, under Pθn(n),
[TABLE]
where F~ is obtained from F by deleting its last column and last row.
Therefore, conditional on D(n)=D,
[TABLE]
under Pθn(n), where we let
[TABLE]
and
[TABLE]
with Lθ=ΣθCθ′(CθΣθCθ′)−CθΣθ. The result then follows from the fact that νn−1(θ^PTE−θn)=(1−λ(∥D∥2))νn−1(θ^U−θn)+λ(∥D∥2)νn−1(θ^C−θn) (by using the identities λ2(v)=λ(v), (1−λ(v))2=1−λ(v), and λ(v)(1−λ(v))=0).
□
The proof of Proposition 1 requires the following preliminary result.
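Lemma 1 (Saleh (2006), p. 32).
Let Z be a Gaussian random p-vector with mean vector μ and covariance matrix Ip. Then, for any real measurable function φ,
\[
\textrm{(i)} \quad {\rm E}[\varphi(\|Z\|^2) Z] = \mu\, {\rm E}[\varphi(V)]
\]
and
\[
\textrm{(ii)} \quad {\rm E}[\varphi(\|Z\|^2) ZZ'] = I_p\, {\rm E}[\varphi(V)] + \mu\mu'\, {\rm E}[\varphi(W)],
\]
where V∼χp+22(∥μ∥2) and W∼χp+42(∥μ∥2).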
Proof of Proposition 1.
Since E[D]=PΥ,eff⊥Γθ1/2τ and since PΥ,eff is idempotent, we have
[TABLE]
Since PΥ,eff⊥ is a projection matrix with rank p−r, it decomposes into PΥ,eff⊥=OΛO′, where O is a p×p orthogonal matrix and Λ:=diag(1,…,1,0,…,0) is a diagonal matrix with tr[Λ]=p−r. The random vector E:=O′D is then Gaussian with mean vector ΛO′Γθ1/2τ and covariance matrix Λ. Lemma 1(i) thus entails that
[TABLE]
where γ2 is based on a non-central chi-square distribution with p−r+2 degrees of freedom and non-centrality parameter (ΛO′Γθ1/2τ)′ΛO′Γθ1/2τ=τ′Γθ1/2PΥ,eff⊥Γθ1/2τ.
Plugging this into (E.21) provides the result for E[μPTE,effVic].
We thus turn to Var[μPTE,effVic].
Since (1−λ(v))2=1−λ(v), we have
[TABLE]
where we used (E.22).
Now, by assumption, E[DD′]=Var[D]+E[D](E[D])′=PΥ,eff⊥+PΥ,eff⊥Γθ1/2ττ′Γθ1/2PΥ,eff⊥, and, applying Lemma 1(ii) along the same lines as above, we have that E[λ(∥D∥2)DD′]=OE[λ(∥E∥2)EE′]O′=γ2PΥ,eff⊥+γ4PΥ,eff⊥Γθ1/2ττ′Γθ1/2PΥ,eff⊥. Plugging these expressions into (4.2) then provides the result.
□
Proof of Proposition 2. Contiguity implies that (2.3) also holds under Pθn(n), so that
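\[
\nu_n^{-1}(\hat\theta_{\rm U}-\theta_n) = \nu_n^{-1}(\hat\theta_{\rm U}-\theta) - \tau_n = \Gamma_\theta^{-1}\Delta_\theta^{(n)} - \tau_n + o_{\rm P}(1)
\]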
under Pθn(n). Since Le Cam’s third lemma entails that Δθ(n) is asymptotically normal with mean vector Γθτ and covariance matrix Γθ under Pθn(n), it follows that νn−1(θ^U−θn) is asymptotically normal with mean vector 0 and covariance matrix Γθ−1 under Pθn(n), which yields AMSEθ,τ(θ^U)=Γθ−1.
Working along the same lines, we have that, under Pθn(n),
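\[
\nu_n^{-1}(\hat\theta_{\rm C}-\theta_n) = \Upsilon(\Upsilon'\Gamma_\theta\Upsilon)^{-1}\Upsilon'\Delta_\theta^{(n)} - \tau_n + o_{\rm P}(1) = \Gamma_\theta^{-1/2} P_{\Upsilon,{\rm eff}} \Gamma_\theta^{-1/2} \Delta_\theta^{(n)} - \tau_n + o_{\rm P}(1).
\]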
It directly follows that νn−1(θ^C−θn) is, still under Pθn(n), asymptotically normal with mean vector Γθ−1/2PΥ,effΓθ1/2τ−τ=−Γθ−1/2PΥ,eff⊥Γθ1/2τ and covariance matrix Γθ−1/2PΥ,effΓθ−1/2. The expression for AMSEθ,τ(θ^C) given in Proposition 2 directly follows.
□