A New Model Variance Estimator for an Area Level Small Area Model to Solve Multiple Problems Simultaneously

Masayo Yoshimori Hirose; Partha Lahiri

arXiv:1701.04176·math.ST·January 17, 2017


TL;DR

This paper introduces a novel model variance estimator for small area models that enhances EBLUP accuracy, prevents overshrinkage, and simplifies MSE estimation, promising improved inferences in small area estimation.

Contribution

It proposes a new model variance estimator that simultaneously improves shrinkage factor estimation, prevents overshrinkage, and simplifies MSE estimation in small area models.

Findings

1. The estimator improves shrinkage factor accuracy.

2. It prevents overshrinkage in EBLUP.

3. It simplifies MSE estimation without bias correction.


Tables

Table 1: Estimates of shrinkage factors $B_{i}$ in 3 areas (minimum, median and maximum $B_{i}$ values) in 1992 and 1993 SAIPE data

1992 year				1993 year
States	$D_{i}$	RE	HL	States	$D_{i}$	RE	HL
DC	31.6940	1.0000	0.9968	DC	38.2260	0.9574	0.9546
HI	11.3470	1.0000	0.9887	OR	12.1880	0.8775	0.8563
CA	1.8830	1.0000	0.7227	CA	2.1560	0.5588	0.4284

Table 2: Estimates of MSEs in 3 areas (minimum, median and maximum $B_{i}$ values) in 1992 and 1993 SAIPE data

1992 data
States	$D_{i}$	naive.RE	DL.RE	PB.RE	BL.RE	Taylor.HL	PB.HL
DC	31.69	1.81	1.91	1.80	1.19	2.08	2.07
HI	11.35	1.19	1.45	1.30	0.88	1.48	1.57
CA	1.88	1.26	2.82	1.34	1.20	1.72	1.37
1993 data
States	$D_{i}$	naive.RE	DL.RE	PB.RE	BL.RE	Taylor.HL	PB.HL
DC	38.23	4.07	4.23	4.14	4.97	4.41	4.33
OR	12.19	3.02	3.39	2.91	3.13	3.52	3.21
CA	2.16	1.64	2.19	1.74	1.72	1.87	1.60

Table 3: RB and RRMSE of $\hat{B}_{i}$ in 3 areas (minimum, median and maximum $B_{i}$)

		RB		RRMSE
States	$B_{i}$	RE	HL	RE	HL
DC	0.67	6.64	-2.86	28.70	28.49
ND	0.50	16.95	-5.28	50.29	41.96
OK	0.46	20.31	-6.09	56.90	44.79

Table 4: RB and RRMSE of $\hat{M}_{i}$ for MSE of EBLUP with REML in 3 areas (minimum, median and maximum $B_{i}$)

RB
States	$B_{i}$	naive.RE	DL.RE	PB.RE	PB.BL	Taylor.HL	PB.HL
DC	0.67	-10.10	1.52	-4.90	-2.01	4.31	3.83
ND	0.50	-17.50	3.39	-11.81	-6.57	-0.35	-2.63
OK	0.46	-14.94	10.48	-8.41	-2.51	4.43	1.96
RRMSE
States	$B_{i}$	naive.RE	DL.RE	PB.RE	PB.BL	Taylor.HL	PB.HL
DC	0.67	21.33	19.07	20.60	26.88	18.33	18.30
ND	0.50	25.51	10.64	22.54	29.28	12.57	15.48
OK	0.46	25.68	13.07	22.91	31.91	13.52	16.47

Table 5: RB and RRMSE of $\hat{M}_{i}$ for MSE of EBLUP with HL in 3 areas (minimum, median and maximum $B_{i}$)

RB
States	$B_{i}$	naive.RE	DL.RE	PB.RE	PB.BL	Taylor.HL	PB.HL
DC	0.67	-11.09	0.40	-5.95	-3.09	3.16	2.68
ND	0.50	-18.39	2.27	-12.76	-7.57	-1.43	-3.68
OK	0.46	-14.91	10.51	-8.38	-2.48	4.46	1.99
RRMSE
States	$B_{i}$	naive.RE	DL.RE	PB.RE	PB.BL	Taylor.HL	PB.HL
DC	0.67	21.64	18.81	20.66	26.68	17.90	17.90
ND	0.50	25.98	10.23	22.88	29.22	12.51	15.53
OK	0.46	25.67	13.10	22.91	31.92	13.54	16.48

Equations

$y_{i}=\theta_{i}+e_{i}=x_{i}^{\prime}\beta+v_{i}+e_{i},\quad i=1,\cdots,m,$

$\hat{\theta}_{i}^{BLUP}(A)=(1-B_{i})y_{i}+B_{i}x_{i}^{\prime}\hat{\beta}(A),$

$E[M_{i;approx}(\hat{A}_{MOM})]=M_{i}[\hat{\theta}_{i}^{EB}(\hat{A}_{MOM})]+O(m^{-1}),$

$\hat{A}_{RE}=\underset{A\in[0,\infty)}{\arg\max}\;L_{RE}(A),$

$L_{RE}(A)=|X^{\prime}V^{-1}X|^{-\frac{1}{2}}|V|^{-\frac{1}{2}}\exp\left(-\frac{1}{2}y^{\prime}Py\right),$

$\hat{A}_{i}=\underset{A\in[0,\infty)}{\arg\max}\;h_{i}(A)L_{RE}(A),$

$\mbox{Var}(\hat{B}_{i})=\frac{2D_{i}^{2}}{(A+D_{i})^{4}\mbox{tr}[V^{-2}]}+o(m^{-1}),$

$E(\hat{B}_{i})$

$\frac{\partial B_{i}}{\partial A}\frac{\partial\log h_{i}(A)}{\partial A}+\frac{1}{2}\frac{\partial^{2}B_{i}}{\partial A^{2}}=0.$

$\frac{\partial\log h_{i}(A)}{\partial A}=\frac{1}{A+D_{i}}.$

$h_{i0}(A)=(A+D_{i}).$

$\hat{A}_{i;MG}=\underset{A\in[0,\infty)}{\arg\max}\;\tilde{h}_{i}(A)L_{RE}(A),$

$\hat{B}_{i;MG}=B_{i}(\hat{A}_{i;MG}),\quad\hat{\theta}_{i;MG}^{EB}=\hat{\theta}_{i}^{BLUP}(\hat{A}_{i;MG}),$

$\hat{M}_{i;MG}\equiv M_{i;approx}(\hat{A}_{i;MG})=g_{1i}(\hat{A}_{i;MG})+g_{2i}(\hat{A}_{i;MG})+g_{3i}(\hat{A}_{i;MG}),$

$\hat{M}_{i;MG}^{boot}\equiv E_{*}\left[\hat{\theta}_{i}(\hat{A}_{i;MG}^{*},y^{*})-\theta_{i}^{*}\right]^{2},$

$E\left[\sum_{j=1}^{m}(\hat{\theta}_{j}^{JS}-\theta_{j})^{2}\,\Big|\,\theta\right]\leq E\left[\sum_{j=1}^{m}(y_{j}-\theta_{j})^{2}\,\Big|\,\theta\right],\quad\forall\,\theta\in R^{m},$

$E(\hat{B}_{U}-B)=0,\quad E(\hat{B}_{plug}-B)=\frac{2}{m-p-2}B=O(m^{-1}),$

$V(\hat{B}_{U})=\left(\frac{m-p-2}{m-p}\right)^{2}V(\hat{B}_{plug})\leq V(\hat{B}_{plug}).$

$\hat{\theta}_{i}^{EB}(\hat{A}_{Morris})=(1-\hat{B}_{U})y_{i}+\hat{B}_{U}x_{i}^{\prime}\hat{\beta}_{ols}.$

$E[(\hat{\theta}_{i}^{EB}(\hat{A}_{Morris})-\theta_{i})^{2}]\leq D.$

$V(\hat{M}_{i,RE})$

$V[\hat{M}_{i;MG}]$

$a_{m}$

$b_{m}$

$\hat{b}_{1}$

$\hat{h}_{2}$

$\hat{h}_{3}$

$\hat{B}_{i}(\hat{A}_{i,MG})-\hat{B}_{i}(\hat{A}_{RE})=(\hat{A}_{i,MG}-A)\hat{b}_{1}+o_{p}(m^{-1})=-\frac{2D_{i}}{\mbox{tr}[V^{-2}](A+D_{i})^{3}}+o_{p}(m^{-1});$

$E[B_{i}|Y=y]=\hat{B}_{i}(\hat{A}_{RE})+\frac{1}{2m\hat{h}_{2}}\left(\hat{b}_{2}-\frac{\hat{h}_{3}}{\hat{h}_{2}}\hat{b}_{1}\right)+\frac{\hat{b}_{1}}{m\hat{h}_{2}}\hat{\rho}_{1}+O_{p}(m^{-2}).$

$\frac{1}{2mh_{2}}\left(b_{2}-\frac{h_{3}}{h_{2}}b_{1}\right)+\frac{b_{1}}{mh_{2}}\rho_{1}=-\frac{2D_{i}}{\mbox{tr}[V^{-2}](A+D_{i})^{3}}.$

$\rho_{1}=\frac{\partial\log\pi(A)}{\partial A}=\frac{2}{A+D_{i}}-\frac{2\,\mbox{tr}[V^{-3}]}{\mbox{tr}[V^{-2}]}.$

$\pi(A)\propto(A+D_{i})^{2}\,\mbox{tr}[V^{-2}].$


Taxonomy

Topics: demographic modeling and climate adaptation · Statistical Methods and Bayesian Inference · Spatial and Panel Data Analysis

Full text

A New Model Variance Estimator for an Area Level Small Area Model to Solve Multiple Problems Simultaneously

Masayo Yoshimori Hirose

The Institute of Statistical Mathematics

and

Partha Lahiri

Joint Program in Survey Methodology, University of Maryland,

College Park, U.S.A

Abstract

The two-level normal hierarchical model (NHM) has played a critical role in the theory of small area estimation (SAE), one of the growing areas in statistics with numerous applications in different disciplines. In this paper, we address major well-known shortcomings associated with the empirical best linear unbiased prediction (EBLUP) of a small area mean and its mean squared error (MSE) estimation by considering an appropriate model variance estimator that satisfies multiple properties. The proposed model variance estimator simultaneously (i) improves on the estimation of the related shrinkage factors, (ii) protects EBLUP from the common overshrinkage problem, and (iii) avoids complex bias correction in generating a strictly positive second-order unbiased MSE estimator by either the Taylor series or the single parametric bootstrap method. The idea of achieving multiple desirable properties in an EBLUP method through a suitably devised model variance estimator is the first of its kind and holds promise in providing good inferences for small area means under the classical linear mixed model prediction framework. The proposed methodology is also evaluated using a Monte Carlo simulation study and real data analysis.

Keywords: Adjusted maximum likelihood method; Empirical Bayes; Empirical best linear unbiased prediction; Linear mixed model; Second-order unbiasedness.

1 Introduction

Planning and evaluation of government programs usually require access to a wide range of national and sub-national socio-economic, environmental and health-related statistics. There is, however, a growing need for statistics relating to much smaller geographical areas where data are too sparse to support the sort of standard estimation methods typically employed at the national level. These small area official statistics are routinely used for a variety of purposes, including assessing the economic well-being of a nation, making public policies, and allocating funds in various government programs. In this context, the term small area typically refers to a sub-population for which reliable statistics of interest cannot be produced using the limited area-specific data available from the primary data source.

With the availability of alternative data sources such as survey data, administrative and census records, different governmental agencies are now exploring ways to combine information from different data sources in order to produce reliable small area statistics. A common practice is to use a statistical model, usually a mixed model, and an efficient statistical methodology such as Bayesian or EBLUP for combining information from multiple databases. Such a strategy generally improves on estimation for a domain with small or no sample from the primary data source. We refer to the book by Rao and Molina (2015) for a comprehensive recent account of the small area estimation literature.

Both classical and Bayesian methods and theories have been developed using the following widely applied two-level Normal hierarchical model:

A Two-Level Normal Hierarchical Model (NHM)

Level 1 (sampling model): $y_{i}|\theta_{i}\stackrel{{\scriptstyle\mathrm{ind}}}{{\sim}}N(\theta_{i},D_{i})$ ;

Level 2 (linking model): $\theta_{i}\stackrel{{\scriptstyle\mathrm{ind}}}{{\sim}}N(x_{i}^{\prime}\beta,A),$

for $i=1,\cdots,m.$

In the above model, level 1 is used to account for the sampling distribution of unbiased estimates $y_{i}$. For example, $y_{i}$ could be a sample mean based on $n_{i}$ observations taken from the $i$th population (e.g., a small geographic area, a hospital or a school). As in other papers on the NHM (e.g., Efron and Morris 1973, 1975; Fay and Herriot 1979; Morris 1983; Datta, Rao, and Smith 2005), we assume that the sampling variances $D_{i}$ are known, in order to concentrate on the main issues. The assumption of known sampling variances $D_{i}$ often follows from the asymptotic variances of transformed direct estimates (Efron and Morris 1975; Carter and Rolph 1974) and/or from empirical variance modeling (Fay and Herriot 1979; Bell and Otto 1995).

Level 2 links the random effects $\theta_{i}$ to a vector of $p$ known auxiliary variables $x_{i}=(x_{i1},\cdots,x_{ip})^{\prime}$, often obtained from various alternative data sources (e.g., administrative records, severity index for a hospital, school register, etc.). The parameters $\beta$ and $A$ of the linking model, commonly referred to as hyperparameters, are generally unknown and are estimated from the available data. We assume that $\beta\in R^{p},$ the $p$-dimensional Euclidean space, and $A\in[0,\infty)$.

The NHM model can be viewed as the following simple linear mixed model:

$$y_{i}=\theta_{i}+e_{i}=x_{i}^{\prime}\beta+v_{i}+e_{i},\quad i=1,\cdots,m,\qquad(1)$$

where $\{v_{1},\ldots,v_{m}\}$ and $\{e_{1},\ldots,e_{m}\}$ are independent with $v_{i}{\sim}N(0,A)$ and $e_{i}{\sim}N(0,D_{i})$; $x_{i}$ is a $p$-dimensional vector of known auxiliary variables; $\beta\in R^{p}$ is a $p$-dimensional vector of unknown regression coefficients; $A\in[0,\infty)$ is an unknown variance component; $D_{i}>0$ is the known sampling variance of $y_{i}\;(i=1,\cdots,m)$.

The NHM is particularly effective in combining different sources of information and explaining different sources of errors. Some earlier applications of the NHM include the estimation of: (i) false alarm probabilities in New York City (Carter and Rolph 1974), (ii) the batting averages of major league baseball players (Efron and Morris 1975), and (iii) the prevalence of toxoplasmosis in El Salvador (Efron and Morris 1975).

Since the publication of the landmark paper by Fay and Herriot (with 971 Google citations to date), the NHM, commonly known as the Fay-Herriot (FH) model in the small area research community, has been extensively used in developing small area estimation theory and in a wide range of applications. In a small area estimation setting, the NHM, or FH model, has been used to estimate poverty rates for the US states, counties, and school districts (Citro and Kalton 2000) and Chilean municipalities (Casas-Cordero et al. 2015), and to estimate proportions at the lowest level of literacy for states and counties (Mohadjer et al. 2007).

The MSE of a given predictor $\hat{\theta}_{i}$ of $\theta_{i}$ is defined as $M_{i}(\hat{\theta}_{i})=E(\hat{\theta}_{i}-\theta_{i})^{2}$ , where the expectation is with respect to the joint distribution of $y=(y_{1},\cdots,y_{m})^{\prime}$ and $\theta=(\theta_{1},\cdots,\theta_{m})^{\prime}$ under the Fay–Herriot model (1). The best linear unbiased predictor (BLUP) $\hat{\theta}_{i}^{BLUP}$ of $\theta_{i}$ , which minimizes $M_{i}(\hat{\theta}_{i})$ among all linear unbiased predictors $\hat{\theta}_{i}$ , is given by:

$$\hat{\theta}_{i}^{BLUP}(A)=(1-B_{i})y_{i}+B_{i}x_{i}^{\prime}\hat{\beta}(A),$$

where $B_{i}\equiv B_{i}(A)=D_{i}/(A+D_{i})$ is the shrinkage factor and $\hat{\beta}(A)=(X^{\prime}{V}^{-1}X)^{-1}X^{\prime}{V}^{-1}y$ is the weighted least squares estimator of $\beta$ when $A$ is known. Here we use the following notation: $X^{\prime}=(x_{1},\cdots,x_{m}),$ a $p\times m$ matrix of known auxiliary variables; $V=\mbox{diag}(A+D_{1},\cdots,A+D_{m}),$ an $m\times m$ diagonal matrix. By plugging in an estimator $\hat{A}$ for $A$ (e.g., ML, REML, ANOVA) in the BLUP, one gets an empirical BLUP (EBLUP): $\hat{\theta}_{i}^{EB}\equiv\hat{\theta}_{i}^{BLUP}(\hat{A})$.
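The BLUP formula can be illustrated with a minimal numerical sketch. The function name `blup` and the toy inputs below are assumptions made purely for illustration; they are not taken from the paper. The code uses the fact that $V$ is diagonal, so $V^{-1}X$ is a row-wise rescaling of $X$.

```python
import numpy as np

def blup(y, X, D, A):
    """BLUP of theta under the Fay-Herriot model for a known model variance A.

    Hypothetical helper for illustration; y is (m,), X is (m, p), D is (m,).
    """
    y, X, D = np.asarray(y, float), np.asarray(X, float), np.asarray(D, float)
    v = A + D                                   # diagonal of V = diag(A + D_i)
    B = D / v                                   # shrinkage factors B_i = D_i / (A + D_i)
    Xw = X / v[:, None]                         # V^{-1} X
    beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # weighted least squares beta_hat(A)
    theta = (1.0 - B) * y + B * (X @ beta)      # (1 - B_i) y_i + B_i x_i' beta_hat(A)
    return theta, B, beta

# Toy inputs (illustrative only, not from the paper).
y = np.array([10.0, 14.0, 6.0, 15.0, 9.0])
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 0.5, 3.0, 1.5])])
D = np.array([4.0, 1.0, 9.0, 0.5, 2.0])

theta_A2, B_A2, beta_A2 = blup(y, X, D, A=2.0)
# As A approaches 0, every B_i approaches 1 and the BLUP collapses to the regression fit.
theta_small_A, B_small_A, _ = blup(y, X, D, A=1e-8)
```

Areas with a large sampling variance $D_{i}$ relative to $A$ are pulled more strongly toward the regression fit, which is why an estimate of $A$ at the boundary 0 removes all weight from the direct estimates.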

In the context of an empirical Bayesian approach, Morris (1983) noted that for making inferences about $\theta_{i}$, estimation of $B_{i}$ is more important than that of $A$ because the posterior means and variances of $\theta_{i}$ are linear in $B_{i}$, not in $A$. He also noted that, even if an exact unbiased estimator of $A$ is plugged into $B_{i}\equiv B_{i}(A)$, the resulting estimator of $B_{i}$ may have large bias. For that reason, to motivate the James-Stein estimator of $\theta_{i}$, Efron and Morris (1973) used an exact unbiased estimator of $B$ rather than the maximum likelihood estimator of $A$. For small $m$, the maximum likelihood estimator of $A$ (even with the REML correction) frequently produces an estimate of $A$ at the boundary (that is, 0), resulting in an estimated $B_{i}$ of 1 for all $i$, even when some of the true $B_{i}$ are not close to 1. This causes an overshrinkage problem in EBLUP: for each $i$, the EBLUP of $\theta_{i}$ reduces to the regression estimator. To overcome the overshrinkage problem, Morris (1983) suggested the fraction $(m-p-2)/(m-1)$ when the estimator of $B_{i}$ is 1. Li and Lahiri (2010) and Yoshimori and Lahiri (2014) avoided the overshrinkage problem by considering strictly positive consistent estimators of $A$, but did not devise their estimators of $A$ to obtain a nearly accurate estimator of $B_{i}$; that is, the biases of their estimators of $B_{i}$, like those of all other existing estimators (e.g., ML or REML), are of the order $O(m^{-1})$ and not $o(m^{-1})$. This is an important research gap, which we fill in this paper.

An estimator $\hat{M}_{i}(\hat{\theta}_{i}^{EB})$ of $M_{i}(\hat{\theta}_{i}^{EB})$ is called second-order unbiased if $E[\hat{M}_{i}(\hat{\theta}_{i}^{EB})]=M_{i}(\hat{\theta}_{i}^{EB})+o(m^{-1}),$ for large $m$, under suitable regularity conditions. Let $M_{i;approx}(A)$ be a second-order approximation to $M_{i}(\hat{\theta}_{i}^{EB}).$ That is, $M_{i}(\hat{\theta}_{i}^{EB})=M_{i;approx}(A)+o(m^{-1}),$ for large $m$, under regularity conditions. Prasad and Rao (1990) proposed a second-order unbiased estimator of $M_{i}(\hat{\theta}_{i;MOM}^{EB})$, where $\hat{\theta}_{i;MOM}^{EB}$ is the EBLUP of $\theta_{i}$ when the method-of-moments (MOM) estimator $\hat{A}_{MOM}$ of $A$ is used. They noticed that the simple plug-in estimator $M_{i;approx}(\hat{A}_{MOM})$ is not a second-order unbiased estimator of $M_{i}(\hat{\theta}_{i;MOM}^{EB})$. They showed that

$$E[M_{i;approx}(\hat{A}_{MOM})]=M_{i}[\hat{\theta}_{i}^{EB}(\hat{A}_{MOM})]+O(m^{-1}),$$

for large $m$, under regularity conditions. In fact, $M_{i;approx}(\hat{A})$ is not a second-order unbiased estimator of $M_{i}(\hat{\theta}_{i}^{EB})$ for any of the variance component estimators proposed in the literature. Bias correction is usually applied to achieve second-order unbiasedness. However, some bias corrections can even yield negative estimates of MSE. See Jiang (2007) and Molina and Rao (2015) for further discussions.

Mimicking a Bayesian hyperprior calculation, Laird and Louis (1987) introduced a parametric bootstrap method for measuring uncertainty of an empirical Bayes estimator. While their point estimator is identical to EBLUP, their measure of uncertainty has more of a Bayesian flavor than MSE. Butar (1997) [see also Butar and Lahiri 2003] was the first to introduce a parametric bootstrap method to produce a second-order unbiased MSE estimator in the small area estimation context. Since Butar’s work, a number of papers on parametric bootstrap MSE estimation methods have appeared in the SAE literature; see Pfeffermann and Glickman (2004); Chatterjee and Lahiri (2007); Hall and Maiti (2006); Pfeffermann and Correa (2012). Some of these estimators are second-order unbiased but not strictly positive. Some adjustments were proposed to make the second-order unbiased double parametric bootstrap MSE estimators strictly positive, but the adjusted MSE estimators were not claimed to have the dual property of second-order unbiasedness and strict positivity. As pointed out in Jiang et al. (2016), a proof is not at all trivial and it is not even clear if the adjustments for positivity retain the second-order unbiasedness of the MSE estimators.

In this paper, we focus on the estimation of two important area-specific functions of $A$ — the shrinkage factor $B_{i}$ and the MSE of the EBLUP $M_{i}(\hat{\theta}_{i}^{EB})$ . We propose a single area specific estimator of $A$ , say $\hat{A}_{i},$ that simultaneously satisfies the following multiple desirable properties under certain mild regularity conditions:

Property 1: Obtain a second-order unbiased estimator of $B_{i}$ , that is, $E(\hat{B}_{i})=B_{i}+o(m^{-1})$ , among the class of estimators of $B_{i}$ with identical variance, up to the order $O(m^{-1})$ , where $\hat{B}_{i}=D_{i}/(\hat{A}_{i}+D_{i})$ .

Property 2: $0<\mbox{inf}_{m\geq 1}\hat{B}_{i}\leq\mbox{sup}_{m\geq 1}\hat{B}_{i}<1$ . That is, it protects EBLUP from overshrinking to the regression estimator, a common problem encountered in the EB method;

Property 3: Obtain second-order unbiased Taylor series MSE estimator of EBLUP without any bias correction; that is, $E[M_{i;approx}(\hat{A}_{i})]=M_{i}(\hat{\theta}_{i}^{EB})+o(m^{-1}).$

Property 4: Produce a strictly positive second-order unbiased single parametric bootstrap MSE estimator without any bias-correction.

Note that the variance component in the FH model (1) is not area specific, but to satisfy the above properties simultaneously for a given area, we propose an area-specific estimator of $A$. This introduces an area-specific bias, but interestingly the order of bias is $O(m^{-1})$, the same as the bias of the ML estimator of $A$ but higher than that of REML in the higher-order asymptotic sense. This seems to be a reasonable approach as our main targets are area-specific parameters and not the global parameter $A$. Obviously, if $A$ is the main target, we would recommend a standard variance component method. We stress that in general none of the existing methods for estimating $A$ satisfies all four properties simultaneously.

In Section 2, we propose a new adjusted maximum likelihood estimator of $A$ that satisfies all four desirable properties listed above. The balanced case has been heavily studied in the literature. We consider the balanced case in Section 3 and show how our results are related to those in the literature. In Section 4, using real-life data from the U.S. Census Bureau, we demonstrate superior performance of our proposed estimators and MSE estimators over the competing estimators. A Monte Carlo simulation study, described in Section 5, shows that the proposed estimators outperform competing estimators. All the technical proofs are deferred to the Appendix.

2 A New Adjusted Maximum Likelihood Estimator of $A$

The residual maximum likelihood estimator of $A$ is defined as:

$$\hat{A}_{RE}=\underset{A\in[0,\infty)}{\arg\max}\;L_{RE}(A),$$

where $L_{RE}(A)$ is the residual likelihood of $A$ given by

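$$L_{RE}(A)=|X^{\prime}V^{-1}X|^{-\frac{1}{2}}|V|^{-\frac{1}{2}}\exp\left(-\frac{1}{2}y^{\prime}Py\right),$$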

with $P=V^{-1}-V^{-1}X(X^{\prime}V^{-1}X)^{-1}X^{\prime}V^{-1}$ . Note that $\hat{A}_{RE}$ does not satisfy any of the four desirable properties listed in the introduction.

In an effort to find a likelihood-based estimator of $A$ that satisfies all four desirable properties, we define the following adjusted maximum likelihood estimator of $A$:

$$\hat{A}_{i}=\underset{A\in[0,\infty)}{\arg\max}\;h_{i}(A)L_{RE}(A),$$

where $h_{i}(A)$ is a factor to be suitably chosen so that all four desirable properties are satisfied.

We first find $h_{i}(A)$ so that the resulting estimator of $A$ yields a nearly unbiased estimator of $B_{i}$ that also protects EBLUP from overshrinking. In other words, we first find the adjustment factor $h_{i}(A)$ that simultaneously satisfies Properties 1 and 2. Interestingly, it turns out that such an adjusted maximum likelihood estimator also satisfies Properties 3 and 4.

Using Lemma 1 in Appendix A and Taylor series expansion, we have

$$\mbox{Var}(\hat{B}_{i})=\frac{2D_{i}^{2}}{(A+D_{i})^{4}\mbox{tr}[V^{-2}]}+o(m^{-1}),$$

for large $m$ . We restrict ourselves to the class of estimators of $A$ that satisfies (2).

Using Lemma 1 and Taylor series expansion, we have

[TABLE]

Thus, Property 1 is satisfied if we have

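$$\frac{\partial B_{i}}{\partial A}\frac{\partial\log h_{i}(A)}{\partial A}+\frac{1}{2}\frac{\partial^{2}B_{i}}{\partial A^{2}}=0.$$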

Now the differential equation (12) simplifies to:

$$\frac{\partial\log h_{i}(A)}{\partial A}=\frac{1}{A+D_{i}}.$$

Thus, an adjustment factor that satisfies (5) is given by

$$h_{i0}(A)=(A+D_{i}).$$
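The step from the unbiasedness condition to this adjustment factor can be checked directly from the definition $B_{i}(A)=D_{i}/(A+D_{i})$. The following worked derivation is a sketch added for the reader and uses only quantities defined in the text; the integration constant $c$ is notation introduced here.

```latex
% Derivatives of the shrinkage factor B_i(A) = D_i / (A + D_i):
\frac{\partial B_{i}}{\partial A} = -\frac{D_{i}}{(A+D_{i})^{2}},
\qquad
\frac{\partial^{2} B_{i}}{\partial A^{2}} = \frac{2D_{i}}{(A+D_{i})^{3}}.

% Substituting into
%   (dB_i/dA)(d log h_i/dA) + (1/2)(d^2 B_i/dA^2) = 0:
-\frac{D_{i}}{(A+D_{i})^{2}}\,\frac{\partial \log h_{i}(A)}{\partial A}
  + \frac{D_{i}}{(A+D_{i})^{3}} = 0
\;\Longrightarrow\;
\frac{\partial \log h_{i}(A)}{\partial A} = \frac{1}{A+D_{i}}.

% Integrating with respect to A:
\log h_{i}(A) = \log(A+D_{i}) + c
\;\Longrightarrow\;
h_{i}(A) \propto A + D_{i}.
```

The multiplicative constant $e^{c}$ does not affect the maximizer of $h_{i}(A)L_{RE}(A)$, so it can be set to 1.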

This adjustment factor is indeed the unique solution to (12) up to the order $O(m^{-1})$ . Let $\hat{A}_{i0}$ be the adjusted maximum likelihood estimator of $A$ for the choice $h_{i}(A)=h_{i0}(A).$ We note that $\hat{A}_{i0}$ is not strictly positive. To achieve strict positivity, we propose our final estimator of $A$ as:

$$\hat{A}_{i;MG}=\underset{A\in[0,\infty)}{\arg\max}\;\tilde{h}_{i}(A)L_{RE}(A),$$

where $\tilde{h}_{i}(A)=h_{+}(A)h_{i0}(A)$ with the additional adjustment $h_{+}(A)$ satisfying regularity conditions R4 and R6-R7.
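A minimal sketch of how the residual likelihood and the intermediate estimator $\hat{A}_{i0}$ could be computed numerically is given below. It is not the authors' implementation: the grid search, the function names, and the toy data are assumptions for illustration, and the extra factor $h_{+}(A)$ is omitted because its form is not specified in this section, so the code targets $\hat{A}_{i0}$ rather than the final $\hat{A}_{i;MG}$.

```python
import numpy as np

def log_residual_likelihood(A, y, X, D):
    """log L_RE(A) up to an additive constant, using that V is diagonal."""
    v = A + D                                   # diagonal of V
    Xw = X / v[:, None]                         # V^{-1} X
    XtVinvX = X.T @ Xw
    beta = np.linalg.solve(XtVinvX, Xw.T @ y)   # weighted least squares fit
    resid = y - X @ beta
    quad = np.sum(resid**2 / v)                 # equals y' P y
    _, logdet = np.linalg.slogdet(XtVinvX)
    return -0.5 * logdet - 0.5 * np.sum(np.log(v)) - 0.5 * quad

def grid_estimates(y, X, D, i, grid):
    """Grid maximizers of L_RE(A) and of h_i0(A) L_RE(A), h_i0(A) = A + D_i.

    Hypothetical helper: a grid search stands in for a proper optimizer.
    """
    loglik = np.array([log_residual_likelihood(A, y, X, D) for A in grid])
    a_re = grid[np.argmax(loglik)]
    a_i0 = grid[np.argmax(loglik + np.log(grid + D[i]))]
    return a_re, a_i0

# Toy inputs (illustrative only, not from the paper).
y = np.array([10.0, 14.0, 6.0, 15.0, 9.0])
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 0.5, 3.0, 1.5])])
D = np.array([4.0, 1.0, 9.0, 0.5, 2.0])
grid = np.linspace(0.0, 50.0, 5001)

a_re, a_i0 = grid_estimates(y, X, D, i=0, grid=grid)
B_i0 = D[0] / (a_i0 + D[0])                     # implied shrinkage factor for area 0
```

Because $\log(A+D_{i})$ is increasing in $A$, the adjusted maximizer on a common grid is never smaller than the REML maximizer; it can still be 0, which is consistent with the remark that $\hat{A}_{i0}$ is not strictly positive.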

Our proposed estimator of $B_{i}$ and EBLUP are given by

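$$\hat{B}_{i;MG}=B_{i}(\hat{A}_{i;MG}),\quad\hat{\theta}_{i;MG}^{EB}=\hat{\theta}_{i}^{BLUP}(\hat{A}_{i;MG}),$$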

respectively.

Unlike the common practice, we avoid bias correction in obtaining both Taylor series and parametric bootstrap MSE estimators of our proposed EBLUP. Interestingly, our approach ensures the important dual property of the MSE estimator: second-order unbiasedness and strict positivity. MSE estimators with this dual property are the first of their kind in the small area estimation literature.

We obtain our Taylor series estimator of the MSE of EBLUP by simply plugging in the proposed estimator $\hat{A}_{i;MG}$ for $A$ in the second-order MSE approximation $M_{i;approx}(A)$. It is given by:

$$\hat{M}_{i;MG}\equiv M_{i;approx}(\hat{A}_{i;MG})=g_{1i}(\hat{A}_{i;MG})+g_{2i}(\hat{A}_{i;MG})+g_{3i}(\hat{A}_{i;MG}).$$

Our proposed parametric bootstrap MSE estimator retains the simplicity of bootstrap originally intended in Efron (1979). It is given by

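$$\hat{M}_{i;MG}^{boot}\equiv E_{*}\left[\hat{\theta}_{i}(\hat{A}_{i;MG}^{*},y^{*})-\theta_{i}^{*}\right]^{2},$$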

where ${\theta}_{i}^{*}=x_{i}^{\prime}\hat{\beta}(\hat{A}_{1;MG},\cdots,\hat{A}_{m;MG})+v^{*}_{i}$ with $v^{*}_{i}\sim N(0,\hat{A}_{i;MG})$ . Note that the new bootstrap MSE estimator does not require any bias correction.
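The single parametric bootstrap can be sketched as a plain Monte Carlo loop with no bias-correction step. The code below is a simplified illustration, not the authors' procedure: `estimate_A` is a hypothetical callable supplied by the user, the stand-in `adjusted_reml_grid` maximizes $(A+D_{i})L_{RE}(A)$ on a grid (omitting $h_{+}$), and a single estimate for area $i$ is used to generate all bootstrap areas, whereas the text uses $\hat{\beta}(\hat{A}_{1;MG},\cdots,\hat{A}_{m;MG})$.

```python
import numpy as np

def wls_beta(y, X, v):
    """Weighted least squares beta_hat for diagonal V with entries v."""
    Xw = X / v[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

def eblup_i(y, X, D, A, i):
    """EBLUP of theta_i with A plugged in."""
    v = A + D
    B_i = D[i] / v[i]
    return (1.0 - B_i) * y[i] + B_i * (X[i] @ wls_beta(y, X, v))

def adjusted_reml_grid(y, X, D, i, grid=np.linspace(0.0, 30.0, 101)):
    """Stand-in estimator (assumption): grid maximizer of (A + D_i) L_RE(A)."""
    def loglik(A):
        v = A + D
        resid = y - X @ wls_beta(y, X, v)
        _, logdet = np.linalg.slogdet(X.T @ (X / v[:, None]))
        return -0.5 * logdet - 0.5 * np.sum(np.log(v)) - 0.5 * np.sum(resid**2 / v)
    vals = np.array([loglik(A) for A in grid]) + np.log(grid + D[i])
    return grid[np.argmax(vals)]

def bootstrap_mse(y, X, D, i, estimate_A, n_boot=100, seed=0):
    """Average squared bootstrap prediction error for area i (no bias correction)."""
    rng = np.random.default_rng(seed)
    A_hat = estimate_A(y, X, D, i)
    beta_hat = wls_beta(y, X, A_hat + D)
    sq_err = np.empty(n_boot)
    for b in range(n_boot):
        # Simplification: one A_hat drives all areas in the bootstrap model.
        theta_star = X @ beta_hat + rng.normal(0.0, np.sqrt(A_hat), size=len(y))
        y_star = theta_star + rng.normal(0.0, np.sqrt(D))
        A_star = estimate_A(y_star, X, D, i)    # re-estimate A on the bootstrap sample
        sq_err[b] = (eblup_i(y_star, X, D, A_star, i) - theta_star[i]) ** 2
    return sq_err.mean()

# Toy inputs (illustrative only, not from the paper).
y = np.array([10.0, 14.0, 6.0, 15.0, 9.0, 12.0, 7.0, 11.0])
X = np.column_stack([np.ones(8), np.array([1.0, 2.0, 0.5, 3.0, 1.5, 2.5, 0.8, 1.8])])
D = np.array([4.0, 1.0, 9.0, 0.5, 2.0, 3.0, 6.0, 1.5])

mse_area0 = bootstrap_mse(y, X, D, i=0, estimate_A=adjusted_reml_grid)
```

Since the estimate is an average of squared errors, it is positive by construction; the second-order unbiasedness claimed in the text depends on using the paper's $\hat{A}_{i;MG}$, which this stand-in does not reproduce.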

The following theorem states that our proposed adjusted maximum likelihood estimator of $A$ satisfies all the four desirable properties.

Theorem 1.

Under the regularity conditions $R1-R7$ , we have, for large $m$ ,

(i) $Bias(\hat{B}_{i;MG})=o(m^{-1});\;Var(\hat{B}_{i;MG})=\frac{2D_{i}^{2}}{(A+D_{i})^{4}\mbox{tr}[V^{-2}]}+o(m^{-1});$

(ii) $0<\mbox{inf}_{m\geq 1}\hat{B}_{i;MG}\leq\mbox{sup}_{m\geq 1}\hat{B}_{i;MG}<1$ , for $m>p+2$ ;

(iii) $E(\hat{M}_{i;MG})-M_{i}(\hat{\theta}_{i;MG}^{EB})=o(m^{-1})$ ;

(iv) $E(\hat{M}_{i;MG}^{boot})-M_{i}(\hat{\theta}_{i;MG}^{EB})=o(m^{-1}).$

For proof of Theorem 1, see Appendix B.

3 The balanced case: $D_{i}=D,\;i=1,\cdots,m$

In this section, we show how the proposed adjusted maximum likelihood estimator of $A$ is related to the problem of simultaneous estimation of several independent normal means, a topic of intense research activity, especially in the 1960s, 1970s and 1980s, since the introduction of the celebrated James-Stein estimator (James and Stein 1961).

Let $y_{i}|\theta_{i}\stackrel{{\scriptstyle ind}}{{\sim}}N(\theta_{i},1),\;i=1,\cdots,m$ . James and Stein (1961) showed that for $m\geq 3$ the maximum likelihood (also unbiased) estimator of $\theta_{i}$ is inadmissible under the sum of squared error loss function $L(\hat{\theta},\theta)=\sum_{j=1}^{m}(\hat{\theta}_{j}-\theta_{j})^{2}$ and is dominated by the James-Stein estimator: $\hat{\theta}_{i}^{JS}=(1-\hat{B}_{JS})y_{i}$ , where $\hat{B}_{JS}={(m-2)}/{\sum_{j=1}^{m}y_{j}^{2}}.$ That is,

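$$E\left[\sum_{j=1}^{m}(\hat{\theta}_{j}^{JS}-\theta_{j})^{2}\,\Big|\,\theta\right]\leq E\left[\sum_{j=1}^{m}(y_{j}-\theta_{j})^{2}\,\Big|\,\theta\right],\quad\forall\,\theta\in R^{m},$$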

where $R^{m}$ is the $m$ -dimensional Euclidean space, with strict inequality holding for at least one point $\theta$ . The dominance result, however, does not hold for individual components.
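The dominance inequality can be illustrated by simulation for one fixed $\theta$. The sketch below is illustrative only: the vector `theta`, the number of replications, and the seed are arbitrary choices, and a Monte Carlo comparison at a single point is of course not a proof of dominance over all of $R^{m}$.

```python
import random

def james_stein(y):
    """James-Stein estimate (1 - B_JS) y with B_JS = (m - 2) / sum(y_j^2)."""
    m = len(y)
    b_js = (m - 2) / sum(v * v for v in y)
    return [(1.0 - b_js) * v for v in y]

def total_sq_error(est, theta):
    """Sum of squared errors over all components."""
    return sum((e - t) ** 2 for e, t in zip(est, theta))

def simulate_risks(theta, n_rep=5000, seed=0):
    """Monte Carlo estimates of the total risk of y and of the James-Stein estimator."""
    rng = random.Random(seed)
    risk_mle = risk_js = 0.0
    for _ in range(n_rep):
        y = [rng.gauss(t, 1.0) for t in theta]   # y_j | theta_j ~ N(theta_j, 1)
        risk_mle += total_sq_error(y, theta)
        risk_js += total_sq_error(james_stein(y), theta)
    return risk_mle / n_rep, risk_js / n_rep

# Arbitrary fixed theta with m = 10 components (illustrative only).
theta = [0.5, -1.0, 0.0, 2.0, 1.0, -0.5, 0.3, 1.5, -2.0, 0.8]
risk_mle, risk_js = simulate_risks(theta)
```

The simulated total risk of $y$ should be close to $m$, while the James-Stein total risk is smaller; the comparison is for the sum over components, in line with the remark that dominance does not hold componentwise.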

Efron and Morris (1973) offered an empirical Bayesian justification of the James-Stein estimator under the prior $\theta_{i}\stackrel{{\scriptstyle iid}}{{\sim}}N(0,A),\;i=1,\cdots,m.$ Their model is indeed a special case of the two-level normal hierarchical model with $D_{i}=1,\;x_{i}^{\prime}\beta=0,\;i=1,\cdots,m,$ and thus the James-Stein estimator of $\theta_{i}$ can also be viewed as an EBLUP.

Morris (1983) discussed empirical Bayesian estimation of $\theta_{i}$ for a Bayesian model that is equivalent to the balanced case of the NHM, that is, when $D_{i}=D$, implying $B_{i}=B,\;i=1,\cdots,m.$ In this case, he noted that $\hat{B}_{U}={(m-p-2)D}/{S}$ is an exact unbiased estimator of $B$, using the fact that, under the NHM, $S=\sum_{j=1}^{m}(y_{j}-x_{j}^{\prime}\hat{\beta}_{ols})^{2}\sim(D+A)\chi^{2}_{m-p},$ where $\hat{\beta}_{ols}$ is the ordinary least squares estimator of $\beta$. We can write $\hat{B}_{U}\equiv B(\hat{A}_{Morris})={D}/{(D+\hat{A}_{Morris})}$, where $\hat{A}_{Morris}={S}/{(m-p-2)}-D$. One can alternatively estimate $B$ by a simple plug-in estimator: $\hat{B}_{plug}\equiv B(\hat{A}_{U})={D}/{(D+\hat{A}_{U})}$, where $\hat{A}_{U}={S}/{(m-p)}-D$ is an unbiased estimator of $A$. Note that for $m>p+4$

$$E(\hat{B}_{U}-B)=0,\quad E(\hat{B}_{plug}-B)=\frac{2}{m-p-2}B=O(m^{-1}),$$

$$V(\hat{B}_{U})=\left(\frac{m-p-2}{m-p}\right)^{2}V(\hat{B}_{plug})\leq V(\hat{B}_{plug}).$$

Thus, $\hat{B}_{U}$ is better than $\hat{B}_{plug}$ both in terms of bias and variance properties. We can write $\hat{B}_{U}=\hat{B}_{plug}{(m-p-2)}/{(m-p)}.$ As pointed out by Morris (1983), the factor $(m-p-2)/(m-p)$ helps correct for the curvature dependence of $B$ on $A$ .
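The bias comparison can be checked with a small simulation that draws $S\sim(D+A)\chi^{2}_{m-p}$ directly. The function names, parameter values, and seed below are assumptions chosen for illustration.

```python
import random

def balanced_shrinkage_estimates(S, m, p, D):
    """Return (B_U, B_plug) for the balanced case given the residual sum of squares S."""
    a_u = S / (m - p) - D             # unbiased estimator of A
    b_plug = D / (D + a_u)            # plug-in estimator, equals (m - p) D / S
    b_u = (m - p - 2) * D / S         # exact unbiased estimator of B
    return b_u, b_plug

def simulate_means(A, D, m, p, n_rep=20000, seed=0):
    """Monte Carlo means of B_U and B_plug when S ~ (D + A) * chi-square(m - p)."""
    rng = random.Random(seed)
    k = m - p
    sum_u = sum_plug = 0.0
    for _ in range(n_rep):
        # A chi-square(k) draw is a Gamma(shape=k/2, scale=2) draw.
        S = (D + A) * rng.gammavariate(k / 2.0, 2.0)
        b_u, b_plug = balanced_shrinkage_estimates(S, m, p, D)
        sum_u += b_u
        sum_plug += b_plug
    return sum_u / n_rep, sum_plug / n_rep

# Illustrative values: true B = D / (A + D) = 0.5.
A, D, m, p = 1.0, 1.0, 20, 2
mean_u, mean_plug = simulate_means(A, D, m, p)
```

With these values the formula above gives an expected plug-in estimate of $B(m-p)/(m-p-2)=0.5625$, while the mean of $\hat{B}_{U}$ should sit near the true value 0.5.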

Consider the following empirical Bayes estimator (same as EBLUP) of $\theta_{i}$ :

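$$\hat{\theta}_{i}^{EB}(\hat{A}_{Morris})=(1-\hat{B}_{U})y_{i}+\hat{B}_{U}x_{i}^{\prime}\hat{\beta}_{ols}.$$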

In this case, exact MSE and exact unbiased estimator of MSE can be obtained. Componentwise, for $m\geq p+3$ , we have

$$E[(\hat{\theta}_{i}^{EB}(\hat{A}_{Morris})-\theta_{i})^{2}]\leq D.$$

Thus, $\hat{\theta}_{i}^{EB}(\hat{A}_{Morris})$ dominates $y_{i}$ in terms of unconditional MSE for $m\geq p+3$ . Such a componentwise dominance property, however, does not hold for conditional MSE (conditional on $\theta$ ); see Morris (1983) for details.

Since $B<1$ , using Stein’s argument, Morris (1983) suggested the following estimator of $B$ : $\hat{B}_{Morris}={D}/{(D+\hat{A}_{Morris}^{+})},$ where $\hat{A}_{Morris}^{+}={S}/{(m-p-2)}-D$ if $S>(m-p-2)D$ and $\hat{A}_{Morris}^{+}={2D}/{(m-p-2)}$ otherwise. This improves the estimation of both $B$ and $\theta_{i}$ . It is straightforward to show that in this special case $\hat{A}_{Morris}^{+}$ satisfies all the four properties. Moreover, under the regularity condition R6-R8 and $m>p+2$ , $\hat{A}_{MG}$ , our proposed estimator of $A$ , is unique (see Appendix C for a proof) and is equivalent to $\hat{A}_{Morris}^{+}$ in the higher-order asymptotic sense, that is, $E(\hat{A}_{MG}-\hat{A}_{Morris}^{+})=o(m^{-1})$ .
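The truncated estimator can be written in a few lines. The sketch follows the definition of $\hat{A}_{Morris}^{+}$ exactly as stated above; the function names and the numerical inputs are illustrative assumptions.

```python
def morris_positive_A(S, m, p, D):
    """A_Morris^+ as defined in the text: strictly positive for m > p + 2."""
    if S > (m - p - 2) * D:
        return S / (m - p - 2) - D
    return 2.0 * D / (m - p - 2)

def morris_B(S, m, p, D):
    """Shrinkage estimate D / (D + A_Morris^+), always strictly below 1."""
    return D / (D + morris_positive_A(S, m, p, D))

# Illustrative values: m = 20 areas, p = 2 covariates, D = 1.
m, p, D = 20, 2, 1.0
B_small_S = morris_B(10.0, m, p, D)   # S below the threshold (m - p - 2) D = 16
B_large_S = morris_B(40.0, m, p, D)   # S above the threshold
```

When $S$ falls below the threshold, the estimate of $B$ equals $(m-p-2)/(m-p)$ rather than 1, so the EBLUP never collapses fully onto the regression estimator.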

Let $\hat{\theta}_{i}^{EB}=\hat{\theta}_{i}^{EB}(\hat{A})$ denote an EBLUP of $\theta_{i}$ , where $\hat{A}$ could be $\hat{A}_{MG},\;\hat{A}_{Morris}^{+}$ or the REML $\hat{A}_{RE}=\mbox{max}(0,\hat{A}_{U}).$ We can write $M_{i;approx}(A)=g_{1}(A)+g_{2}(A)+g_{3}(A)$ as the second-order approximation to $M_{i}(\hat{\theta}_{i}^{EB})=MSE(\hat{\theta}_{i}^{EB})$ for any of the three choices of the estimator of $A$ . The traditional second-order unbiased MSE estimator is obtained by correcting bias of $M_{i;approx}(\hat{A}_{RE})$ , up to the order $O(m^{-1})$ . It is given by $\hat{M}_{i,RE}=g_{1}(\hat{A}_{RE})+g_{2}(\hat{A}_{RE})+2g_{3}(\hat{A}_{RE})$ ; see Prasad and Rao (1990), Datta and Lahiri (2000), Das et al. (2004). In this paper, we suggest an alternative second-order unbiased MSE estimator without bias-correction, that is, $\hat{M}_{i;MG}=g_{1}(\hat{A}_{MG})+g_{2}(\hat{A}_{MG})+g_{3}(\hat{A}_{MG})$ .

We can show that

[TABLE]

where

[TABLE]

It is straightforward to check that for $m>p+4$ and $p\geq 3$ , $b_{m}\leq a_{m}$ . Thus, in the higher-order asymptotic sense, $\hat{M}_{i;MG}$ is a better second-order unbiased estimator of $M_{i}(\hat{\theta}_{i}^{EB})$ than $\hat{M}_{i,RE}.$

4 A Connection to the Bayesian Approach

In this section, we suggest a Bayesian method that is close to our proposed EBLUP in a certain higher-order asymptotic sense. To this end, we seek a prior on the hyperparameters $(\beta,A)$ that satisfies all the following properties simultaneously:

(i) $E[B_{i}|Y=y]=\hat{B}_{i,MG}+o_{p}(m^{-1})$ ;

(ii) $V[B_{i}|Y=y]=Var(\hat{B}_{i;MG})+o_{p}(m^{-1})$ ;

(iii) $E[\theta_{i}|Y=y]=\hat{\theta}_{i,MG}+o_{p}(m^{-1})$ ;

(iv) $V[\theta_{i}|Y=y]=\hat{M}_{i,MG}+o_{p}(m^{-1})$ ;

(v) $V[\theta_{i}|Y=y]=\hat{M}_{i;MG}^{boot}+o_{p}(m^{-1})$ .

First assume the following prior for $(\beta,A)$ : $p(\beta,A)\propto\pi(A),\;\beta\in R^{p},\;A>0$ . We begin by finding a prior $\pi(A)$ satisfying property (i). To this end, following Datta et al. (2005), we introduce the following notation:

[TABLE]

where $\hat{A}_{RE}$ is the residual maximum likelihood estimator of $A$ , $l_{RE}$ is the logarithm of residual likelihood, and $V=\mbox{diag}(A+D_{1},\cdots,A+D_{m}).$

We have

[TABLE]

It is interesting to note that (11) is given by (21) in Datta et al. (2005) with $b(A)=B_{i}(A)$ . Hence, we seek $\rho_{1}$ satisfying the following differential equation:

[TABLE]

Equation (12) can be written as follows (up to $O_{p}(m^{-1})$ ):

[TABLE]

A solution to the differential equation (13) is given by

[TABLE]

It is straightforward to check that the prior (14) satisfies the remaining conditions (ii)-(v). Interestingly, this prior is the same as the prior suggested by Datta et al. (2005). For the balanced case, the prior reduces to Stein’s harmonic prior.

5 SAIPE data analysis

For purposes of evaluation, we consider the problem of estimating the percentages of school-age (aged 5-17) children in poverty for the fifty states and the District of Columbia using the same data set considered by Bell (1999). We choose two years (1992 and 1993) of state level data from the U.S. Census Bureau’s Small Area Income and Poverty Estimates (SAIPE) program. In 1992, the REML estimate of $A$ is zero, while in 1993 it is positive. Thus, these two years provide two different scenarios for evaluating estimation methods.

We assume the standard SAIPE state level model in which survey-weighted estimates of the percentages of 5-17-year-old (related) children in poverty follow the Fay-Herriot model (1). The survey-weighted percentages are obtained using the Current Population Survey (CPS) data with their sampling variances $D_{i}$ estimated by a Generalized Variance Function (GVF) method, following Otto and Bell (1995). However, as in any data analysis that uses the Fay-Herriot model, we assume the sampling variances to be known throughout the estimation procedure. We use the same state level auxiliary variables $x$ (a vector of length 5, i.e., $p=5$ ), obtained from the Internal Revenue Service (IRS) data, food stamp data and census residual data that the SAIPE program used for the problem.

Table 1 displays REML and our proposed estimates (HL) of the shrinkage parameters $B_{i}$ for Washington DC (DC), Hawaii (HI) and California (CA) for the year 1992 and DC, Oregon (OR) and CA for the year 1993. In each year, these three areas have the largest, median and smallest sampling variances $D_{i}$ among all the states and DC, respectively. For 1992, the REML estimate of $A$ is zero, yielding a $B_{i}$ estimate of 1 for all the states and DC. This overshrinkage problem reduces the EBLUPs for all the states to regression synthetic estimates. Thus, even for states with reliable direct estimates (e.g., CA), the direct estimates make no contribution to the EBLUP. Our proposed estimates of the shrinkage parameters offer a sensible solution. For DC, our shrinkage estimate is very close to 1 (giving nearly zero weight to the survey-weighted direct estimate in the EBLUP formula), but for California the survey estimate gets considerable weight (about $28\%$ ). In 1993, the REML estimates of the shrinkage factors do not suffer from overshrinkage, but our proposed estimates of $B_{i}$ always give more weight to the survey-weighted direct estimates than the corresponding REML estimates. Both REML and proposed estimates of $B_{i}$ for all the states and DC are displayed in the left panel of Figure 1. Overall, our proposed estimates of $B_{i}$ are more conservative than REML.
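The overshrinkage effect described here can be reproduced on toy numbers. The sketch below assumes the usual Fay-Herriot EBLUP form $(1-B_{i})y_{i}+B_{i}x_{i}^{\prime}\hat{\beta}$ with $B_{i}=D_{i}/(A+D_{i})$ and a weighted least squares $\hat{\beta}$; the function name and the data are hypothetical and are not the SAIPE values.

```python
import numpy as np


def eblup(y, X, D, A):
    """EBLUP for a given value of A (how A is estimated is left to the caller).

    Returns (theta_hat, B, synthetic), where synthetic = X @ beta_hat.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    w = 1.0 / (A + D)                               # diagonal of V^{-1}
    XtWX = X.T @ (w[:, None] * X)
    beta_hat = np.linalg.solve(XtWX, X.T @ (w * y)) # weighted least squares
    B = D / (A + D)                                 # shrinkage factors
    synthetic = X @ beta_hat                        # regression synthetic estimates
    theta_hat = (1.0 - B) * y + B * synthetic
    return theta_hat, B, synthetic


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    y = [10.0, 20.0, 30.0, 25.0]
    X = [[1, 1], [1, 2], [1, 3], [1, 4]]
    D = [4.0, 2.0, 1.0, 3.0]
    for A in (0.0, 5.0):
        theta_hat, B, synthetic = eblup(y, X, D, A)
        print(A, B, theta_hat - synthetic)
```

With `A = 0.0` every shrinkage factor equals 1 and the EBLUP coincides with the synthetic estimate, whatever the quality of the direct estimate; any strictly positive `A` restores some weight on `y`.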

Table 2 displays different MSE estimates of EBLUPs for the selected three states for both years. The right panel of Figure 1 displays different MSE estimates for all the states in both years. For this study, we included the following MSE estimators of EBLUP:

(a) Naive MSE estimator (naive.RE) given by $g_{1i}(\hat{A}_{RE})+g_{2i}(\hat{A}_{RE})$ , where $\hat{A}_{RE}$ denotes the REML estimator of $A$ . This MSE estimator neither incorporates the extra uncertainty due to the estimation of $A$ nor adjusts for the bias of the estimator $g_{1i}(\hat{A}_{RE})$ , and it is not second-order unbiased;

(b) Single parametric bootstrap MSE estimator (PB.RE) that is obtained from (7) when the REML estimator of $A$ is used in the EBLUP formula; it is not second-order unbiased;

(c) Two second-order unbiased MSE estimators based on Taylor-series:

(i) DL.RE: $g_{1i}(\hat{A}_{RE})+g_{2i}(\hat{A}_{RE})+2g_{3i}(\hat{A}_{RE})$ ; see Datta and Lahiri (2000).

(ii) Taylor.HL: the proposed Taylor series MSE estimator given by (6).

(d) Two second-order unbiased single parametric bootstrap MSE estimators:

(i) BL.RE: $2\{g_{1i}(\hat{A}_{RE})+g_{2i}(\hat{A}_{RE})\}-E_{*}[g_{1i}(\hat{A}_{RE}^{*})+g_{2i}(\hat{A}_{RE}^{*})]+E_{*}[\{\hat{\theta}_{i}^{*}(y_{i},\hat{A}_{RE}^{*},\hat{\beta}(\hat{A}_{RE}^{*},y_{i}))-\tilde{\theta}_{i}^{*}(y_{i},\hat{A}_{RE},\hat{\beta}(\hat{A}_{RE},y_{i}))\}^{2}]$ ; see Butar and Lahiri (2003).

(ii) PB.HL: our proposed single parametric bootstrap MSE estimator given by (7).

For this application, there is no appreciable difference between the naive MSE estimates and MSE estimates that attempt to capture additional variability due to the estimation of $A$ . In most cases, the naive MSE estimates are slightly lower than both the first-order and second-order unbiased MSE estimates. The first-order unbiased MSE estimates (PB.RE) are generally slightly smaller than the second-order unbiased MSE estimates. The PB.BL MSE estimates (the Butar and Lahiri bootstrap estimator, listed above as BL.RE) can take negative values because of the adjustment needed to make them second-order unbiased. Except for large states (e.g., CA), MSE estimates for EBLUPs are considerably lower than the corresponding sampling variances $D_{i}$ , indicating possible improvements by EBLUPs over the direct estimates.

For the year 1992, the REML estimate of $A$ is zero. This is probably what causes the unusual behavior of the DL.RE and BL.RE MSE estimates. For example, the DL.RE MSE estimate for a large state like CA is larger than that for a small area like DC (similar behavior can be observed for BL.RE). For CA, the DL.RE MSE estimate is even higher than the corresponding sampling variance of the direct estimate, while all the other MSE estimates show the opposite. Overall, our proposed MSE estimates appear reasonable for both years.

6 Monte Carlo simulation

In this section, we report results from a Monte Carlo simulation study. In particular, we evaluate finite sample performances of two different estimators of $A$ — the commonly used REML $\hat{A}_{RE}$ and the proposed estimator $\hat{A}_{MG}$ — in estimating the shrinkage parameters $B_{i}$ , small area means $\theta_{i}$ and MSE of EBLUPs of $\theta_{i}$ . To understand the effect of small $m$ on different estimation problems, we set $m=15$ and generate $\{(y_{i},\theta_{i}),\;i=1,\cdots,m\}$ using the Fay-Herriot model (1).

We use the 1992 SAIPE data described in the previous section to design our simulation study. The 15 areas correspond to the states with the largest sampling variances $D_{i}$ . In the simulation, we use $x_{i}$ and $D_{i}$ for these states from the 1992 SAIPE data and use $A=15.94$ , which is the median of $D_{i}$ for the 15 states. The weighted least squares estimates of $\beta$ from the real data including all 50 states and DC are treated as the true $\beta$ for the simulation.
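One replicate of this design can be generated as sketched below, assuming the standard two-level form of the Fay-Herriot model ($\theta_{i}=x_{i}^{\prime}\beta+v_{i}$ with $v_{i}\sim N(0,A)$, and $y_{i}=\theta_{i}+e_{i}$ with $e_{i}\sim N(0,D_{i})$). The covariates, sampling variances and coefficients in the usage part are placeholders, not the SAIPE values; only $m=15$ and $A=15.94$ are taken from the text.

```python
import random


def simulate_fay_herriot(X, D, beta, A, rng):
    """Draw one replicate {(y_i, theta_i)} from the two-level normal model.

    X    : list of covariate vectors x_i
    D    : list of known sampling variances D_i
    beta : regression coefficients treated as the truth
    A    : model variance
    rng  : a random.Random instance, so that replicates are reproducible
    """
    ys, thetas = [], []
    for x_i, D_i in zip(X, D):
        mean_i = sum(x * b for x, b in zip(x_i, beta))
        theta_i = mean_i + rng.gauss(0.0, A ** 0.5)   # area-level random effect
        y_i = theta_i + rng.gauss(0.0, D_i ** 0.5)    # sampling error
        thetas.append(theta_i)
        ys.append(y_i)
    return ys, thetas


if __name__ == "__main__":
    m = 15
    X = [[1.0, 0.1 * i] for i in range(m)]   # placeholder covariates
    D = [10.0 + i for i in range(m)]         # placeholder sampling variances
    ys, thetas = simulate_fay_herriot(X, D, [2.0, 1.0], 15.94, random.Random(1))
    print(ys[:3], thetas[:3])
```

Repeating the call with independent random streams gives the Monte Carlo samples over which estimators of $A$, $B_{i}$ and the MSE can be compared.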

We define the relative bias (RB) and relative root mean squared error (RRMSE) of an estimator $\hat{B}_{i}$ of $B_{i}$ as:

[TABLE]

where ${\rm MSE}(\hat{B}_{i})={\rm E}(\hat{B}_{i}-B_{i})^{2}.$ The expectations in the definitions of RB and RRMSE are approximated by Monte Carlo using $1,000$ independent samples from the Fay-Herriot model. The RB and RRMSE of an estimator $\hat{M}_{i}$ of $M_{i}={\rm MSE}(\hat{\theta}_{i})={\rm E}(\hat{\theta}_{i}-\theta_{i})^{2}$ , where $\hat{\theta}_{i}$ is an estimator of $\theta_{i}$ , are defined similarly. For the parametric bootstrap method, we use $1,000$ bootstrap samples.
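The Monte Carlo approximation of these two criteria can be sketched as follows. The code assumes the common definitions, RB as the mean error divided by the true value and RRMSE as the root MSE divided by the true value; any additional scaling used in the paper's display (for instance reporting in percent) is not reproduced here.

```python
import math


def relative_bias(estimates, truth):
    """Monte Carlo relative bias: (mean of estimates - truth) / truth."""
    mean_est = sum(estimates) / len(estimates)
    return (mean_est - truth) / truth


def relative_rmse(estimates, truth):
    """Monte Carlo relative root MSE: sqrt(mean squared error) / truth."""
    mse = sum((e - truth) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / truth


if __name__ == "__main__":
    # Two hypothetical replicate estimates of a shrinkage factor B_i = 0.5.
    print(relative_bias([0.4, 0.6], 0.5), relative_rmse([0.4, 0.6], 0.5))
```

In the study each list would hold the 1,000 replicate values of $\hat{B}_{i}$ (or of an MSE estimator), one per simulated data set.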

Table 3 displays simulated RBs and RRMSEs of two estimators of $B_{i}$ for three selected states: DC, North Dakota (ND) and Oklahoma (OK), corresponding to the maximum, median and minimum values of $D_{i}$ . These three states also correspond to the maximum (0.67), median (0.50) and minimum (0.46) values of the $B_{i}$ ’s among the 15 states. The two estimators of $B_{i}$ are simple plug-in estimators, one obtained from REML $\hat{A}_{RE}$ (denoted by RE) and the other from the proposed estimator $\hat{A}_{MG}$ (denoted by HL). For these states, RE consistently overestimates $B_{i}$ while HL underestimates it. The absolute values of the RB for HL are always smaller than those for RE. Moreover, the variation of the RBs across different $B_{i}$ is much lower for HL than for RE. In terms of RRMSE, HL outperforms RE, especially for small values of $B_{i}$ . Figure 2 displays the RB and RRMSE behavior of RE and HL for all the 15 selected states, demonstrating the superiority of HL over RE.

Figure 3 displays the simulated MSEs of two EBLUPs of $\theta_{i}$ for each of the 15 states, where the two EBLUPs are obtained using the REML $\hat{A}_{RE}$ (RE in the figure) and the proposed estimator $\hat{A}_{MG}$ (HL in the figure). There is hardly any difference between the simulated MSEs of the two EBLUPs, supporting the theory that these two MSEs are identical up to the order $O(m^{-1}).$

Table 4 reports simulated RBs and RRMSEs of different MSE estimators of the EBLUP that uses the REML estimator of $A$ . As mentioned earlier, all MSE estimators except naive.RE and PB.RE are second-order unbiased. The naive estimator naive.RE consistently underestimates. All the other MSE estimators improve on naive.RE. The parametric bootstrap estimator PB.RE that uses REML and does not use bias correction continues to underestimate. The second-order unbiased parametric bootstrap MSE estimator PB.BL that uses bias correction also underestimates, although the amount of underestimation is generally smaller than that of PB.RE. The proposed second-order unbiased MSE estimators, Taylor.HL and PB.HL, are quite competitive with the second-order unbiased Taylor series MSE estimator, DL.RE, which overestimates for the state with the smallest $D_{i}$ . Our single parametric bootstrap second-order unbiased MSE estimator (PB.HL), which does not involve any bias correction, is remarkably better than the single parametric bootstrap MSE estimator PB.RE (without bias correction) and even the second-order unbiased parametric bootstrap MSE estimator PB.BL (with bias correction). All MSE estimators except PB.BL have lower RRMSE than naive.RE. It is interesting to note that the second-order unbiased PB.BL has a larger RRMSE than naive.RE for all the three states. This is probably due to the poor performance of the REML estimator of $A$ that PB.BL uses. The REML estimator of $A$ produces zero estimates $12.4\%$ of the time although the true $A$ is 15.94. The performances of DL.RE, Taylor.HL and PB.HL are similar and all are better than that of PB.RE. The performance of the MSE estimators of the EBLUP that uses the proposed estimator of $A$ is similar to that reported in Table 4; see Table 5. The RB and RRMSE behavior of all the MSE estimators for all the 15 states is given in Figure 4.

7 Concluding Remarks

In this paper, we have solved a set of important problems for the well-known Fay-Herriot small area model through a suitably devised adjusted maximum likelihood estimator of the model variance parameter. We have demonstrated the superiority of our methods over the existing methods analytically and through data analysis and Monte Carlo simulations.

Can we extend our results to a general linear mixed model? To answer this question, let us consider the following nested error regression model (NERM) considered by Battese et al. (1988):

[TABLE]

where $\{v_{1},\ldots,v_{m}\}$ and $\{e_{1},\ldots,e_{m}\}$ are independent with $v_{i}{\sim}N(0,\sigma_{v}^{2})$ and $e_{i}{\sim}N(0,\sigma_{e}^{2})$ ; $x_{ij}$ is a $p$ -dimensional vector of known auxiliary variables; $\beta\in R^{p}$ is a $p$ -dimensional vector of unknown regression coefficients; $\psi=(\sigma_{v}^{2},\sigma_{e}^{2})^{\prime}$ is an unknown variance component vector; and $n_{i}$ is the number of observed units in the $i$ -th area.

To achieve Property 1, we need to solve the following differential equations with shrinkage factor $B_{i}=\sigma_{e}^{2}/(n_{i}\sigma_{v}^{2}+\sigma_{e}^{2})$ , under certain regularity conditions:

[TABLE]

where

[TABLE]

If we use the following adjustment factor for achieving Property 1:

[TABLE]

with some fixed two-dimensional vector ${k}$ , the solution $v$ can be obtained as $v=\frac{H(\psi)}{{k}^{\prime}I_{F}^{-1}\frac{\partial B_{i}(\psi)}{\partial{\psi}}}$ . This solution thus leads to a suitable adjustment factor satisfying

[TABLE]

Thus, there exist multiple adjustment factors satisfying Property 1 under the NERM.

To address such a problem, we will search for the most suitable adjustment factor for the general linear mixed model in the future.
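For reference, the NERM shrinkage factor $B_{i}=\sigma_{e}^{2}/(n_{i}\sigma_{v}^{2}+\sigma_{e}^{2})$ quoted in this section can be evaluated directly. The snippet is a small illustration with made-up variance components; it does not address the choice of adjustment factor discussed above.

```python
def nerm_shrinkage(n_i, sigma_v2, sigma_e2):
    """Shrinkage factor B_i under the nested error regression model."""
    if n_i < 0 or sigma_v2 < 0 or sigma_e2 <= 0:
        raise ValueError("need n_i >= 0, sigma_v2 >= 0 and sigma_e2 > 0")
    return sigma_e2 / (n_i * sigma_v2 + sigma_e2)


if __name__ == "__main__":
    # Hypothetical variance components; B_i falls as the area sample size grows.
    for n_i in (1, 3, 10, 50):
        print(n_i, nerm_shrinkage(n_i, sigma_v2=1.0, sigma_e2=1.0))
```

Unlike the Fay-Herriot case, $B_{i}$ here depends on two unknown variance components, which is why a single scalar adjustment argument no longer pins down a unique solution.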

Appendix A Regularity conditions and Lemma 1

R1: $\mbox{rank}(X)=p$ is bounded for large $m$ ;

R2: The elements of $X$ are uniformly bounded, implying $\sup_{j\geq 1}x_{j}(X^{\prime}X)^{-1}x_{j}=O(m^{-1})$ ;

R3: $0<\inf_{j\geq 1}D_{j}\leq\sup_{j\geq 1}D_{j}<\infty$ , $A\in(0,\infty)$ ;

R4: $\log h_{i}(A)$ is free of $y$ and four times continuously differentiable with respect to $A$ . Moreover, $\frac{\partial^{k}\log h_{i}(A)}{\partial A^{k}}$ is of order $O(1)$ , respectively, for large $m$ with $k=0,1,2,3,4$ ;

R5: $|\hat{A}_{i}|<C_{ad}m^{\lambda}$ , where $C_{ad}$ is a generic positive constant and $\lambda$ is a small positive constant.

In addition to $R4$ , the adjustment factor $h_{+}(A)$ satisfies the following regularity conditions:

R6: $\log h_{+}(A)$ is free of $y$ and four times continuously differentiable with respect to $A$ . Moreover, $\frac{\partial^{k}\log h_{+}(A)}{\partial A^{k}}$ is of order $o(1)$ , for large $m$ with $k=0,1,2,3,4$ ;

R7: $h_{+}(A)$ is strictly positive on $A>0$ , with $h_{+}(A)\Big{|}_{A=0}=0$ and $h_{+}(A)<C$ on $A>0$ for a generic positive constant $C$ ;

R8: In the balanced case, that is, $D_{i}=D$ for all $i$ , $(A+D)^{2}\frac{\partial\log h_{+}(A)}{\partial A}$ is a monotonically decreasing function of $A>0$ with $\lim_{A\rightarrow+0}(A+D)^{2}\frac{\partial\log h_{+}(A)}{\partial A}=\infty$ . If we assume that $\frac{\partial\log h_{+}(A)}{\partial A}>0$ , then $\lim_{A\rightarrow\infty}(A+D)^{2}\frac{\partial\log h_{+}(A)}{\partial A}=C$ for fixed $m$ , where $C$ is a generic positive constant.

The choice of $h_{+}(A)$ is not unique in general. One can use the choice given in Yoshimori and Lahiri (2014).

We first present the following lemma, which provides properties of the estimator $\hat{A}_{i}$ of $A$ . The proof of the lemma is immediate from Theorem 1 of Yoshimori and Lahiri (2014) and Das et al. (2004).

Lemma 1.

Under the regularity conditions $R1-R5$ , we have, for large $m$ ,

(i) $E(\hat{A}_{i}-A)=\frac{\partial\log h_{i}(A)}{\partial A}\frac{2}{{\mbox{tr}[V^{-2}]}}+o(m^{-1});$

(ii) $E(\hat{A}_{i}-A)^{2}=\frac{2}{\mbox{tr}[V^{-2}]}+o(m^{-1});$

(iii) $E[\hat{\theta}_{i}^{EB}(\hat{A}_{i})-\theta_{i}]^{2}\equiv M_{i}[\hat{\theta}_{i}^{EB}(\hat{A}_{i})]=M_{i;approx}({A})+o(m^{-1})$ , where $M_{i;approx}({A})=g_{1i}(A)+g_{2i}(A)+g_{3i}(A)$ with $g_{1i}(A)={AD_{i}}/({A+D_{i}}),$ $g_{2i}(A)={D_{i}^{2}}x_{i}^{\prime}(X^{\prime}V^{-1}X)^{-1}x_{i}/{(A+D_{i})^{2}},$ $g_{3i}(A)={2D_{i}^{2}}/[(A+D_{i})^{3}tr\{V^{-2}\}]$ .
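The three terms in part (iii) are simple to evaluate for a given value of $A$. The sketch below computes $g_{1i}$, $g_{2i}$ and $g_{3i}$ for all areas at once and forms two plug-in combinations, one with $g_{3i}$ counted once and one with it counted twice. Function names are illustrative, and the value of $A$ passed in is the caller's choice; no estimator of $A$ is implemented here.

```python
import numpy as np


def g_terms(A, D, X):
    """Return (g1, g2, g3) as arrays over areas, with V = diag(A + D_i)."""
    D = np.asarray(D, dtype=float)
    X = np.asarray(X, dtype=float)
    v = A + D                                   # diagonal of V
    tr_V_inv2 = np.sum(v ** -2.0)               # tr[V^{-2}]
    XtVinvX = X.T @ (X / v[:, None])            # X' V^{-1} X
    # x_i' (X' V^{-1} X)^{-1} x_i for every area i
    quad = np.einsum("ij,ij->i", X, np.linalg.solve(XtVinvX, X.T).T)
    g1 = A * D / v
    g2 = D ** 2 * quad / v ** 2
    g3 = 2.0 * D ** 2 / (v ** 3 * tr_V_inv2)
    return g1, g2, g3


def mse_plugin(A, D, X):
    """g1 + g2 + g3 evaluated at the supplied A (no bias correction)."""
    g1, g2, g3 = g_terms(A, D, X)
    return g1 + g2 + g3


def mse_double_g3(A, D, X):
    """g1 + g2 + 2*g3 evaluated at the supplied A (bias-corrected form)."""
    g1, g2, g3 = g_terms(A, D, X)
    return g1 + g2 + 2.0 * g3


if __name__ == "__main__":
    # Balanced, intercept-only toy case with m = 10, D = 2, A = 3.
    X = np.ones((10, 1))
    D = np.full(10, 2.0)
    print(mse_plugin(3.0, D, X)[0], mse_double_g3(3.0, D, X)[0])
```

In the balanced intercept-only case the terms reduce to $AD/(A+D)$, $D^{2}/\{m(A+D)\}$ and $2D^{2}/\{m(A+D)\}$, which gives a quick hand check of the implementation.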

Appendix B Proofs of Theorem 1

B.1 Proof of part (i)

First note that the adjustment factor $h_{i}(A)$ satisfies regularity condition R4. Then part (i) follows from the construction and (2).

B.2 Proof of part (ii)

It suffices to show the strict positivity of $\hat{A}_{i;MG}$ . Note that $h_{+}(A)h_{i0}(A)L_{RE}(A)\Big{|}_{A=0}=0$ and $h_{+}(A)h_{i0}(A)L_{RE}(A)\geq 0$ for $A\geq 0$ using R6-R7. Thus, we are left to show that

[TABLE]

Let $C$ be a generic constant. Using regularity conditions and $m\geq 1$ , we have

[TABLE]

which imply

[TABLE]

for large $A$ . Thus, $\hat{A}_{i;MG}$ is strictly positive if $m>p+2$ .

B.3 Proof of part (iii)

Using part (iii) of Lemma 1, we get

[TABLE]

Note that using part (i) of Lemma 1 we have: $E[g_{2i}(\hat{A}_{i;MG})]=g_{2i}(A)+o(m^{-1}),\;E[g_{3i}(\hat{A}_{i;MG})]=g_{3i}(A)+o(m^{-1})$ . Since $g_{1i}(A)=(1-B_{i})D_{i}$ , we have $E[g_{1i}(\hat{A}_{i;MG})]=g_{1i}(A)+o(m^{-1})$ , using part (i). This proves part (iii).

B.4 Proof of part (iv)

Using part (iii), we have

[TABLE]

where $E[|R|]=o(m^{-1})$ . The result now follows from part (iii).

Appendix C Proof of the uniqueness of $\hat{A}_{MG}$ in balanced case

In the balanced case, we have

[TABLE]

Thus, $(A+D)^{2}\frac{\partial\log L(A)}{\partial A}$ is a linear function of $A$ . Therefore, our estimate of $A$ is obtained as a solution of:

[TABLE]

Define $K(A)$ as the left-hand side of (18). For $A>0$ , using the regularity conditions R6-R8 and $m>p+2$ , we can show that $\lim_{A\rightarrow+0}K(A)=\infty$ , $\lim_{A\rightarrow\infty}K(A)=-\infty$ and $K(A)$ is a strictly monotonically decreasing function of $A$ on $A>0$ . Hence, there exist $A_{+}$ and $A_{-}$ such that $K(A_{+})=-\varepsilon$ and $K(A_{-})=\varepsilon$ with small $\varepsilon>0$ and $0<A_{-}<A_{+}<\infty$ . Thus, using the intermediate value theorem for existence and the strict monotonicity of $K(A)$ for uniqueness, we conclude that the adjustment term $h_{+}(A)$ leads to a unique estimate of $A$ on $A>0$ .
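The argument also suggests a simple numerical recipe: a strictly decreasing function that runs from $+\infty$ to $-\infty$ on $A>0$ can be bracketed and bisected. The sketch below illustrates this with a stand-in function, `K_illustrative`, that has the three stated properties. It is an assumption made for the example and should not be read as equation (18); in particular the arctangent-based term is only one possible choice consistent with R8.

```python
import math


def K_illustrative(A, S, m, p, D):
    """Hypothetical stand-in for K(A): a decreasing adjustment term plus a
    term that is linear and decreasing in A. Not the paper's equation (18)."""
    u = m * A / (A + D)
    adjustment = D / (math.atan(u) * (1.0 + u * u))   # -> +inf as A -> 0+
    return adjustment + 0.5 * (S - (m - p - 2) * (A + D))


def unique_positive_root(K, tol=1e-10):
    """Bisection for a strictly decreasing K with K(0+) = +inf, K(inf) = -inf."""
    lo = 1.0
    while K(lo) <= 0.0:        # shrink toward 0 until K is positive
        lo /= 2.0
    hi = 2.0 * lo
    while K(hi) >= 0.0:        # expand until K is negative
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if K(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for S in (40.0, 1.0):      # a large and a small value of S
        K = lambda A, S=S: K_illustrative(A, S, m=15, p=5, D=2.0)
        print(S, unique_positive_root(K))
```

Even when `S` is small, the root stays strictly positive because the stand-in adjustment term diverges at the origin, mirroring the role the limit at $A\rightarrow+0$ plays in the proof.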

Acknowledgement

The first author’s research was supported by Grant-in-Aid for Research Activity start-up, JSPS Grant Number 26880011. The second author’s research was supported in part by the National Science Foundation Grant Number SES-1534413.

Bibliography


Bell, W. R., Basel, W., Cruse, C., Dalzell, L., Maples, J., O’Hara, B., and Powers, D. (2007), “Use of ACS Data to Produce SAIPE Model-Based Estimates of Poverty for Counties,” Census Report.

Butar, B. F. (1997), “Empirical Bayes methods in survey sampling,” Ph.D. Thesis, Department of Mathematics and Statistics, University of Nebraska-Lincoln, unpublished.

Butar, B. F., and Lahiri, P. (2003), “On measures of uncertainty of empirical Bayes small area estimators,” Special issue II: Model Selection, Model Diagnostics, Empirical Bayes and Hierarchical Bayes, Journal of Statistical Planning and Inference, 112, 63-76.

Casas-Cordero, C., Encina, J., and Lahiri, P. (2015), “Poverty Mapping for the Chilean Comunas,” in Analysis of Poverty Data by Small Area Estimation, ed. Monica Pratesi, Wiley Series in Survey Methodology.
