Weighted empirical likelihood for quantile regression with nonignorable   missing covariates

Xiaohui Yuan; Xiaogang Dong

arXiv:1703.01866·stat.ME·October 10, 2017

TL;DR

This paper introduces a weighted empirical likelihood estimator for quantile regression with nonignorable missing covariates and shows that it is more efficient than the complete-case-analysis estimator.

Contribution

It proposes a computationally simple estimator that attains semiparametric efficiency when the model for the probability of missingness given the fully observed variables is correctly specified.

Findings

1. The estimator is computationally simple.
2. It achieves semiparametric efficiency.
3. Simulation and real data demonstrate improved performance.

Abstract

In this paper, we propose an empirical likelihood-based weighted estimator of regression parameter in quantile regression model with nonignorable missing covariates. The proposed estimator is computationally simple and achieves semiparametric efficiency if the probability of missingness on the fully observed variables is correctly specified. The efficiency gain of the proposed estimator over the complete-case-analysis estimator is quantified theoretically and illustrated via simulation and a real data application.


Table 1. Empirical bias (RMSE in parentheses) of the estimators of $\beta_{0}$, $\beta_{1}$ and $\beta_{2}$.

| $\tau$ | $n$ | Estimator | $\beta_{0}$ | $\beta_{1}$ | $\beta_{2}$ |
| --- | --- | --- | --- | --- | --- |
| 0.3 | 100 | $\hat{\beta}_{ideal}$ | -0.0022 (0.1185) | 0.0084 (0.1095) | -0.0062 (0.1272) |
| | | $\hat{\beta}_{C}$ | 0.0072 (0.2403) | 0.0032 (0.1851) | -0.0030 (0.1853) |
| | | $\hat{\beta}_{IPWMAR}$ | -0.1714 (0.3065) | 0.0130 (0.2128) | -0.0042 (0.2077) |
| | | $\hat{\beta}_{ELWMAR}$ | -0.1808 (0.3021) | 0.0231 (0.2144) | -0.0120 (0.1945) |
| | | $\hat{\beta}_{ELW}$ | -0.0004 (0.2446) | 0.0107 (0.1875) | -0.0098 (0.1752) |
| | 300 | $\hat{\beta}_{ideal}$ | 0.0016 (0.0685) | 0.0001 (0.0617) | -0.0014 (0.0676) |
| | | $\hat{\beta}_{C}$ | 0.0008 (0.1332) | 0.0032 (0.1031) | -0.0011 (0.1016) |
| | | $\hat{\beta}_{IPWMAR}$ | -0.1669 (0.2188) | 0.0133 (0.1167) | -0.0068 (0.1150) |
| | | $\hat{\beta}_{ELWMAR}$ | -0.1672 (0.2079) | 0.0155 (0.1116) | -0.0097 (0.0968) |
| | | $\hat{\beta}_{ELW}$ | -0.0007 (0.1252) | 0.0053 (0.1002) | -0.0032 (0.0873) |
| 0.5 | 100 | $\hat{\beta}_{ideal}$ | -0.0021 (0.1128) | 0.0018 (0.0997) | 0.0038 (0.1187) |
| | | $\hat{\beta}_{C}$ | 0.0016 (0.2347) | 0.0014 (0.1781) | 0.0073 (0.1765) |
| | | $\hat{\beta}_{IPWMAR}$ | -0.1636 (0.2761) | 0.0148 (0.1851) | -0.0020 (0.1850) |
| | | $\hat{\beta}_{ELWMAR}$ | -0.1693 (0.2794) | 0.0209 (0.1814) | -0.0076 (0.1649) |
| | | $\hat{\beta}_{ELW}$ | -0.0023 (0.2326) | 0.0042 (0.1685) | 0.0007 (0.1617) |
| | 300 | $\hat{\beta}_{ideal}$ | -0.0015 (0.0648) | 0.0002 (0.0608) | 0.0011 (0.0679) |
| | | $\hat{\beta}_{C}$ | -0.0001 (0.1274) | 0.0008 (0.0979) | 0.0017 (0.0973) |
| | | $\hat{\beta}_{IPWMAR}$ | -0.1646 (0.2040) | 0.0136 (0.1036) | -0.0073 (0.1003) |
| | | $\hat{\beta}_{ELWMAR}$ | -0.1650 (0.2007) | 0.0158 (0.1002) | -0.0087 (0.0891) |
| | | $\hat{\beta}_{ELW}$ | -0.0017 (0.1238) | 0.0032 (0.0954) | -0.0001 (0.0889) |
| 0.7 | 100 | $\hat{\beta}_{ideal}$ | -0.0090 (0.1216) | 0.0025 (0.1147) | 0.0021 (0.1212) |
| | | $\hat{\beta}_{C}$ | -0.0232 (0.2471) | 0.0118 (0.1864) | -0.0042 (0.1791) |
| | | $\hat{\beta}_{IPWMAR}$ | -0.1750 (0.2839) | 0.0253 (0.1845) | -0.0089 (0.1790) |
| | | $\hat{\beta}_{ELWMAR}$ | -0.1772 (0.2901) | 0.0263 (0.1868) | -0.0105 (0.1694) |
| | | $\hat{\beta}_{ELW}$ | -0.0203 (0.2498) | 0.0076 (0.1870) | 0.0002 (0.1795) |
| | 300 | $\hat{\beta}_{ideal}$ | -0.0014 (0.0706) | 0.0019 (0.0630) | 0.0001 (0.0708) |
| | | $\hat{\beta}_{C}$ | 0.0018 (0.1371) | -0.0032 (0.1008) | 0.0019 (0.1028) |
| | | $\hat{\beta}_{IPWMAR}$ | -0.1576 (0.2007) | 0.0136 (0.1046) | -0.0062 (0.1033) |
| | | $\hat{\beta}_{ELWMAR}$ | -0.1557 (0.1983) | 0.0130 (0.1006) | -0.0059 (0.0935) |
| | | $\hat{\beta}_{ELW}$ | 0.0025 (0.1325) | -0.0003 (0.0969) | 0.0009 (0.0964) |

Equations

$$Q_{\tau}(Y\mid X,Z,\beta^{*})=\beta_{0}^{*}+X^{T}\beta_{1}^{*}+Z^{T}\beta_{2}^{*}=W^{T}\beta^{*},$$

$$(Y_{i},X_{i}^{T},Z_{i}^{T},\delta_{i}),\quad i=1,\dots,n,$$

$$\hat{\beta}_{C}=\arg\min_{\beta\in\Theta}\frac{1}{n}\sum_{i=1}^{n}\delta_{i}\rho_{\tau}(Y_{i}-W_{i}^{T}\beta),$$

$$Y\perp\delta\mid X,Z.$$

$$P(\delta=1\mid Y,Z)=\pi(Y,Z,\gamma^{*}),$$

$$L_{B}(\gamma)=\sum_{i=1}^{n}\left[\delta_{i}\log\{\pi(Y_{i},Z_{i},\gamma)\}+(1-\delta_{i})\log\{1-\pi(Y_{i},Z_{i},\gamma)\}\right].$$

$$U_{B}(\delta_{i},Z_{i},Y_{i},\gamma)=\frac{\delta_{i}-\pi(Y_{i},Z_{i},\gamma)}{\pi(Y_{i},Z_{i},\gamma)\{1-\pi(Y_{i},Z_{i},\gamma)\}}\frac{\partial\pi(Y_{i},Z_{i},\gamma)}{\partial\gamma},$$

$$g_{1}(\delta_{i},X_{i},Z_{i},Y_{i},\theta)=[\delta_{i}-\pi(Y_{i},Z_{i},\gamma)]\,m(Y_{i},Z_{i},\beta,\alpha),$$

$$g(\delta_{i},X_{i},Z_{i},Y_{i},\theta)=\begin{pmatrix}g_{1}(\delta_{i},X_{i},Z_{i},Y_{i},\theta)\\ U_{B}(\delta_{i},Z_{i},Y_{i},\gamma)\end{pmatrix}.$$

$$p_{i}\geq 0,\quad\sum_{i=1}^{n}p_{i}=1,\quad\sum_{i=1}^{n}p_{i}\,g(\delta_{i},X_{i},Z_{i},Y_{i},\hat{\theta})=0.$$

$$\hat{p}_{i}=\frac{1}{n}\,\frac{1}{1+\hat{\lambda}^{T}g(\delta_{i},X_{i},Z_{i},Y_{i},\hat{\theta})},$$

$$\frac{1}{n}\sum_{i=1}^{n}\frac{g(\delta_{i},X_{i},Z_{i},Y_{i},\hat{\theta})}{1+\hat{\lambda}^{T}g(\delta_{i},X_{i},Z_{i},Y_{i},\hat{\theta})}=0.$$

$$\hat{\beta}_{ELW}=\arg\min_{\beta\in\Theta}\sum_{i=1}^{n}\hat{p}_{i}\delta_{i}\rho_{\tau}(Y_{i}-W_{i}^{T}\beta).$$

$$\lambda(\theta)=\arg\max_{\lambda}\sum_{i=1}^{n}\log\{1+\lambda^{T}g(\delta_{i},X_{i},Z_{i},Y_{i},\theta)\}.$$

$F_{\beta}$, $S_{\phi}$, $D_{1}$, $D_{2}$, $D_{3}$, $D_{4}$, $S_{B}$, $\Sigma_{ELW}$

$$(X,Z,Y)^{T}\mid\delta\sim N\left((\delta,0,\eta\delta)^{T},\Psi\right),$$

$$P(\delta=1\mid Z,Y)=\frac{\exp(\gamma_{0}+\gamma_{1}Z+\gamma_{2}Y)}{1+\exp(\gamma_{0}+\gamma_{1}Z+\gamma_{2}Y)}$$

$$Q_{\tau}(Y\mid X,Z)=\beta_{0}+\beta_{1}X+\beta_{2}Z,$$

$$Q_{\tau}(Y_{i}\mid X_{i},Z_{i},\beta)=\beta_{0}+X_{i}\beta_{1}+Z_{i}^{T}\beta_{2},\quad i=1,\dots,n,$$

$$\hat{\lambda}=\lambda(\hat{\theta})=n^{-1}S_{g}^{-1}\left[U_{g}(\theta^{*})+G_{\gamma}S_{B}^{-1}U(\gamma^{*})\right]+o_{p}(n^{-1/2}),$$

$\hat{\lambda}$

$n^{-1}U_{g}(\hat{\theta})$

$$\frac{1}{n}\sum_{i=1}^{n}g(\delta_{i},X_{i},Z_{i},Y_{i},\hat{\theta})g^{T}(\delta_{i},X_{i},Z_{i},Y_{i},\hat{\theta})\stackrel{p}{\longrightarrow}\begin{pmatrix}D_{1}&D_{2}\\ D_{2}^{T}&S_{B}\end{pmatrix}=S_{g},$$

$$n^{-1}\frac{\partial U_{g}(\tilde{\theta})}{\partial\gamma^{T}}\stackrel{p}{\longrightarrow}E\left\{\frac{\partial g(\delta_{i},X_{i},Z_{i},Y_{i},\theta^{*})}{\partial\gamma^{T}}\right\}=\begin{pmatrix}-D_{2}\\ -S_{B}\end{pmatrix}=G_{\gamma},$$

$$n^{-1}\frac{\partial U_{g}(\tilde{\theta})}{\partial\alpha^{T}}\stackrel{p}{\longrightarrow}0,\quad n^{-1}\frac{\partial U_{g}(\tilde{\theta})}{\partial\beta^{T}}\stackrel{p}{\longrightarrow}0.$$

$$\hat{\gamma}-\gamma^{*}=n^{-1}S_{B}^{-1}U(\gamma^{*})+o_{p}(n^{-1/2}),$$

$\hat{\lambda}$

$$A_{i}(\eta)=\rho_{\tau}(\varepsilon_{i}-W_{i}^{T}\eta/\sqrt{n})-\rho_{\tau}(\varepsilon_{i}),$$


Taxonomy

Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Statistical Distribution Estimation and Applications

Full text

Xiaohui Yuan, Xiaogang Dong

School of Basic Science, Changchun University of Technology, Changchun 130012, China


Keywords: Complete-case-analysis estimator, Empirical likelihood, Nonignorable missing covariates, Quantile regression

1 Introduction

Quantile regression, introduced by Koenker and Bassett (1978), is robust against outliers and can describe the entire conditional distribution of the response variable given the covariates. Because of these advantages, quantile regression has become widely used in econometrics, statistics, and biostatistics. The book by Koenker (2005) gives a comprehensive overview and discussion of quantile regression.

Let $Y$ denote the outcome variable, $Z$ a vector of covariates that is always observed, and $X$ a vector of covariates that may be missing for some subjects. The quantile regression model assumes that the $\tau$-th conditional quantile of $Y$ given $X$ and $Z$ is

$$Q_{\tau}(Y\mid X,Z,\beta^{*})=\beta_{0}^{*}+X^{T}\beta_{1}^{*}+Z^{T}\beta_{2}^{*}=W^{T}\beta^{*},\tag{1}$$

where $W=(1,X^{T},Z^{T})^{T}$ and $\beta^{*}=(\beta_{0}^{*},\beta_{1}^{*T},\beta_{2}^{*T})^{T}$ is an interior point of the parameter space $\Theta$, a compact subset of $R^{p}$. We are interested in inference about $\beta^{*}$ based on a random sample of incomplete data

$$(Y_{i},X_{i}^{T},Z_{i}^{T},\delta_{i}),\quad i=1,\dots,n,$$

where all the $Z_{i}$ ’s and $Y_{i}$ ’s are observed, and $\delta_{i}=0$ if $X_{i}$ is missing, otherwise $\delta_{i}=1$ .

The most commonly used method for handling missing covariate data is complete-case analysis (CCA), in which only the complete cases are used in a regression-based or likelihood-based analysis. The CCA estimator of $\beta^{*}$ is given by

$$\hat{\beta}_{C}=\arg\min_{\beta\in\Theta}\frac{1}{n}\sum_{i=1}^{n}\delta_{i}\rho_{\tau}(Y_{i}-W_{i}^{T}\beta),\tag{2}$$

where $\rho_{\tau}(u)=u\{\tau-I(u<0)\}$ is the quantile loss function and $I(\cdot)$ is the indicator function.
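To make the loss concrete, here is a minimal sketch that evaluates $\rho_{\tau}$ and minimizes the complete-case objective in the intercept-only special case $W_{i}=1$. The function names `check_loss` and `cc_quantile` are illustrative and do not come from the paper. The search is restricted to the observed responses, because the objective is piecewise linear and convex and therefore attains its minimum at one of them.

```python
def check_loss(u, tau):
    """Quantile loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def cc_quantile(y, delta, tau):
    """Intercept-only complete-case fit: minimize sum_i delta_i * rho_tau(y_i - b)."""
    complete = [yi for yi, di in zip(y, delta) if di == 1]

    def objective(b):
        return sum(check_loss(yi - b, tau) for yi in complete)

    # A minimizer of this convex piecewise-linear function lies at a data point.
    return min(complete, key=objective)


y = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6]
delta = [1, 1, 1, 0, 1, 1]      # the fourth case is incomplete and is dropped
print(cc_quantile(y, delta, 0.5))
```

With $\tau=0.5$ the result is the median of the five complete cases. The general case with covariates requires a linear-programming or similar solver.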

In the statistical literature, missing data mechanisms are classified into three categories (Little and Rubin, 2002). The first is missing completely at random (MCAR), i.e., the missingness mechanism is independent of any observable or unobservable quantities. The second is missing at random (MAR), i.e., the missingness mechanism depends only on the observed variables. The third is not missing at random (NMAR), or nonignorable, i.e., the missingness mechanism depends on the values that are missing.

When the $X_{i}$’s are not MCAR, the CCA estimator can be biased. Consistent and efficient estimators have been proposed in the statistical literature for the quantile regression model when the covariate data are MAR. For example, Wei et al. (2012) developed an iterative imputation procedure for estimating the conditional quantile in the presence of missing covariates. Sherwood et al. (2013) proposed an inverse probability weighted (IPW) approach to correct for the bias from longitudinal dropouts. Chen et al. (2015) examined estimation in a quantile regression model and developed three nonparametric methods for observations missing at random under independent and nonidentically distributed errors. Liu and Yuan (2016) proposed a weighted quantile regression model with weights chosen by empirical likelihood. This approach efficiently incorporates the incomplete data into the analysis by combining the unbiased estimating equations of the complete data with those of the incomplete data. However, these methods are not easily extended to NMAR missing data mechanisms, because they are biased under the NMAR assumption.

NMAR is the most difficult problem in the missing data literature. Following Little and Zhang (2011) and Bartlett et al. (2014), we make the following “not missing at random” (NMAR) assumption:

$$Y\perp\delta\mid X,Z.\tag{3}$$

The NMAR assumption (3) allows missingness in a covariate to depend on the value of that covariate, but requires it to be conditionally independent of the outcome. The CCA estimator is valid but inefficient under assumption (3), because it fails to draw on the observed information contained in the incomplete cases.

In the context of the mean regression model, Bartlett et al. (2014) proposed an augmented CCA estimator that improves upon the efficiency of the CCA estimator by specifying an additional model for the probability of missingness given the fully observed variables, i.e. $P(\delta=1|Y,Z)$ . The estimating function used in Bartlett et al. (2014) draws on the information available from both complete and incomplete cases and thus improves upon the efficiency of the CCA estimator. Note that under the NMAR assumption (3), $P(\delta=1|Y,X,Z)=P(\delta=1|X,Z)$ , for which feasible estimators are not available, since $X$ is missing for some subjects. Under assumption (3), however, there is no need to estimate $P(\delta=1|X,Z)$ . Recently, Xie and Zhang (2017) proposed an empirical likelihood approach for estimating the regression parameters in the mean regression model with missing covariates under the NMAR assumption (3). They showed that the empirical likelihood estimator can improve estimation efficiency if $P(\delta=1|Y,Z)$ is correctly specified.

In this paper, we put forward an empirical likelihood-based weighted (ELW) estimator for the quantile regression model with nonignorable missing covariates under the NMAR assumption (3). To fully utilize the information contained in the incomplete data, we incorporate the unbiased estimating equations of the incomplete observations into the empirical likelihood and use the resulting empirical likelihood-based weights to adjust the CCA estimator defined in (2). The proposed ELW estimator is as computationally simple as the CCA estimator and achieves semiparametric efficiency if $P(\delta=1|Y,Z)$ is correctly specified.

Empirical likelihood is an effective approach to improving efficiency. For a comprehensive review of the empirical likelihood method, one can refer to Qin and Lawless (1994), Owen (2001), Lopez et al. (2009) among others. For applications of empirical likelihood in missing-data problems, one can refer to Wang and Rao (2002), Qin et al. (2009), Liu and Yuan (2012), Liu et al. (2013), Zhong and Qin (2017) among others.

The rest of this paper is organized as follows. In Section 2, we introduce the empirical likelihood-based weighted estimator for the quantile regression model and establish its asymptotic properties, including its efficiency gain over the CCA estimator. Numerical studies are reported in Sections 3 and 4. Proofs of the main theorems are given in the Appendix.

2 The empirical likelihood-based weighted estimation

In this section, we propose the ELW estimator of $\beta^{*}$ under assumption (3). Under this assumption, we only need to estimate the probability of $X$ being observed given $Y$ and $Z$ , i.e. $P(\delta=1|Y,Z)$ . Following Bartlett et al. (2014) and Xie and Zhang (2017), we assume that $P(\delta=1|Y,Z)$ is described by the probability model:

$$P(\delta=1\mid Y,Z)=\pi(Y,Z,\gamma^{*}),$$

where $\gamma^{*}$ is a $q\times 1$ vector of unknown parameters. It is natural to estimate $\gamma^{*}$ by the binomial likelihood estimator $\hat{\gamma}$ , which maximizes the binomial log-likelihood

$$L_{B}(\gamma)=\sum_{i=1}^{n}\left[\delta_{i}\log\{\pi(Y_{i},Z_{i},\gamma)\}+(1-\delta_{i})\log\{1-\pi(Y_{i},Z_{i},\gamma)\}\right].$$
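As an illustration, the following sketch maximizes the binomial log-likelihood by Newton-Raphson under the assumption that $\pi$ has the logistic form $\pi(Y,Z,\gamma)=\{1+\exp(-\gamma_{0}-\gamma_{1}Y-Z^{T}\gamma_{2})\}^{-1}$, which is the form used later in the simulation and data analysis. The function name `fit_missingness_model` and the convergence settings are hypothetical choices, not taken from the paper. For the logistic form the score reduces to $\sum_{i}(\delta_{i}-\pi_{i})(1,Y_{i},Z_{i}^{T})^{T}$.

```python
import numpy as np


def fit_missingness_model(delta, y, z, n_iter=50, tol=1e-10):
    """Maximize L_B(gamma) for the logistic model pi = expit(gamma' (1, Y, Z))."""
    delta = np.asarray(delta, dtype=float)
    v = np.column_stack([np.ones(len(delta)), y, z])    # rows (1, Y_i, Z_i)
    gamma = np.zeros(v.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-v @ gamma))
        score = v.T @ (delta - pi)                       # gradient of L_B
        info = (v * (pi * (1.0 - pi))[:, None]).T @ v    # minus the Hessian of L_B
        step = np.linalg.solve(info, score)              # Newton-Raphson step
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    pi = 1.0 / (1.0 + np.exp(-v @ gamma))
    return gamma, pi
```

The returned fitted probabilities are the values $\pi(Y_{i},Z_{i},\hat{\gamma})$ that enter the estimating functions below. Any standard logistic regression routine could be substituted.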

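Let $m(Y_{i},Z_{i},\beta,\alpha)$ be a working model of $E\{\delta_{i}\phi(X_{i},Z_{i},Y_{i},\beta)|Z_{i},Y_{i}\}$ with $\phi(X_{i},Z_{i},Y_{i},\beta)=W_{i}\{I(Y_{i}-W_{i}^{T}\beta<0)-\tau\}$ . In the following, we propose the ELW estimator of $\beta^{*}$ . Define

$$U_{B}(\delta_{i},Z_{i},Y_{i},\gamma)=\frac{\delta_{i}-\pi(Y_{i},Z_{i},\gamma)}{\pi(Y_{i},Z_{i},\gamma)\{1-\pi(Y_{i},Z_{i},\gamma)\}}\frac{\partial\pi(Y_{i},Z_{i},\gamma)}{\partial\gamma},$$

$$g_{1}(\delta_{i},X_{i},Z_{i},Y_{i},\theta)=[\delta_{i}-\pi(Y_{i},Z_{i},\gamma)]\,m(Y_{i},Z_{i},\beta,\alpha),$$

$$g(\delta_{i},X_{i},Z_{i},Y_{i},\theta)=\begin{pmatrix}g_{1}(\delta_{i},X_{i},Z_{i},Y_{i},\theta)\\ U_{B}(\delta_{i},Z_{i},Y_{i},\gamma)\end{pmatrix}.$$

Let $p_{i}$ represent the probability weight allocated to $g(\delta_{i},X_{i},Z_{i},Y_{i},\hat{\theta})$ , where $\hat{\theta}=(\hat{\alpha}^{T},\hat{\beta}_{C}^{T},\hat{\gamma}^{T})^{T}$ and $\hat{\alpha}$ is a consistent estimator for some $\alpha^{*}$ . If $\pi(y,z,\gamma)$ is correctly specified, one can show that $E\{g(\delta_{i},X_{i},Z_{i},Y_{i},\theta^{*})\}=0$ , where $\theta^{*}=(\alpha^{*T},\beta^{*T},\gamma^{*T})^{T}$ . Then, we maximize the empirical likelihood function $\prod_{i=1}^{n}p_{i}$ subject to the constraints:

$$p_{i}\geq 0,\quad\sum_{i=1}^{n}p_{i}=1,\quad\sum_{i=1}^{n}p_{i}\,g(\delta_{i},X_{i},Z_{i},Y_{i},\hat{\theta})=0.$$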

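By the Lagrange multiplier method, we get

$$\hat{p}_{i}=\frac{1}{n}\,\frac{1}{1+\hat{\lambda}^{T}g(\delta_{i},X_{i},Z_{i},Y_{i},\hat{\theta})},$$

where $\hat{\lambda}$ is the Lagrange multiplier that satisfies

$$\frac{1}{n}\sum_{i=1}^{n}\frac{g(\delta_{i},X_{i},Z_{i},Y_{i},\hat{\theta})}{1+\hat{\lambda}^{T}g(\delta_{i},X_{i},Z_{i},Y_{i},\hat{\theta})}=0.$$

The ELW estimator of $\beta^{*}$ is given by

$$\hat{\beta}_{ELW}=\arg\min_{\beta\in\Theta}\sum_{i=1}^{n}\hat{p}_{i}\delta_{i}\rho_{\tau}(Y_{i}-W_{i}^{T}\beta).\tag{6}$$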
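The weighted minimization that defines the ELW estimator can be carried out with any weighted quantile regression solver. The sketch below uses the standard linear-programming reformulation of the check loss, which is a swapped-in technique and not something the paper specifies. It assumes `scipy.optimize.linprog` with the HiGHS backend is available, that the rows passed in are the complete cases ( $\delta_{i}=1$ ), and that the weights are nonnegative. The name `weighted_quantreg` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def weighted_quantreg(w, y, tau, weights):
    """Minimize sum_i weights_i * rho_tau(y_i - w_i' beta) over beta via an LP."""
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n, p = w.shape
    # Variables: beta (free), u_plus >= 0, u_minus >= 0,
    # linked by y = w @ beta + u_plus - u_minus, so that
    # rho_tau(residual) = tau * u_plus + (1 - tau) * u_minus at the optimum.
    cost = np.concatenate([np.zeros(p), tau * weights, (1.0 - tau) * weights])
    a_eq = np.hstack([w, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(cost, A_eq=a_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]
```

Passing the empirical likelihood weights $\hat{p}_{i}$ of the complete cases as `weights` gives a sketch of the ELW fit, while equal weights reproduce the complete-case objective.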

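Define

$$\lambda(\theta)=\arg\max_{\lambda}\sum_{i=1}^{n}\log\{1+\lambda^{T}g(\delta_{i},X_{i},Z_{i},Y_{i},\theta)\}.\tag{7}$$

The first-order condition of (7) at $\theta=\hat{\theta}$ coincides with the equation satisfied by $\hat{\lambda}$ , so it is easily seen that $\hat{\lambda}=\lambda(\hat{\theta})$ . For fixed $\theta=\hat{\theta}$ , solving (7) is a well-behaved optimization problem: the objective function is globally concave, so (7) can be solved by a simple Newton-Raphson procedure.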
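A minimal sketch of this Newton-Raphson step is given below. It takes an $n\times r$ array whose rows are the values $g(\delta_{i},X_{i},Z_{i},Y_{i},\hat{\theta})$ and returns the weights $\hat{p}_{i}$ and the multiplier $\hat{\lambda}$. The function name `el_weights`, the step-halving safeguard that keeps every $1+\lambda^{T}g_{i}$ positive, and the tolerances are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def el_weights(g, max_iter=100, tol=1e-9):
    """Maximize sum_i log(1 + lam' g_i) over lam and return (p_hat, lam_hat)."""
    g = np.asarray(g, dtype=float)
    n, r = g.shape
    lam = np.zeros(r)

    def objective(l):
        arg = 1.0 + g @ l
        if np.any(arg <= 0.0):
            return -np.inf               # outside the domain of the log
        return np.sum(np.log(arg))

    for _ in range(max_iter):
        arg = 1.0 + g @ lam
        scaled = g / arg[:, None]
        grad = scaled.sum(axis=0)        # gradient of the concave objective
        if np.linalg.norm(grad) < tol:
            break
        hess = -scaled.T @ scaled        # negative definite Hessian
        step = np.linalg.solve(hess, -grad)
        t = 1.0
        while objective(lam + t * step) < objective(lam) and t > 1e-12:
            t /= 2.0                     # halve the step until the objective does not drop
        lam = lam + t * step

    p = 1.0 / (n * (1.0 + g @ lam))
    return p, lam
```

At convergence the weights are positive, sum to one and satisfy $\sum_{i}\hat{p}_{i}g_{i}=0$, which are the constraints of the empirical likelihood problem. This requires the zero vector to lie inside the convex hull of the rows of `g`.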

Let $F_{i}(\cdot)$ and $f_{i}(\cdot)$ denote respectively the conditional distribution and density functions of $Y_{i}$ given $(X_{i},Z_{i})$ . Denote

[TABLE]

The following regularity conditions help us in doing asymptotic analysis:

C1. The $\tau$ -th conditional quantile of $Y_{i}$ given $W_{i}$ is $Q_{\tau}(Y_{i}|W_{i},\beta^{*})=W_{i}^{T}\beta^{*}$ and $W_{i}$ has bounded support.

C2. $Y\perp\delta|X,Z$ .

C3. $F_{\beta}$ , $S_{\phi}$ , $S_{B}$ are positive definite.

C4. $F_{i}(\cdot)$ is absolutely continuous and $f_{i}(\cdot)$ is uniformly bounded away from 0 and $\infty$ at 0.

C5. (a) $P(\delta=1|Y,Z)=\pi(Y,Z,\gamma^{*})$ . (b) $\inf_{(Y,Z)}\pi(Y,Z,\gamma^{*})\geq c_{0}$ for some $c_{0}>0$ . (c) For all $(Y_{i},Z_{i})$ , $\pi(Y_{i},Z_{i},\gamma)$ admits all third partial derivatives $\frac{\partial^{3}\pi(Y_{i},Z_{i},\gamma)}{\partial\gamma_{k}\partial\gamma_{l}\partial\gamma_{m}}$ for all $\gamma$ in a neighborhood of the true value $\gamma^{*}$ , and $\left\|\frac{\partial^{3}\pi(Y_{i},Z_{i},\gamma)}{\partial\gamma_{k}\partial\gamma_{l}\partial\gamma_{m}}\right\|$ and $\|\partial\pi(Y_{i},Z_{i},\gamma)/\partial\gamma\|^{2}$ are bounded by an integrable function for all $\gamma$ in this neighborhood.

C6. For all $(Y_{i},Z_{i})$ , $m(Y_{i},Z_{i},\beta,\alpha)$ admits all second partial derivatives $\frac{\partial^{2}m(Y_{i},Z_{i},\beta,\alpha)}{\partial\beta_{k}\partial\beta_{l}}$ and $\frac{\partial^{2}m(Y_{i},Z_{i},\beta,\alpha)}{\partial\alpha_{k}\partial\alpha_{l}}$ for all $\beta$ and $\alpha$ in a neighborhood of $(\beta^{*T},\alpha^{*T})^{T}$ , and $\|m(Y_{i},Z_{i},\beta,\alpha)\|^{2}$ , $\left\|\frac{\partial^{2}m(Y_{i},Z_{i},\beta,\alpha)}{\partial\beta_{k}\partial\beta_{l}}\right\|$ and $\left\|\frac{\partial^{2}m(Y_{i},Z_{i},\beta,\alpha)}{\partial\alpha_{k}\partial\alpha_{l}}\right\|$ are bounded by an integrable function for all $\beta$ and $\alpha$ in this neighborhood.

The asymptotic distribution of $\hat{\beta}_{C}$ is given by the following theorem.

Theorem 2.1

Under conditions C1-C4, $n^{1/2}(\hat{\beta}_{C}-\beta^{*})\stackrel{{\scriptstyle d}}{{\longrightarrow}}N(0,\Sigma_{C})$ as $n\rightarrow\infty$ , where $\Sigma_{C}=F_{\beta}^{-1}S_{\phi}F_{\beta}^{-1}.$

The asymptotic distribution of $\hat{\beta}_{ELW}$ is given by the following theorem.

Theorem 2.2

Under conditions C1-C6, $n^{1/2}(\hat{\beta}_{ELW}-\beta^{*})\stackrel{{\scriptstyle d}}{{\longrightarrow}}N(0,\Sigma_{ELW})$ as $n\rightarrow\infty$ , where

[TABLE]

$V_{1}=D_{3}-D_{4}S_{B}^{-1}D_{2}^{T}$ and $V_{2}=D_{1}-D_{2}S_{B}^{-1}D_{2}^{T}.$

For two matrices $A$ and $B$ , we write $A\leq B$ if $B-A$ is a nonnegative-definite matrix.
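This ordering can be checked numerically for estimated covariance matrices. The sketch below tests whether $B-A$ is nonnegative definite through its smallest eigenvalue. The name `loewner_leq` and the tolerance are illustrative assumptions.

```python
import numpy as np


def loewner_leq(a, b, tol=1e-10):
    """Return True if b - a is nonnegative definite, i.e. a <= b in the matrix order."""
    diff = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)
    diff = (diff + diff.T) / 2.0          # symmetrize to guard against rounding error
    return bool(np.min(np.linalg.eigvalsh(diff)) >= -tol)


a = np.eye(2)
print(loewner_leq(a, [[2.0, 0.5], [0.5, 2.0]]))   # difference has eigenvalues 0.5 and 1.5
print(loewner_leq(a, [[2.0, 0.5], [0.5, 1.0]]))   # difference has a negative eigenvalue
```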

Corollary 2.3

If both $F_{\beta}$ and $V_{2}$ are positive definite, we have $\Sigma_{ELW}\leq\Sigma_{C}$ , and the equality holds if and only if $V_{1}=0$ .

Corollary 2.3 reveals that $\hat{\beta}_{ELW}$ is at least as efficient as $\hat{\beta}_{C}$ for any working regression function $m(Y_{i},Z_{i},\beta,\alpha)$ , whether or not it correctly identifies the optimal regression function $E\{\phi(X_{i},Z_{i},Y_{i},\beta)|Z_{i},Y_{i},\delta_{i}=1\}$ .

Although $\hat{\beta}_{ELW}$ can be obtained easily, it is difficult to estimate its limiting covariance matrix analytically. For inference about $\beta^{*}$ , we apply the resampling method of Liu and Yuan (2016).

3 Simulation studies

In this section, we investigate the performance of the proposed estimator $\hat{\beta}_{ELW}$ and several other estimators based on Monte-Carlo simulations.

The simulated data are generated by the procedure of Bartlett et al. (2014), in which the non-missing indicator $\delta$ satisfies $P(\delta=1)=0.5$ , and $(X,Z,Y)$ is generated from a trivariate normal distribution conditional on $\delta$ :

$$(X,Z,Y)^{T}\mid\delta\sim N\left((\delta,0,\eta\delta)^{T},\Psi\right),$$

where $\Psi=(\sigma_{ab})$ , $a,b=x,z,y$ , $\eta=(\sigma_{xy}\sigma_{zz}-\sigma_{xz}\sigma_{zy})\upsilon_{1}$ and $\upsilon_{1}=(\sigma_{xx}\sigma_{zz}-\sigma_{xz}^{2})^{-1}$ .
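A minimal sketch of this data-generating process is shown below. The default covariance entries are the values this section adopts for the simulation (unit variances and all covariances equal to 0.5). The function name `simulate` and the use of `NaN` to mark unobserved values of $X$ are illustrative assumptions.

```python
import numpy as np


def simulate(n, rng, s_xx=1.0, s_zz=1.0, s_yy=1.0, s_xz=0.5, s_xy=0.5, s_zy=0.5):
    """Draw (X, Z, Y, delta) with P(delta=1)=0.5 and (X,Z,Y)|delta ~ N((delta,0,eta*delta), Psi)."""
    psi = np.array([[s_xx, s_xz, s_xy],
                    [s_xz, s_zz, s_zy],
                    [s_xy, s_zy, s_yy]])
    eta = (s_xy * s_zz - s_xz * s_zy) / (s_xx * s_zz - s_xz ** 2)
    delta = rng.binomial(1, 0.5, size=n)
    noise = rng.multivariate_normal(np.zeros(3), psi, size=n)
    mean = np.column_stack([delta, np.zeros(n), eta * delta])
    x, z, y = (mean + noise).T
    x_obs = np.where(delta == 1, x, np.nan)    # X is available only when delta = 1
    return x_obs, z, y, delta


x_obs, z, y, delta = simulate(20000, np.random.default_rng(0))
```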

It is easy to verify that the assumption $\delta\bot Y|(X,Z)$ is satisfied in this setup. Conditional on $Z$ and $Y$ , the probability $P(\delta=1|Z,Y)$ follows a logistic regression model

$$P(\delta=1\mid Z,Y)=\frac{\exp(\gamma_{0}+\gamma_{1}Z+\gamma_{2}Y)}{1+\exp(\gamma_{0}+\gamma_{1}Z+\gamma_{2}Y)},$$

where $\gamma_{0}=-{0.5\eta^{2}\sigma_{zz}}\upsilon_{2},\gamma_{1}=-{\eta\sigma_{zy}}\upsilon_{2},\gamma_{2}={\eta\sigma_{zz}}\upsilon_{2}$ and $\upsilon_{2}=(\sigma_{zz}\sigma_{yy}-\sigma_{zy}^{2})^{-1}$ . The conditional quantile model of interest is specified as

$$Q_{\tau}(Y\mid X,Z)=\beta_{0}+\beta_{1}X+\beta_{2}Z,$$

with $\beta_{0}=\Phi^{-1}(\tau,\sigma^{2}),$ $\beta_{1}=(\sigma_{xy}\sigma_{zz}-\sigma_{xz}\sigma_{zy})\upsilon_{1}$ , $\beta_{2}=(\sigma_{zy}\sigma_{xx}-\sigma_{xz}\sigma_{xy})\upsilon_{1}$ , $\sigma^{2}=\sigma_{yy}-(\sigma_{xy}^{2}\sigma_{zz}-2\sigma_{xy}\sigma_{xz}\sigma_{zy}+\sigma_{zy}^{2}\sigma_{xx})\upsilon_{1}$ , where $\Phi^{-1}(\tau,\sigma^{2})$ denotes the $\tau$ -th quantile of the $N(0,\sigma^{2})$ distribution.
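The true parameter values implied by these formulas can be computed directly. The sketch below does so with the standard library only, taking $\sigma^{2}$ to be the usual conditional variance of $Y$ given $(X,Z)$ under joint normality and $\beta_{0}$ to be the $\tau$-th quantile of $N(0,\sigma^{2})$. The function name `true_parameters` is hypothetical, and the default arguments are the covariance values used in this section.

```python
from statistics import NormalDist


def true_parameters(tau, s_xx=1.0, s_zz=1.0, s_yy=1.0, s_xz=0.5, s_xy=0.5, s_zy=0.5):
    """True beta, gamma and sigma^2 of the simulation design for quantile level tau."""
    v1 = 1.0 / (s_xx * s_zz - s_xz ** 2)
    v2 = 1.0 / (s_zz * s_yy - s_zy ** 2)
    eta = (s_xy * s_zz - s_xz * s_zy) * v1
    beta1 = eta
    beta2 = (s_zy * s_xx - s_xz * s_xy) * v1
    # Conditional variance of Y given (X, Z) under joint normality.
    sigma2 = s_yy - (s_xy ** 2 * s_zz - 2 * s_xy * s_xz * s_zy + s_zy ** 2 * s_xx) * v1
    beta0 = NormalDist(0.0, sigma2 ** 0.5).inv_cdf(tau)
    gamma = (-0.5 * eta ** 2 * s_zz * v2, -eta * s_zy * v2, eta * s_zz * v2)
    return {"beta": (beta0, beta1, beta2), "gamma": gamma, "sigma2": sigma2}


params = true_parameters(0.5)
```

With unit variances and all covariances equal to 0.5, this gives $\beta_{1}=\beta_{2}=1/3$ , $\sigma^{2}=2/3$ and $(\gamma_{0},\gamma_{1},\gamma_{2})=(-2/27,-2/9,4/9)$ , with $\beta_{0}=0$ at $\tau=0.5$ .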

We set $\sigma_{xx}=\sigma_{yy}=\sigma_{zz}=1$ , $\sigma_{xz}=\sigma_{xy}=\sigma_{zy}=0.5$ and generate 1000 Monte Carlo data sets of sample sizes $n=100$ and $300$ . Five estimators are considered:

1. $\hat{\beta}_{ideal}$ : the quantile regression estimator based on the full observations. This is the ideal case, which is not feasible in practice, but we use it as a benchmark for comparison;

2. $\hat{\beta}_{C}$ : the CCA estimator defined in equation (2);

3. $\hat{\beta}_{IPWMAR}$ : the IPW estimator assuming MAR, introduced in Sherwood et al. (2013);

4. $\hat{\beta}_{ELWMAR}$ : the ELW estimator assuming MAR, proposed by Liu and Yuan (2016);

5. $\hat{\beta}_{ELW}$ : the ELW estimator defined in equation (6).

The empirical bias and root-mean-squared errors (RMSEs) of the five estimators for sample sizes 100 and 300 are reported in Table 1. The results can be summarized as follows. The CCA estimator $\hat{\beta}_{C}$ and the ELW estimator $\hat{\beta}_{ELW}$ are nearly unbiased, as expected, whereas $\hat{\beta}_{IPWMAR}$ and $\hat{\beta}_{ELWMAR}$ are clearly biased for $\beta_{0}$ . $\hat{\beta}_{ELW}$ performs better than $\hat{\beta}_{C}$ in terms of RMSE in most cases, which agrees with our theory. The RMSEs of $\hat{\beta}_{C}$ and $\hat{\beta}_{ELW}$ decrease as the sample size $n$ increases from 100 to 300.
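For reference, the two summary measures can be computed from the Monte Carlo replicates of one coefficient as in the sketch below. The function name `bias_and_rmse` and the toy numbers are illustrative and are not simulation output from the paper.

```python
import math


def bias_and_rmse(estimates, truth):
    """Empirical bias and root-mean-squared error of Monte Carlo estimates of a scalar."""
    errors = [e - truth for e in estimates]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(err ** 2 for err in errors) / len(errors))
    return bias, rmse


bias, rmse = bias_and_rmse([0.9, 1.1, 1.3], truth=1.0)
```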

4 Data analysis

In this section, we apply the proposed method to the data on alcohol consumption, age, body mass index and systolic blood pressure from the 2013-2014 NHANES. We model the population quantile of SBP (systolic blood pressure) as a function of the following four covariates: BMI (body mass index), Alcohol (log{alcohol consumption per day $+1$ }), Age ( $\{\mbox{age}-50\}/10$ ) and Age2 ( $\{\mbox{age}-50\}^{2}/100$ ).
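The covariate transformations described above can be sketched as follows. The function name `build_covariates`, the argument names and the use of `None` for a missing alcohol value are hypothetical, since the NHANES variable names are not given in the text, and the logarithm is assumed to be the natural logarithm.

```python
import math


def build_covariates(alcohol_per_day, age, bmi):
    """Return Alcohol, BMI, Age, Age2 and the observation indicator delta for one subject."""
    observed = alcohol_per_day is not None
    alcohol = math.log(alcohol_per_day + 1.0) if observed else None   # log(alcohol + 1)
    age_c = (age - 50.0) / 10.0                                       # (age - 50) / 10
    return {
        "Alcohol": alcohol,
        "BMI": bmi,
        "Age": age_c,
        "Age2": age_c ** 2,                                           # (age - 50)^2 / 100
        "delta": int(observed),
    }


row = build_covariates(alcohol_per_day=0.0, age=60, bmi=27.5)
```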

In our analysis, the data set contains 7104 observations. The dependent variable SBP and the covariates BMI and Age are completely observed, whereas 53.29% of the values of the covariate Alcohol are missing. It is a priori plausible that missingness in Alcohol depends primarily on its own value (i.e., NMAR), and that missingness in Alcohol is independent of SBP conditional on Alcohol, BMI, Age, and Age2. Consequently, CCA is expected to give valid inferences, while the MAR assumption likely does not hold.

For $i=1,\cdots,n=7104$ , let $Y_{i}$ denote the $i$ th observation of $Y=$ SBP, $Z_{i}$ denote the $i$ th observation of $Z=(\mbox{BMI},\mbox{Age},\mbox{Age2})^{T}$ and $X_{i}$ denote the $i$ th observation of $X=$ Alcohol. Then, we consider the following model for the $\tau$ th conditional quantile of $Y_{i}$ given $W_{i}=(1,X_{i},Z_{i}^{T})^{T}$ :

$$Q_{\tau}(Y_{i}\mid X_{i},Z_{i},\beta)=\beta_{0}+X_{i}\beta_{1}+Z_{i}^{T}\beta_{2},\quad i=1,\dots,n,$$

where $\beta=(\beta_{0},\beta_{1},\beta_{2}^{T})^{T}$ and $\beta_{2}=(\beta_{21},\beta_{22},\beta_{23})^{T}$ . We consider two estimators $\hat{\beta}_{C}$ and $\hat{\beta}_{ELW}$ . For the ELW method, the probability of whether the Alcohol is observed is modeled by $\pi(Y,Z,\gamma)=\{1+\exp(-\gamma_{0}-Y\gamma_{1}-Z^{T}\gamma_{2})\}^{-1}$ .

In Figure 1, we plot the estimated regression coefficients, $\hat{\beta}_{C}$ and $\hat{\beta}_{ELW}$ for $\beta_{1}$ , $\beta_{21}$ , $\beta_{22}$ and $\beta_{23}$ , at quantile levels $\tau=0.1,0.2,\cdots,0.9$ . We see that the CCA and ELW methods produce similar estimated regression coefficients. In Figure 2, we plot the standard errors of $\hat{\beta}_{C}$ and $\hat{\beta}_{ELW}$ for $\beta_{1}$ , $\beta_{21}$ , $\beta_{22}$ and $\beta_{23}$ at various quantile levels. The standard error of $\hat{\beta}_{ELW}$ is smaller than that of $\hat{\beta}_{C}$ in most cases.

5 Conclusions

In this paper, we develop a weighted empirical likelihood approach for estimating conditional quantile functions in linear models with nonignorable missing covariates. By incorporating the unbiased estimating equations of the incomplete data into the empirical likelihood, the ELW estimator achieves semiparametric efficiency if the probability of missingness is correctly specified. Extending the proposed method to other regression models is left for future work.

Acknowledgements

Xiaohui Yuan was partly supported by the NSFC (No. 11401048, 11671054, 11701043). Xiaogang Dong was partly supported by the NSFC (No. 11571051).

6 Appendix

In this section, we state a preliminary lemma that is used in the proofs of the main results in Section 2.

Lemma 6.1

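Under conditions C1-C5, we have

$$\hat{\lambda}=\lambda(\hat{\theta})=n^{-1}S_{g}^{-1}\left[U_{g}(\theta^{*})+G_{\gamma}S_{B}^{-1}U(\gamma^{*})\right]+o_{p}(n^{-1/2}),$$

where $\lambda(\theta)$ is defined in (7).

**Proof of Lemma 6.1.** By Lemma A.2 in Liu and Yuan (2016), we have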

[TABLE]

where $U_{g}(\theta)=\sum_{i=1}^{n}g(\delta_{i},X_{i},Z_{i},Y_{i},\theta)$ . By a Taylor expansion,

[TABLE]

where $\tilde{\theta}$ is a point on the segment connecting $\hat{\theta}$ and $\theta^{*}$ . By the law of large numbers, we have

[TABLE]

By the asymptotic properties of the maximum likelihood estimator, we have

$$\hat{\gamma}-\gamma^{*}=n^{-1}S_{B}^{-1}U(\gamma^{*})+o_{p}(n^{-1/2}),$$

where $U(\gamma^{*})=\sum_{i=1}^{n}U_{B}(\delta_{i},Z_{i},Y_{i},\gamma^{*})$ . Thus by (6) and (11),

[TABLE]

The desired result follows.

**Proof of Theorem 2.1.** The proof is similar to the proof of Theorem 4.1 in Koenker (2005, page 120).

**Proof of Theorem 2.2.** For $i=1,\cdots,n$ , let

$$A_{i}(\eta)=\rho_{\tau}(\varepsilon_{i}-W_{i}^{T}\eta/\sqrt{n})-\rho_{\tau}(\varepsilon_{i}),$$

where $\varepsilon_{i}=Y_{i}-W_{i}^{T}\beta^{*}$ . The function $A(\eta)=\sum_{i=1}^{n}n\hat{p}_{i}\delta_{i}A_{i}(\eta)$ is convex and is minimized at $\hat{\eta}=\sqrt{n}(\hat{\beta}_{ELW}-\beta^{*})$ . Following Knight’s identity (Knight, 1998)

[TABLE]

we can write $A(\eta)=A_{1}(\eta)+A_{2}(\eta),$ where

[TABLE]

We first give the asymptotic expression of (12). Applying a Taylor expansion, we get

[TABLE]

By the law of large numbers, we have

[TABLE]

By Lemma 6.1,

[TABLE]

where $U_{\phi}(\theta^{*})=\sum_{i=1}^{n}\delta_{i}\phi(X_{i},Z_{i},Y_{i},\beta^{*})$ .

Next, we give the asymptotic expression of (13). A Taylor expansion reveals that

[TABLE]

Moreover, similar to the proof of Theorem 4.1 in Koenker(2005), one can show that

[TABLE]

Thus, we only need to show that $\sum_{i=1}^{n}A_{2i}(\eta)\delta_{i}g^{T}(\delta_{i},X_{i},Z_{i},Y_{i},\theta^{*})\hat{\lambda}$ is asymptotically negligible. By Lemma 6.1 and Lemma D.2 in Kitamura et al. (2004), we have $\|\hat{\lambda}\|=O_{p}(n^{-1/2})$ and $\max_{1\leq i\leq n}\{\|g(\delta_{i},X_{i},Z_{i},Y_{i},\theta^{*})\|\}=o_{p}(n^{1/2})$ . Then,

[TABLE]

By the asymptotic expressions of (12) and (13), we conclude that $A(\eta)\stackrel{{\scriptstyle d}}{{\longrightarrow}}A_{0}(\eta)$ , where

[TABLE]

Then, it follows that

[TABLE]

where

[TABLE]

Furthermore, by simple algebra, one can verify that

[TABLE]

and

[TABLE]

Therefore,

[TABLE]

Let

[TABLE]

One can write $\arg\min_{\eta}A_{0}(\eta)$ as $-F_{\beta}^{-1}n^{-1/2}\sum_{i=1}^{n}\left\{h_{1i}-V_{1}V_{2}^{-1}h_{2i}\right\}$ . It is easily verified that $Var\left(h_{1i}\right)=E(h_{1i}h_{1i}^{T})=S_{\phi},$

[TABLE]

and

[TABLE]

Thus,

[TABLE]

The desired result follows by the central limit theorem.

** The proof of Theorem LABEL:el ** According to the proof of Theorem 1 of Lopez et al.(2009), it can be shown that

[TABLE]

where $\Sigma_{EL}=\left(S_{1}^{T}S_{2}^{-1}S_{1}\right)^{-1}$ , $S_{1}=\left(\begin{array}[]{cc}F_{\beta}&0\\ 0&D_{2}\\ 0&S_{B}\end{array}\right)$ and $S_{2}=\left(\begin{array}[]{ccc}S_{\phi}&D_{3}&D_{4}\\ D_{3}^{T}&D_{1}&D_{2}\\ D_{4}^{T}&D_{2}^{T}&S_{B}\end{array}\right).$

Let $E_{22}=\left(\begin{array}[]{cc}D_{1}&D_{2}\\ D_{2}^{T}&S_{B}\end{array}\right)$ . Then we can write $S_{2}=\left(\begin{array}[]{cc}S_{\phi}&F_{g}\\ F_{g}^{T}&E_{22}\end{array}\right)$ , where $F_{g}$ is defined in (14). We know that

[TABLE]

with $E_{11.2}=S_{\phi}-F_{g}E_{22}^{-1}F_{g}^{T}=S_{\phi}-V_{1}V_{2}^{-1}V_{1}^{T}-D_{4}S_{B}^{-1}D_{4}^{T}$ . Note that $S_{1}^{T}S_{2}^{-1}S_{1}$ can be written as

[TABLE]

where $H_{11}=F_{\beta}E_{11.2}^{-1}F_{\beta}$ , $H_{12}=-F_{\beta}E_{11.2}^{-1}D_{4}$ , $H_{21}=H_{12}^{T}$ and $H_{22}=S_{B}+D_{4}^{T}E_{11.2}^{-1}D_{4}$ . Therefore, we have

[TABLE]

where $H_{22.1}=H_{22}-H_{21}H_{11}^{-1}H_{12}=S_{B}.$ By direct calculation, it follows that

[TABLE]

and $-H_{11}^{-1}H_{12}H_{22.1}^{-1}=F_{\beta}^{-1}E_{11.2}F_{\beta}^{-1}F_{\beta}E_{11.2}^{-1}D_{4}S_{B}^{-1}=F_{\beta}^{-1}D_{4}S_{B}^{-1}.$ The desired result follows.

References

(1) Bartlett J W, Carpenter J R, Tilling K, et al. 2014. Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics, 15(4): 719-730.
(2) Chen X, Wan A T K, Zhou Y. 2015. Efficient quantile regression analysis with missing observations. Journal of the American Statistical Association, 110(510): 00-00.
(3) Kitamura Y, Tripathi G, Ahn H. 2004. Empirical likelihood-based inference in conditional moment restriction models. Econometrica, 72(6): 1667-1714.
(4) Knight K. 1998. Limiting distributions for $L_{1}$ regression estimators under general conditions. Annals of Statistics, 26: 755-770.
(5) Koenker R, Bassett G. 1978. Regression quantiles. Econometrica, 46: 33-50.
(6) Koenker R. 2005. Quantile regression. Cambridge University Press.
(7) Little R J A, Rubin D B. 2002. Statistical analysis with missing data, 2nd ed. Wiley, Hoboken, NJ.
(8) Little R J, Zhang N. 2011. Subsample ignorable likelihood for regression analysis with missing data. J R Stat Soc Ser C, 60: 591-605.