Nearly Semiparametric Efficient Estimation of Quantile Regression

Kani Chen; Yuanyuan Lin; Zhanfeng Wang; Zhiliang Ying

arXiv:1705.09599·stat.ME·May 29, 2017

TL;DR

This paper develops a nearly semiparametric efficient estimator for multiple quantile regression models, improving efficiency by pooling information across quantiles with a feasible, easy-to-implement method that outperforms traditional estimators.

Contribution

It introduces a one-step nearly semiparametric efficient estimator for multiple quantile levels, leveraging the least favorable submodel technique for improved efficiency.

Findings

01

The proposed estimator achieves the semiparametric efficiency lower bound.

02

Numerical studies show higher efficiency than the Koenker-Bassett estimator.

03

The method is computationally feasible and easy to implement.

Abstract

As a competitive alternative to least squares regression, quantile regression is popular in analyzing heterogeneous data. For a quantile regression model specified for one single quantile level $τ$, major difficulties of semiparametric efficient estimation are the unavailability of a parametric efficient score and the conditional density estimation. In this paper, with the help of the least favorable submodel technique, we first derive the semiparametric efficient scores for linear quantile regression models that are assumed for a single quantile level, multiple quantile levels and all the quantile levels in $(0, 1)$ respectively. Our main discovery is a one-step (nearly) semiparametric efficient estimation for the regression coefficients of the quantile regression models assumed for multiple quantile levels, which has several advantages: it could be regarded as an optimal way to pool…

Tables

Table 1: Simulation results for five models with quantiles 0.5 and 0.7.

			$τ = 0.5$		$τ = 0.7$
Model	$n$	Method	$β_{1} (τ)$	$β_{2} (τ)$	$β_{1} (τ)$	$β_{2} (τ)$
M1		True	2	1	2	1.5244
	1000	TQE	2.0007(0.0512)	0.9974(0.0899)	2.0031(0.0547)	1.5195(0.0961)
		SEF	2.0009(0.0238)	0.9968(0.0547)	2.0050(0.0265)	1.5149(0.0560)
		EFF	2.0015(0.0227)	0.9959(0.0533)	2.0009(0.0247)	1.5200(0.0529)
	2000	TQE	1.9992(0.0365)	1.0010(0.0652)	2.0023(0.0370)	1.5213(0.0653)
		SEF	2.0002(0.0159)	0.9993(0.0361)	2.0034(0.0174)	1.5190(0.0376)
		EFF	2.0002(0.0145)	0.9992(0.0352)	2.0006(0.0150)	1.5224(0.0365)
M2		True	2	2	2.5244	2.5244
	1000	TQE	1.9976(0.1192)	1.9987(0.1155)	2.5240(0.1244)	2.5206(0.1229)
		SEF	1.9989(0.0896)	1.9981(0.0875)	2.5228(0.0891)	2.5209(0.0903)
		EFF	1.9985(0.0881)	1.9982(0.0870)	2.5239(0.0883)	2.5205(0.0881)
	2000	TQE	1.9980(0.0834)	2.0022(0.0844)	2.5230(0.0877)	2.5225(0.0833)
		SEF	1.9990(0.0617)	2.0003(0.0614)	2.5232(0.0631)	2.5236(0.0605)
		EFF	1.9988(0.0608)	2.0002(0.0608)	2.5240(0.0624)	2.5228(0.0602)
M3		True	2	1	2	1.8473
	1000	TQE	2.0011(0.0822)	0.9958(0.1437)	2.0055(0.0907)	1.8397(0.1592)
		SEF	2.0014(0.0381)	0.9949(0.0874)	2.0094(0.0445)	1.8305(0.0929)
		EFF	2.0021(0.0365)	0.9938(0.0852)	2.0019(0.0420)	1.8400(0.0875)
	2000	TQE	1.9987(0.0585)	1.0017(0.1042)	2.0040(0.0615)	1.8424(0.1082)
		SEF	2.0003(0.0256)	0.9990(0.0575)	2.0061(0.0290)	1.8378(0.0622)
		EFF	2.0002(0.0230)	0.9990(0.0561)	2.0012(0.0250)	1.8436(0.0607)
M4		True	2	1	2	1.7265
	1000	TQE	2.0009(0.0669)	0.9966(0.1144)	2.0083(0.0930)	1.7221(0.1621)
		SEF	2.0014(0.0316)	0.9955(0.0699)	2.0166(0.0491)	1.7015(0.0952)
		EFF	2.0023(0.0287)	0.9945(0.0677)	2.0041(0.0480)	1.7172(0.0925)
	2000	TQE	1.9990(0.0469)	1.0013(0.0824)	2.0057(0.0629)	1.7228(0.1097)
		SEF	2.0002(0.0207)	0.9993(0.0461)	2.0103(0.0327)	1.7118(0.0646)
		EFF	2.0005(0.0188)	0.9988(0.0449)	2.0016(0.0289)	1.7227(0.0628)
M5		True	1	2	1.8473	2.7265
	1000	TQE	0.9964(0.1797)	1.9982(0.1555)	1.8467(0.2073)	2.7277(0.2072)
		SEF	0.9979(0.1344)	1.9972(0.1179)	1.8440(0.1488)	2.7214(0.1510)
		EFF	0.9971(0.1315)	1.9984(0.1173)	1.8449(0.1465)	2.7250(0.1474)
	2000	TQE	0.9973(0.1258)	2.003(0.1139)	1.8449(0.1459)	2.7268(0.1396)
		SEF	0.9987(0.0921)	2.0006(0.0831)	1.8448(0.1052)	2.7260(0.1011)
		EFF	0.9982(0.0911)	2.0004(0.0817)	1.8462(0.1039)	2.7264(0.1004)
* Standard deviations are in parentheses.

Equations

$Q_{Y|X}(\tau)=X^{\top}{\bm{\beta}}_{\tau},$

$\sum_{i=1}^{n}\rho_{\tau}(y_{i}-x_{i}^{\top}{\bm{\beta}}_{\tau}),$

$Y=X^{\top}{\bm{\beta}}_{\tau}+\epsilon_{\tau},$

$Q_{Y|X}(\tau)=X^{\top}{\bm{\beta}}(\tau),\quad\mbox{for all }\tau\in(0,1),$

$Q_{Y|X}(\tau_{l})=X^{\top}{\bm{\beta}}(\tau_{l}),\quad\mbox{for all }l=1,2,\ldots,L,$

$F_{Y|X}(Q_{Y|X}(\tau_{l}))=\tau_{l}\Rightarrow F_{Y|X}(X^{\top}{\bm{\beta}}(\tau_{l}))=\tau_{l},\quad l=1,2,\ldots,L,$

$\tilde{F}_{Y|X}(t;\theta)=F_{Y|X}(t)+\theta G_{Y|X}(t),$

$\tilde{f}_{Y|X}(t;\theta)=f_{Y|X}(t)+\theta g_{Y|X}(t),$

$\int_{-\infty}^{+\infty}g_{Y|X}(u)\,du=0.$

$G_{Y|X}(X^{\top}{\bm{\beta}}_{0}(\tau_{l}))=-f_{Y|X}(X^{\top}{\bm{\beta}}_{0}(\tau_{l}))X^{\top}{\bm{d}}(\tau_{l}),$

$S_{k}(y,x)=\ldots$

$\sigma_{kj}^{2}=\frac{1}{{\bm{u}}_{kj}^{\top}{\bm{U}}{\bm{u}}_{kj}},\quad j=1,2,\ldots,p,$

$S(y,x)=f_{Y|X}(x^{\top}{\bm{\beta}}_{\tau})\frac{1}{(1-\tau)\tau}\{\tau-I(y<x^{\top}{\bm{\beta}}_{\tau})\}{\bm{D}}^{\top}x.$

$S_{kj}^{*}(y,x)=\ldots$

$E[\{S^{*}_{kj}(Y,X)\}^{2}]=\int_{-\infty}^{\infty}\left[\dot{\big\{f_{Y|X}(X^{\top}{\bm{\beta}}(\tau))X^{\top}{\bm{d}}(\tau)\big\}}\Big|_{t=X^{\top}{\bm{\beta}}(\tau)}\right]^{2}dt,$

$S_{kj}(y,x)=\sum_{l=1}^{L+1}\frac{f_{Y|X}(x^{\top}{\bm{\beta}}(\tau_{l-1}))x^{\top}{\bm{d}}(\tau_{l-1})-f_{Y|X}(x^{\top}{\bm{\beta}}(\tau_{l}))x^{\top}{\bm{d}}(\tau_{l})}{\tau_{l}-\tau_{l-1}}I\{x^{\top}{\bm{\beta}}(\tau_{l-1})<y<x^{\top}{\bm{\beta}}(\tau_{l})\},$

$S_{kj}(y,x)\to S_{kj}^{*}(y,x)\quad\mbox{and}\quad E[\{S_{kj}(Y,X)\}^{2}]\to E[\{S_{kj}^{*}(Y,X)\}^{2}]$

$S_{kj}(y,x)=\ldots\,[\tau_{k}-I\{y<\beta(\tau_{k})x\}],$

$E[\{S_{11}(Y,X)\}^{2}]=E\Big[\frac{1}{\tau_{1}(1-\tau_{1})}\{f_{Y|X}(X^{\top}{\bm{\beta}}(\tau_{1}))\}^{2}\{X^{\top}{\bm{d}}(\tau_{1})\}^{2}\Big]\doteq E(Q_{1}).$

$E[\{S_{11}(Y,X)\}^{2}]=\ldots+\frac{1}{\tau_{2}-\tau_{1}}\left\{f_{Y|X}(X^{\top}{\bm{\beta}}(\tau_{1}))X^{\top}{\bm{d}}(\tau_{1})-f_{Y|X}(X^{\top}{\bm{\beta}}(\tau_{2}))X^{\top}{\bm{d}}(\tau_{2})\right\}^{2}\Big]\doteq\ldots$

$f_{Y|X}(X^{\top}{\bm{\beta}}(\tau_{l}))=\frac{1}{X^{\top}\dot{\bm{\beta}}(\tau_{l})},\quad l=1,2,\ldots,L.$

$\hat{\beta}_{j}(\tau_{k})=\hat{\beta}_{j}^{c}(\tau_{k})+\hat{\sigma}_{kj}^{2}\frac{\sum_{i=1}^{n}\hat{S}_{kj}(y_{i},x_{i})}{n},\quad j=1,2,\ldots,p,$

$\sqrt{n}\{\hat{\beta}_{j}(\tau_{k})-\beta_{0j}(\tau_{k})\}\to N(0,\sigma_{kj}^{2})$

$Q_{Y|X}(\tau)=X_{1}\beta_{1}(\tau)+X_{2}\beta_{2}(\tau),$

$Y=2+X_{2}+X_{2}\epsilon,$

$Y=2+2X_{2}+(1+X_{2})\epsilon,$

$Q_{Y|X}(\tau_{l})=X^{\top}{\bm{\beta}}(\tau_{l}),\quad l=1,2,\ldots,L,$

Taxonomy

Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Control Systems and Identification

Full text

Nearly Semiparametric Efficient Estimation of Quantile Regression

Kani CHEN, Yuanyuan LIN, Zhanfeng WANG and Zhiliang YING

ABSTRACT: As a competitive alternative to least squares regression, quantile regression is popular in analyzing heterogeneous data. For a quantile regression model specified for one single quantile level $\tau$, major difficulties of semiparametric efficient estimation are the unavailability of a parametric efficient score and the conditional density estimation. In this paper, with the help of the least favorable submodel technique, we first derive the semiparametric efficient scores for linear quantile regression models that are assumed for a single quantile level, multiple quantile levels and all the quantile levels in $(0,1)$ respectively. Our main discovery is a one-step (nearly) semiparametric efficient estimation for the regression coefficients of the quantile regression models assumed for multiple quantile levels, which has several advantages: it could be regarded as an optimal way to pool information across multiple/other quantiles for efficiency gain; it is computationally feasible and easy to implement, as the initial estimator is easily available; due to the nature of quantile regression models under investigation, the conditional density estimation is straightforward by plugging in an initial estimator. The resulting estimator is proved to achieve the corresponding semiparametric efficiency lower bound under regularity conditions. Numerical studies, including simulations and an example on the birth weight of children, confirm that the proposed estimator leads to higher efficiency compared with the Koenker-Bassett quantile regression estimator for all quantiles of interest.

KEY WORDS: Quantile regression; Semiparametric efficient score; Least favorable submodel; One-step estimation.

INTRODUCTION

Quantile regression is a statistical methodology for the modeling and inference of conditional quantile functions. Following Koenker and Bassett (1978), we model the $\tau$th conditional quantile function of $Y\in R$ given $X\in R^{p}$ as

$$Q_{Y|X}(\tau)=X^{\top}{\bm{\beta}}_{\tau},\qquad(1)$$

for a specific $\tau\in(0,1)$, where ${\bm{\beta}}_{\tau}$ is a $p$-vector usually including an intercept. Let $(x_{i},y_{i})$, $i=1,2,\ldots,n$, be independent and identically distributed copies of $(X,Y)$. For the $\tau$th quantile, the classical Koenker-Bassett estimate of ${\bm{\beta}}_{\tau}$, denoted as $\hat{\bm{\beta}}_{\tau}^{c}$, is obtained by minimizing the following objective function

$$\sum_{i=1}^{n}\rho_{\tau}(y_{i}-x_{i}^{\top}{\bm{\beta}}_{\tau}),\qquad(2)$$

over ${\bm{\beta}}_{\tau}$, where $\rho_{\tau}(u)=u(\tau-I(u<0))$. The computation of $\hat{\bm{\beta}}^{c}_{\tau}$ is straightforward with the help of linear programming. There is a vast literature on the estimation and inference for one or several percentile levels for model (1); see Yu and Jones (1998), He (1997), Koenker and Geling (2001), Koenker and Xiao (2002), He and Zhu (2003), Koenker (2005), Peng and Huang (2008), Peng and Fine (2009), Bondell, Reich and Wang (2010), Wang, Wu and Li (2012), Jiang, Wang and Bondell (2013), He, Wang and Hong (2013), Kato (2011, 2012), Zheng, Peng and He (2015), among many others. When quantile coefficients are common across multiple quantiles, composite quantile regression (CQR) has been proposed to combine information shared across a number of quantiles to improve estimation efficiency; see Zou and Yuan (2008), Wang and Wang (2009), Kai et al. (2001), Wang, Li and He (2012), Wang and Li (2013). However, CQR relies on the key assumption that there exist common covariate effects across multiple quantile levels. Recently, important findings in Bayesian inference for quantile regression were reported in Yang and He (2011), Kim and Yang (2011) and Feng, Chen and He (2015).
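To make the linear programming remark concrete, the following is a minimal Python sketch of the check function $\rho_{\tau}$ and of the Koenker-Bassett fit written as a linear program. It is an illustration under stated assumptions, not the authors' code: the function names `check_loss` and `quantile_regression_lp`, the use of `scipy.optimize.linprog`, and the toy data are all choices made here.

```python
import numpy as np
from scipy.optimize import linprog


def check_loss(u, tau):
    """rho_tau(u) = u * (tau - I(u < 0)), applied elementwise."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def quantile_regression_lp(X, y, tau):
    """Minimize sum_i rho_tau(y_i - x_i' beta) as a linear program.

    The residual is split as y - X beta = u - v with u, v >= 0, so the
    objective becomes tau * sum(u) + (1 - tau) * sum(v).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Decision vector: [beta (free), u (>= 0), v (>= 0)].
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


# Toy data (illustrative only): Y = 2 + X2 + X2 * eps with log-normal X2.
rng = np.random.default_rng(0)
n = 400
x2 = rng.lognormal(size=n)
X = np.column_stack([np.ones(n), x2])
y = 2 + x2 + x2 * rng.standard_normal(n)
beta_median = quantile_regression_lp(X, y, tau=0.5)  # estimate at tau = 0.5
```

The dense equality matrix keeps the sketch short; for large $n$ a sparse formulation or a dedicated quantile regression routine would be the more practical choice.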

Typically, model (1) can be expressed as the following linear regression model

$$Y=X^{\top}{\bm{\beta}}_{\tau}+\epsilon_{\tau},\qquad(3)$$

where the $\tau$th percentile of $\epsilon_{\tau}$ is assumed to be 0. For a specific $\tau$, under the assumption that $X$ and $\epsilon_{\tau}$ are independent, it can be shown that $\hat{\bm{\beta}}_{\tau}^{c}$ is semiparametric efficient by a straightforward argument discussed in section 2. As a special case, when $\tau=0.5$, the least absolute deviation (LAD) estimator is semiparametric efficient for model (3) under the independence of $X$ and $\epsilon_{\tau}$ (Zhou and Portnoy, 1998). However, we point out that, without assuming independence of $X$ and $\epsilon_{\tau}$, $\hat{\bm{\beta}}_{\tau}^{c}$ is not semiparametric efficient, and the semiparametric efficient estimation of model (1) or model (3) is a delicate issue. The most difficult part is the estimation of the density of $\epsilon_{\tau}$ given $X$ in the semiparametric score function (Kato, 2014), which suffers from the curse of dimensionality.

When model (1) is specified for each $\tau\in(0,1)$, following Portnoy (2003), we consider the quantile regression model

$$Q_{Y|X}(\tau)=X^{\top}{\bm{\beta}}(\tau),\quad\mbox{for all }\tau\in(0,1),\qquad(4)$$

where $Y$ and $X$ are the same as in model (1), and the regression parameter ${\bm{\beta}}(\tau)=(\beta_{1}(\tau),\beta_{2}(\tau),\cdots,\beta_{p}(\tau))^{\top}$ is a function of $\tau$. With the linearity assumption for all quantiles, the true unknown function ${\bm{\beta}}_{0}(\tau)$ suffices to describe the entire conditional distribution of $Y$ given $X$. Important results on the estimation of the quantile process with survival data can be found in Portnoy (2003) and Peng and Huang (2008). Recently, there have been some breakthroughs on Bayesian nonparametric regression models on all quantiles; see Müller & Quintana (2004), Dunson & Taylor (2005), Chung & Dunson (2009), Reich et al. (2011), Qu & Yoon (2015), etc. To summarize, there are two main approaches for the estimation of the quantile process: linear interpolation and basis representation. The linear interpolation approach consists of two steps: the first step is to estimate the quantile regression coefficients separately on a suitable grid of $\tau$-values, and the second step is to interpolate linearly between grid values or apply rearrangement. For the basis representation method, the quantile function is represented by basis functions or some specific functions after transformation. Nevertheless, both methods reviewed above are in a Bayesian framework and their theoretical properties remain unclear.

To the best of our knowledge, there is no specific construction of a semiparametric efficient estimate of ${\bm{\beta}}(\tau)$ of model (4) in the literature. We point out that for model (4), the likelihood function is $\prod_{i=1}^{n}1/\{x_{i}^{\top}\dot{\bm{\beta}}(\tau_{i})\}$, where $y_{i}=x_{i}^{\top}{\bm{\beta}}(\tau_{i})$ and $\dot{\bm{\beta}}(\cdot)$ is the derivative of ${\bm{\beta}}(\cdot)$. However, the maximum likelihood method as in Zeng and Lin (2006, 2007) involves enormous technical and numerical difficulty. In our view, one of the main reasons lies in the nature of model (4): the quantile process ${\bm{\beta}}(\cdot)$ and the nuisance parameter $f_{Y|X}$ are not separable. The numerical maximization of the estimated likelihood subject to the $n$ constraints $y_{i}=x_{i}^{\top}{\bm{\beta}}(\tau_{i})$ is rather unstable. The numerical difficulties here are in the same spirit as those in numerically searching for the maximum likelihood estimator (MLE) of $\theta$ for Uniform $[0,\theta]$, where the solutions would often go to the boundary. Moreover, due to data sparsity, the estimated $\dot{\bm{\beta}}(\tau)$ would be unstable when $\tau$ is close to 0 or 1.

In view of the technical and numerical complications involved in the semiparametric efficient estimation of ${\bm{\beta}}(\tau)$ in model (4), we take one step back and consider the following quantile regression model

$$Q_{Y|X}(\tau_{l})=X^{\top}{\bm{\beta}}(\tau_{l}),\quad\mbox{for all }l=1,2,\ldots,L,\qquad(5)$$

where $0<\tau_{1}<\tau_{2}<\cdots<\tau_{L}<1$. Model (5) is intermediate between model (1) and model (4). With the explicit expression of the semiparametric efficient score function of ${\bm{\beta}}(\tau_{k})$, $k=1,2,\ldots,L$, derived by the least favorable submodel technique in section 2, we propose a one-step estimation with the estimated score function that leads to the semiparametric efficient estimation of ${\bm{\beta}}(\tau_{k})$. The proposed procedure is numerically feasible and stable. Most importantly, one can show that when the maximum spacing of $\{\tau_{l}-\tau_{l-1},l=1,2,\ldots,L+1\}$ tends to 0, the semiparametric efficient score of model (5) approaches that of model (4). As the impetus for this work was to pursue semiparametric efficient estimation of ${\bm{\beta}}(\tau)$ in model (4), theoretically, one can use the efficient estimator of ${\bm{\beta}}(\tau_{k})$ under model (5) to approximate that under model (4). Hence, we refer to the proposed procedure as nearly semiparametric efficient estimation for quantile regression.

The rest of the paper is organized as follows. Section 2 introduces the model and the proposed estimation with detailed discussions. Extensive simulation studies with supportive evidence are presented in section 3. In section 4, the proposed method is illustrated using a real data set on the birth weight of children from the National Center for Health Statistics. All technical derivations and proofs are in the Appendix.

METHODOLOGIES AND MAIN RESULTS

First, consider model (5). By the definition of a quantile,

$$F_{Y|X}(Q_{Y|X}(\tau_{l}))=\tau_{l}\Rightarrow F_{Y|X}(X^{\top}{\bm{\beta}}(\tau_{l}))=\tau_{l},\quad l=1,2,\ldots,L,\qquad(6)$$

where $F_{Y|X}$ is the cumulative distribution function of $Y$ given $X$. Let $f_{Y|X}(t)$ be the density function of $Y$ conditional on $X$. Let ${\bm{\beta}}_{0}(\tau_{l})=(\beta_{10}(\tau_{l}),\cdots,\beta_{p0}(\tau_{l}))^{\top}$ be the true value of ${\bm{\beta}}(\tau_{l})=(\beta_{1}(\tau_{l}),\cdots,\beta_{p}(\tau_{l}))^{\top}$. By the nature of the quantile regression model, $x^{\top}{\bm{\beta}}(\tau_{l})$ is the $\tau_{l}$-quantile of $Y$ given $X=x$. Without loss of generality, we assume that $x^{\top}{\bm{\beta}}(\tau_{1})<x^{\top}{\bm{\beta}}(\tau_{2})<\cdots<x^{\top}{\bm{\beta}}(\tau_{L})$.

*2.1. Semiparametric efficient scores.*

In quantile regression, estimation of the quantile regression coefficient or the quantile process is inseparably linked to the nuisance parameter, the conditional density function. In such a case, the least favorable submodel method (Kato, 2014) can be used to derive a semiparametric efficient score function of ${\bm{\beta}}(\tau_{l})$, $l=1,\ldots,L$, of model (5) and the associated variance lower bound. The least favorable submodel technique reduces a high-dimensional problem to a problem involving a finite-dimensional “least favorable submodel”; see Begun et al. (1983), Bickel et al. (1993), among others. Following section 25.4 in van der Vaart (1998), we begin with the construction of a parametric submodel of model (5) based on the cumulative distribution function with parameter $\theta$ in a neighborhood of 0,

$$\tilde{F}_{Y|X}(t;\theta)=F_{Y|X}(t)+\theta G_{Y|X}(t),\qquad(7)$$

where $G_{Y|X}(t)$ is a function of $t$ satisfying certain conditions. Differentiating (7) we get

$$\tilde{f}_{Y|X}(t;\theta)=f_{Y|X}(t)+\theta g_{Y|X}(t),$$

where $\tilde{f}_{Y|X}(t;\theta)$, $f_{Y|X}(t)$ and $g_{Y|X}(t)$ are the derivatives of $\tilde{F}_{Y|X}(t;\theta)$, $F_{Y|X}(t)$ and $G_{Y|X}(t)$ with respect to $t$, respectively. To guarantee that $\tilde{f}_{Y|X}(t;\theta)$ is a density function for all $\theta$, the first restriction on $G_{Y|X}(t)$ is that

$$\int_{-\infty}^{+\infty}g_{Y|X}(u)\,du=0.$$

Moreover, under model (5), let $X^{\top}{\bm{\beta}}(\tau_{l};\theta)$ be the $\tau_{l}$-quantile of $\tilde{F}_{Y|X}(t;\theta)$ and $X^{\top}{\bm{\beta}}(\tau_{l};0)=X^{\top}{\bm{\beta}}_{0}(\tau_{l})$, for $l=1,2,\ldots,L$. Hence, we have the identity $\tau_{l}=\tilde{F}_{Y|X}(x^{\top}{\bm{\beta}}(\tau_{l};\theta);\theta)$. By a Taylor expansion of the right hand side of this identity as a function of $\theta$ in a neighborhood of 0, we obtain the second restriction that

$$G_{Y|X}(X^{\top}{\bm{\beta}}_{0}(\tau_{l}))=-f_{Y|X}(X^{\top}{\bm{\beta}}_{0}(\tau_{l}))X^{\top}{\bm{d}}(\tau_{l}),$$

for $l=1,2,\ldots,L$, where ${\bm{d}}(\tau_{l})$ is the derivative of ${\bm{\beta}}(\tau_{l};\theta)$ with respect to $\theta$ at $\theta=0$. Clearly, the derivative of the log-likelihood of $\theta$ based on the density function $\tilde{f}_{Y|X}(t;\theta)$ at $\theta=0$ is ${g_{Y|X}(t)}/{f_{Y|X}(t)}$, denoted as $\xi$. By the information theory in Bickel et al. (1993), we are able to approximate the least favorable submodel by searching for the lower bound of $E(\xi^{2})$, which in turn leads to the semiparametric efficient score. We defer the details to Appendix I. The resulting semiparametric efficient score of ${\bm{\beta}}(\tau_{k})$ can be regarded as an optimal way to combine information from all the quantile levels $\tau_{1},\ldots,\tau_{L}$.
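The Taylor expansion step behind the second restriction can be spelled out as a short worked derivation. The sketch below only uses the submodel $\tilde{F}_{Y|X}(t;\theta)=F_{Y|X}(t)+\theta G_{Y|X}(t)$ and the quantile identity stated above; the shorthand $q_{l}(\theta)$ is introduced here for readability and is not the paper's notation.

```latex
% Shorthand (assumption of this sketch): q_l(\theta) = x^\top \beta(\tau_l;\theta).
\begin{align*}
\tau_l &= \tilde{F}_{Y|X}\bigl(q_l(\theta);\theta\bigr)
        = F_{Y|X}\bigl(q_l(\theta)\bigr) + \theta\, G_{Y|X}\bigl(q_l(\theta)\bigr), \\
0 &= \frac{\partial}{\partial\theta}\Bigl[F_{Y|X}\bigl(q_l(\theta)\bigr)
     + \theta\, G_{Y|X}\bigl(q_l(\theta)\bigr)\Bigr]_{\theta=0} \\
  &= f_{Y|X}\bigl(x^\top\beta_0(\tau_l)\bigr)\, x^\top d(\tau_l)
     + G_{Y|X}\bigl(x^\top\beta_0(\tau_l)\bigr), \\
\text{hence}\quad
G_{Y|X}\bigl(x^\top\beta_0(\tau_l)\bigr)
  &= -\, f_{Y|X}\bigl(x^\top\beta_0(\tau_l)\bigr)\, x^\top d(\tau_l).
\end{align*}
```

The term $\theta\,g_{Y|X}(q_{l}(\theta))\,\partial q_{l}(\theta)/\partial\theta$ produced by the product rule vanishes at $\theta=0$, which is why only two terms remain in the second line.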

Let ${\bm{U}}={\bm{B}}{\bm{A}}{\bm{B}}^{\top}$ and ${\bm{W}}$ be a $pL\times pL$ diagonal matrix with diagonal elements being the reciprocal of diagonal of matrix ${\bm{U}}^{-1}$ , where ${\bm{A}}$ and ${\bm{B}}$ are defined in (A.18) and (A.24) in Appendix I. Set $({\bm{u}}_{1},{\bm{u}}_{2},\cdots,{\bm{u}}_{pL})={\bm{U}}^{-1}{\bm{W}}$ , where ${\bm{u}}_{i}$ is a vector with length $pL$ . The following proposition presents the semiparametric efficient score of ${\bm{\beta}}(\tau_{k})$ , $1\leq k\leq L$ and their variance lower bound.

Proposition 1. For model (5), the semiparametric efficient score of ${\bm{\beta}}(\tau_{k})$ , $1\leq k\leq L$ , is

[TABLE]

Moreover, for the estimate of the $j$-th component of ${\bm{\beta}}(\tau_{k})$, its variance has a lower bound

$$\sigma_{kj}^{2}=\frac{1}{{\bm{u}}_{kj}^{\top}{\bm{U}}{\bm{u}}_{kj}},\quad j=1,2,\ldots,p,$$

*where ${\bm{D}}_{l}$ is a $p\times p$ matrix, $l=0,1,\ldots,L+1$, ${\bm{D}}_{0}={\bm{D}}_{L+1}=0$, $({\bm{D}}_{1},{\bm{D}}_{2},\ldots,{\bm{D}}_{L})=({\bm{u}}_{k1},{\bm{u}}_{k2},\ldots,{\bm{u}}_{kp})^{\top}$; $f_{Y|X}(x^{\top}{\bm{\beta}}_{0}(0))=f_{Y|X}(x^{\top}{\bm{\beta}}_{0}(1))=0$; ${\bm{\beta}}_{0}(0)=-\infty$, ${\bm{\beta}}_{0}(1)=+\infty$; $\tau_{0}=0$ and $\tau_{L+1}=1$.*

Remark 1. When $L=1$, model (5) reduces to model (1) for one single quantile point $\tau$. By Proposition 1, the semiparametric efficient score of ${\bm{\beta}}_{\tau}$ in model (1) is

$$S(y,x)=f_{Y|X}(x^{\top}{\bm{\beta}}_{\tau})\frac{1}{(1-\tau)\tau}\{\tau-I(y<x^{\top}{\bm{\beta}}_{\tau})\}{\bm{D}}^{\top}x.\qquad(10)$$

By the definitions of ${\bm{U}}$ and ${\bm{W}}$, ${\bm{D}}={\bm{U}}^{-1}{\bm{W}}$ is a constant matrix not depending on the random variable $X$. For the corresponding linear model (3), under the assumption that the $\tau$-quantile of $\epsilon_{\tau}$ is $0$ and the error term $\epsilon_{\tau}$ is independent of the covariate $X$, $f_{Y|X}(x^{\top}{\bm{\beta}}_{\tau})=f_{Y-X^{\top}{\bm{\beta}}_{\tau}|X}(0)=f_{\epsilon_{\tau}|X}(0)$ also does not depend on $X$. In this case, the efficient score in (10) is exactly the efficient score in the classical quantile regression model specified at a single quantile level, such as the least absolute deviation (LAD) estimate for median regression; see Zhou and Portnoy (1998) and Kato (2014). However, without the crucial independence assumption of $X$ and $\epsilon_{\tau}$, as conventional quantile regression models allow heterogeneity, the distribution of $\epsilon_{\tau}$ depends on $X$, implying that $f_{Y|X}(x^{\top}{\bm{\beta}}_{\tau})$ also depends on $x$. As a result, the Koenker-Bassett estimate is not semiparametric efficient.

Remark 2. When $L\rightarrow\infty$ and the maximum spacing of $\{\tau_{l}-\tau_{l-1},l=1,2,\ldots,L+1\}$ tends to 0, model (5) approaches model (4). Next, we intend to show that the semiparametric efficient score (9) of ${\bm{\beta}}(\tau_{k})$ approaches that of model (4) as $L\to\infty$. In fact, for the $j$-th component of ${\bm{\beta}}(\tau_{k})$, a calculation similar to that of (9) reveals that the semiparametric efficient score of ${\beta}_{j}(\tau_{k})$ in model (4) is

[TABLE]

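where ${\bm{d}}(\tau)=[d_{1}(\tau),\ldots,d_{p}(\tau)]^{\top}$ is a minimizer of

$$E[\{S^{*}_{kj}(Y,X)\}^{2}]=\int_{-\infty}^{\infty}\left[\dot{\big\{f_{Y|X}(X^{\top}{\bm{\beta}}(\tau))X^{\top}{\bm{d}}(\tau)\big\}}\Big|_{t=X^{\top}{\bm{\beta}}(\tau)}\right]^{2}dt,\qquad(12)$$

subject to ${d}_{j}(\tau_{k})=1$. We defer the detailed derivations of this finding to Appendix II. We point out that it is infeasible to pursue the semiparametric efficient estimation of ${\beta}_{j}(\tau_{k})$ in model (4) based on (11), as the numerical minimization of (12) is intractable. Fortunately, the semiparametric efficient score of ${\beta}_{j}(\tau_{k})$ in (9) can be rewritten as

$$S_{kj}(y,x)=\sum_{l=1}^{L+1}\frac{f_{Y|X}(x^{\top}{\bm{\beta}}(\tau_{l-1}))x^{\top}{\bm{d}}(\tau_{l-1})-f_{Y|X}(x^{\top}{\bm{\beta}}(\tau_{l}))x^{\top}{\bm{d}}(\tau_{l})}{\tau_{l}-\tau_{l-1}}I\{x^{\top}{\bm{\beta}}(\tau_{l-1})<y<x^{\top}{\bm{\beta}}(\tau_{l})\},\qquad(13)$$

where ${\bm{d}}=[{\bm{d}}(\tau_{1})^{\top},\ldots,{\bm{d}}(\tau_{L})^{\top}]^{\top}$ is a minimizer of the quadratic form $E[\{S_{kj}(Y,X)\}^{2}]\equiv{\bm{d}}^{\top}{\bm{B}}{\bm{A}}{\bm{B}}^{\top}{\bm{d}}$ subject to $d_{j}(\tau_{k})=1$. It is straightforward to check that

$$S_{kj}(y,x)\to S_{kj}^{*}(y,x)\quad\mbox{and}\quad E[\{S_{kj}(Y,X)\}^{2}]\to E[\{S_{kj}^{*}(Y,X)\}^{2}]$$

as $L\to\infty$. This finding motivates us to use the efficient score in (9) to approximate the efficient score in (11), which leads to a nearly semiparametric efficient estimator of ${\bm{\beta}}(\tau_{k})$ in model (4).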
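The piecewise structure of the rewritten score (13) may be easier to see in code. The sketch below evaluates, for one observation, the sum over $l=1,\ldots,L+1$ of $\{a_{l-1}-a_{l}\}/(\tau_{l}-\tau_{l-1})$ times the indicator that $y$ lies between consecutive conditional quantiles, with $a_{l}=f_{Y|X}(x^{\top}{\bm{\beta}}(\tau_{l}))x^{\top}{\bm{d}}(\tau_{l})$ and the boundary conventions $a_{0}=a_{L+1}=0$, $\tau_{0}=0$, $\tau_{L+1}=1$. The function name `score_13`, the shorthand `a`, and all numerical inputs are hypothetical; in particular the direction vectors are simply supplied here, whereas the paper obtains them by minimizing the quadratic form.

```python
import numpy as np


def score_13(y, x, taus, betas, dens, d):
    """Evaluate the piecewise score S_kj(y, x) for one observation.

    taus : (L,) increasing quantile levels in (0, 1)
    betas: (L, p) coefficients beta(tau_l)
    dens : (L,) values f_{Y|X}(x' beta(tau_l)) at this x
    d    : (L, p) direction vectors d(tau_l), supplied by the caller
    """
    x = np.asarray(x, dtype=float)
    # a_l = f(x' beta(tau_l)) * x' d(tau_l), padded with a_0 = a_{L+1} = 0.
    a = np.concatenate([[0.0], np.asarray(dens) * (np.asarray(d) @ x), [0.0]])
    grid = np.concatenate([[0.0], np.asarray(taus, dtype=float), [1.0]])
    # Conditional quantiles x' beta(tau_l), padded with -inf and +inf.
    q = np.concatenate([[-np.inf], np.asarray(betas) @ x, [np.inf]])
    weights = (a[:-1] - a[1:]) / np.diff(grid)  # one weight per interval
    interval = np.searchsorted(q, y, side="left") - 1  # interval containing y
    return weights[interval]


# Hypothetical inputs with L = 3 and p = 2; quantiles at this x are 3, 4, 5.
taus = [0.3, 0.5, 0.7]
betas = np.array([[2.0, 0.5], [2.0, 1.0], [2.0, 1.5]])
dens = [0.20, 0.25, 0.20]
d = np.array([[1.0, 0.3], [0.5, -0.2], [0.1, 0.4]])
x = [1.0, 2.0]

# One y per interval, weighted by the interval probabilities tau_l - tau_{l-1}:
values = [score_13(y, x, taus, betas, dens, d) for y in (2.0, 3.5, 4.5, 6.0)]
probs = np.diff([0.0, 0.3, 0.5, 0.7, 1.0])
conditional_mean = float(np.dot(values, probs))
```

Because the weights telescope, the conditional mean of the score given $X=x$ is $a_{0}-a_{L+1}=0$ for any choice of directions, which is the basic property a score function should have.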

Remark 3. The key idea of this work is to borrow information across quantiles and search for the most efficient estimation. This remark provides more insight into this idea. Intuitively, for a certain quantile level $\tau_{k}$, the estimation of ${\bm{\beta}}(\tau_{k})$ in traditional quantile regression does not depend on the information on $Y$ at other quantiles $\{\tau_{i},i\neq k\}$, especially those quantiles far away from $\tau_{k}$. The intuition is true when the number of covariates (including an intercept term) is 1, that is, $p=1$. For this special case, one can rewrite (13) as

[TABLE]

from which one can see that $S_{kj}(y,x)$ does not involve the model information at other quantiles $\{\tau_{l},l\neq k\}$. Appendix III contains the proof. In other words, for model (5) with $p=1$, the semiparametric efficiency for the estimation of $\beta_{j}(\tau_{k})$ can be achieved using only the information at $\tau_{k}$. However, besides an intercept, there is generally at least one covariate in the model, namely $p\geq 2$. Hence, the efficient estimator of $\beta_{j}(\tau_{k})$ generally depends on the information at other quantiles. In view of this fact, borrowing information across other quantiles via the efficient score is able to improve the estimation efficiency of ${\bm{\beta}}(\tau_{k})$ when $p\geq 2$. In addition, Proposition 1 states that the variance of any estimate of $\beta_{j}(\tau_{k})$ has a lower bound $\sigma_{kj}^{2}$.

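For illustration, we consider a toy example for model (5) with $L=2$. To estimate $\beta_{1}(\tau_{1})$, if we use only the model information at the single quantile $\tau_{1}$ and ignore the information at $\tau_{2}$, then

$$E[\{S_{11}(Y,X)\}^{2}]=E\Big[\frac{1}{\tau_{1}(1-\tau_{1})}\{f_{Y|X}(X^{\top}{\bm{\beta}}(\tau_{1}))\}^{2}\{X^{\top}{\bm{d}}(\tau_{1})\}^{2}\Big]\doteq E(Q_{1}).$$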

On the other hand, by incorporating the model information at $\tau_{2}$ for the estimation of $\beta_{1}(\tau_{1})$ , we have shown in Appendix IV that

[TABLE]

Most importantly, we have shown $Q_{2}-Q_{1}>0$ which leads to $E(Q_{2})-E(Q_{1})>0$ . In summary, our theoretical analysis validates that combining information across quantiles can generally reduce the variance of the estimate of ${\bm{\beta}}(\tau_{k})$ .

2.2. The nearly semiparametric efficient estimation.

In this subsection, we introduce the proposed nearly semiparametric efficient estimation procedure for the regression coefficients of model (4). As discussed earlier, we make use of the score (9) in the construction of the proposed estimator. Since (9) involves the density function of $Y$ given $X$, we need to find an appropriate estimate of $f_{Y|X}(x^{\top}{\bm{\beta}}(\tau_{l}))$, $l=1,2,\ldots,L$. Recall that

$$f_{Y|X}(X^{\top}{\bm{\beta}}(\tau_{l}))=\frac{1}{X^{\top}\dot{\bm{\beta}}(\tau_{l})},\quad l=1,2,\ldots,L.$$
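The identity recalled here, $f_{Y|X}(x^{\top}{\bm{\beta}}(\tau))=1/\{x^{\top}\dot{\bm{\beta}}(\tau)\}$, follows from differentiating the quantile identity in $\tau$. A short worked version, assuming the linear quantile model holds in a neighborhood of $\tau$ and that ${\bm{\beta}}(\cdot)$ is differentiable there:

```latex
\begin{align*}
F_{Y|X}\bigl(x^\top\beta(\tau)\bigr) &= \tau
  && \text{(definition of the $\tau$-quantile)} \\
f_{Y|X}\bigl(x^\top\beta(\tau)\bigr)\, x^\top\dot\beta(\tau) &= 1
  && \text{(differentiate both sides in $\tau$)} \\
f_{Y|X}\bigl(x^\top\beta(\tau)\bigr) &= \frac{1}{x^\top\dot\beta(\tau)}
  && \text{(requires $x^\top\dot\beta(\tau) > 0$)}
\end{align*}
```

The positivity requirement in the last line is the usual no-crossing condition: the conditional quantile $x^{\top}{\bm{\beta}}(\tau)$ must be strictly increasing in $\tau$.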

Hence, instead of estimating the conditional density function directly, we estimate $\dot{\bm{\beta}}(\tau_{l})$. A natural estimate of $\dot{\bm{\beta}}(\tau_{l})$ is $\hat{\dot{\bm{\beta}}}(\tau_{l})=\{\hat{\bm{\beta}}^{c}(\tau_{l}+h)-\hat{\bm{\beta}}^{c}(\tau_{l}-h)\}/(2h)$, where $\hat{\bm{\beta}}^{c}(\tau)$ is the Koenker-Bassett estimate of ${\bm{\beta}}(\tau)$ obtained by minimizing (2) and $h$ is the bandwidth. Thus, the density function $f_{Y|X}(X^{\top}{\bm{\beta}}(\tau_{l}))$ can be estimated by $1/\{X^{\top}\hat{\dot{\bm{\beta}}}(\tau_{l})\}$ for $l=1,2,\ldots,L$. Next, we define the proposed one-step estimator of ${\bm{\beta}}(\tau_{k})$, denoted by $\hat{\bm{\beta}}(\tau_{k})$, componentwise as

$$\hat{\beta}_{j}(\tau_{k})=\hat{\beta}_{j}^{c}(\tau_{k})+\hat{\sigma}_{kj}^{2}\frac{\sum_{i=1}^{n}\hat{S}_{kj}(y_{i},x_{i})}{n},\quad j=1,2,\ldots,p,\qquad(18)$$

where $\hat{S}_{kj}(y,x)$ is the $j$-th component of the estimated score $\hat{\bm{S}}_{k}(y,x)$ obtained by plugging $\hat{\dot{\bm{\beta}}}(\tau_{l})$ and $\hat{\bm{\beta}}^{c}(\tau_{l})$, $l=1,\ldots,L$, into (9), and $\hat{\sigma}_{kj}^{2}$ is the estimated variance lower bound obtained by plugging $\hat{\dot{\bm{\beta}}}(\tau_{l})$ and $\hat{\bm{\beta}}^{c}(\tau_{l})$, $l=1,\ldots,L$, into $\sigma_{kj}^{2}$ in Proposition 1. Under regularity conditions given in Appendix V, the resulting estimate $\hat{\beta}_{j}(\tau_{k})$ can be proved to achieve the semiparametric efficiency lower bound. The following theorem presents the main results.

Theorem 1. Assume model (4) and conditions $(1)-(3)$ in Appendix V hold. Then, for $j=1,2,\ldots,p$ and $k=1,2,\ldots,L$,

$$\sqrt{n}\{\hat{\beta}_{j}(\tau_{k})-\beta_{0j}(\tau_{k})\}\to N(0,\sigma_{kj}^{2})\qquad(19)$$

*in distribution as $n\to\infty$, where ${\beta}_{0j}(\tau_{k})$ is the $j$-th component of ${\bm{\beta}}_{0}(\tau_{k})$. Moreover, the asymptotic variance of $\hat{\beta}_{j}(\tau_{k})$ achieves the semiparametric efficiency bound $\sigma_{kj}^{2}$.*

The implementation of the one-step estimation is as follows: for each $k=1,\cdots,L$ and $j=1,\cdots,p$,

*Step 1.* For each $l=1,\cdots,L$, compute the initial estimator $\hat{\bm{\beta}}^{c}(\tau_{l})$;

*Step 2.* For each $l=1,\cdots,L$, calculate $\hat{\dot{\bm{\beta}}}(\tau_{l})$ and estimate the conditional density $f_{Y|X}(x_{i}^{\top}{\bm{\beta}}(\tau_{l}))$ by $1/\{x_{i}^{\top}\hat{\dot{\bm{\beta}}}(\tau_{l})\}$;

*Step 3.* Compute $\hat{S}_{kj}(y,x)$ and $\hat{\sigma}_{kj}^{2}$ by plugging the initial estimator in step 1 and the estimated density in step 2 into $S_{kj}(y,x)$ and $\sigma_{kj}^{2}$;

*Step 4.* Obtain $\hat{\beta}_{j}(\tau_{k})$ according to (18).
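Steps 2 and 4 above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation: the helper names `beta_dot_hat`, `density_hat` and `one_step_update` are invented here, the bandwidth value is arbitrary, and the fitted Koenker-Bassett path is replaced by a known coefficient function (that of simulation case M1, ${\bm{\beta}}(\tau)=(2,1+\Phi^{-1}(\tau))$) so that the plug-in density can be compared with the truth. Step 3 is not reproduced because the matrices ${\bm{A}}$ and ${\bm{B}}$ are defined in the Appendix.

```python
import numpy as np
from scipy.stats import norm


def beta_dot_hat(beta_fn, tau, h):
    """Step 2a: central difference {beta(tau + h) - beta(tau - h)} / (2h)."""
    return (np.asarray(beta_fn(tau + h)) - np.asarray(beta_fn(tau - h))) / (2 * h)


def density_hat(x, beta_dot):
    """Step 2b: plug-in estimate 1 / (x' beta_dot) of f_{Y|X}(x' beta(tau))."""
    return 1.0 / (np.asarray(x, dtype=float) @ beta_dot)


def one_step_update(beta_c_j, sigma2_hat, scores):
    """Step 4: initial estimate plus sigma2_hat times the average estimated score."""
    return beta_c_j + sigma2_hat * np.mean(scores)


# Stand-in for the fitted Koenker-Bassett path (assumption of this sketch):
# the true coefficient function of case M1, beta(tau) = (2, 1 + Phi^{-1}(tau)).
def beta_m1(tau):
    return np.array([2.0, 1.0 + norm.ppf(tau)])


x = np.array([1.0, 1.5])   # covariate vector (1, X2) with X2 = 1.5
tau, h = 0.7, 0.01         # h is an illustrative bandwidth, not a recommendation
f_hat = density_hat(x, beta_dot_hat(beta_m1, tau, h))
# Under Y = 2 + X2 + X2 * eps, the density of Y at its tau-quantile is phi(z_tau) / X2.
f_true = norm.pdf(norm.ppf(tau)) / x[1]
```

In practice `beta_fn` would wrap a quantile regression fit evaluated at $\tau_{l}\pm h$, and `scores` and `sigma2_hat` would come from Step 3.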

Remark 4. Actually, in the above one-step estimation, we only need to estimate the conditional density function $f_{Y|X}(x^{\top}{\bm{\beta}}(\tau_{l}))$ at quantile levels $\{\tau_{l},l=1,\ldots,L\}$ . In this regard, we only need to assume the linear quantile regression model is specified in a neighborhood of each $\tau_{l}$ , $l=1,\ldots,L$ , and do not need to assume a linear quantile regression model for all $\tau\in(0,1)$ .

SIMULATION STUDIES

Simulations are conducted to evaluate the performance of our proposed method. In the simulation, for a quantile level $\tau_{k}$ of interest, we consider three methods for the estimation of ${\beta}_{j}(\tau_{k})$: the Koenker-Bassett quantile estimate $\hat{\bm{\beta}}_{\tau}^{c}$, denoted by TQE; the proposed one-step estimate based on the semiparametric efficient score of ${\bm{\beta}}(\tau_{k})$, referred to as EFF; and the one-step estimate based on the score function (10), which ignores the model information at other quantiles, referred to as SEF. The simulated data are generated from the following quantile regression model with two covariates,

$$Q_{Y|X}(\tau)=X_{1}\beta_{1}(\tau)+X_{2}\beta_{2}(\tau),\qquad(20)$$

where $\beta_{1}(\tau)$ and $\beta_{2}(\tau)$ take each of the following 5 forms:

$M1:$ $\beta_{1}(\tau)=2$ and $\beta_{2}(\tau)=1+\Phi^{-1}(\tau)$ ;

$M2:$ $\beta_{1}(\tau)=2+\Phi^{-1}(\tau)$ and $\beta_{2}(\tau)=2+\Phi^{-1}(\tau)$ ;

$M3:$ $\beta_{1}(\tau)=2$ and $\beta_{2}(\tau)=1+\log\{\tau/(1-\tau)\}$ ;

$M4:$ $\beta_{1}(\tau)=2$ and $\beta_{2}(\tau)=1+\tan\{\pi*(\tau-0.5)\}$ ;

$M5:$ $\beta_{1}(\tau)=1+\log\{\tau/(1-\tau)\}$ and $\beta_{2}(\tau)=2+\tan\{\pi*(\tau-0.5)\}$ .

The covariate $X_{1}$ is the constant $1$ for $M1$ , $M3$ and $M4$ , and it follows a log-normal distribution for $M2$ and $M5$ . The other covariate $X_{2}$ follows a log-normal distribution in all cases. In particular, model (20) under cases $M1$ and $M2$ is equivalent to

[TABLE]

and

[TABLE]

respectively, where $\epsilon$ follows the standard normal distribution. The sample sizes are $n=1000$ and $2000$ . All simulations are repeated 1000 times.
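One way to draw data from a linear quantile regression model with coefficient functions such as $M1$ to $M5$ is inverse-transform sampling: draw $U\sim\mathrm{Uniform}(0,1)$ and set $Y=\beta_{1}(U)X_{1}+\beta_{2}(U)X_{2}$, which is valid here because the covariates are positive and each $\beta_{j}(\tau)$ is nondecreasing in $\tau$. The sketch below is an illustration under assumptions, not the authors' code: the linear form of the model, the log-normal parameters (0, 1), the seeds and all names are hypothetical, since the text does not specify them.

```python
import math
import random
from statistics import NormalDist

PHI_INV = NormalDist().inv_cdf  # standard normal quantile function

# Coefficient functions (beta_1, beta_2) for cases M1 to M5.
COEFS = {
    "M1": (lambda t: 2.0, lambda t: 1.0 + PHI_INV(t)),
    "M2": (lambda t: 2.0 + PHI_INV(t), lambda t: 2.0 + PHI_INV(t)),
    "M3": (lambda t: 2.0, lambda t: 1.0 + math.log(t / (1.0 - t))),
    "M4": (lambda t: 2.0, lambda t: 1.0 + math.tan(math.pi * (t - 0.5))),
    "M5": (lambda t: 1.0 + math.log(t / (1.0 - t)),
           lambda t: 2.0 + math.tan(math.pi * (t - 0.5))),
}
LOGNORMAL_X1 = {"M2", "M5"}  # X1 is the constant 1 in the other cases


def simulate(case, n, seed=0):
    """Return n triples (x1, x2, y) drawn by inverse-transform sampling."""
    rng = random.Random(seed)
    b1, b2 = COEFS[case]
    data = []
    for _ in range(n):
        # Log-normal parameters (0, 1) are an assumption, not from the paper.
        x1 = rng.lognormvariate(0.0, 1.0) if case in LOGNORMAL_X1 else 1.0
        x2 = rng.lognormvariate(0.0, 1.0)
        u = rng.random()
        while u == 0.0:  # keep u strictly inside (0, 1)
            u = rng.random()
        data.append((x1, x2, b1(u) * x1 + b2(u) * x2))
    return data


def coverage(case, tau, n=20000, seed=1):
    """Fraction of responses at or below the true tau-th conditional quantile."""
    b1, b2 = COEFS[case]
    data = simulate(case, n, seed)
    below = sum(y <= b1(tau) * x1 + b2(tau) * x2 for x1, x2, y in data)
    return below / n


for case in COEFS:
    print(case, coverage(case, 0.7))
```

The `coverage` check is a quick sanity test: if the generator is consistent with the model, the reported fraction should be close to the chosen quantile level in every case.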

We first consider the two quantiles $0.5$ and $0.7$ . The simulation results are summarized in Table 1. One can see that the parameter estimates are generally unbiased. In all configurations, EFF has the smallest standard deviation (SD) among the three methods, and SEF has a much smaller SD than TQE. For example, for case M3 and $n=1000$ , the ratio of the standard deviations of TQE and EFF ranges from $1.343$ to $2.214$ , and the ratio of the standard deviations of SEF and EFF ranges from $1.026$ to $1.062$ . In other words, EFF improves the efficiency of TQE by at least $80\%$ and improves the efficiency of SEF by around $5\%$ to $12\%$ , which confirms our theoretical findings.
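The step from SD ratios to percentage gains can be made explicit. The short sketch below assumes that efficiency is measured by the variance ratio, that is, the squared SD ratio; this reading is an assumption, since the text does not state the definition, and the function name is hypothetical.

```python
def efficiency_gain(sd_ratio):
    """Relative variance reduction implied by a ratio of standard deviations."""
    return sd_ratio ** 2 - 1.0


# SD ratios quoted in the text for case M3 with n = 1000.
ratios = [("TQE/EFF, smallest", 1.343), ("TQE/EFF, largest", 2.214),
          ("SEF/EFF, smallest", 1.026), ("SEF/EFF, largest", 1.062)]

for label, ratio in ratios:
    gain = 100.0 * efficiency_gain(ratio)
    print(f"{label}: SD ratio {ratio:.3f} -> variance gain {gain:.1f}%")
```

Under this reading, the smallest TQE/EFF ratio corresponds to a gain of about 80%, and the SEF/EFF ratios to gains of roughly 5% and 13%, in line with the figures quoted above.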

In addition, we compare the numerical performance of the three methods at quantiles $0.5$ and $0.9$ , the latter being a higher quantile. Table 2 reports the estimation results for the 5 cases, from which conclusions similar to those for $\tau=0.5$ and $0.7$ can be drawn. Specifically, EFF has the smallest standard errors and SEF is more efficient than TQE. This confirms the theory that, if a higher quantile is of particular interest, it is beneficial to combine the model information across other quantile levels, for example a moderate quantile such as $\tau=0.5$ , for more efficient and stable estimation.

APPLICATION

We apply the proposed method to analyze birth data (birth) released annually by the National Center for Health Statistics. The data include information on nearly all live births in the United States. The education of the mother for each birth is recorded in 5 classes based on years of education. For illustration, we consider only births that occurred in June 1997 to mothers who smoked cigarettes and were in education class 2 (7 to 11 years of education). There are 9832 births, of which 4861 are female and 4971 are male. Our interest is in the relationship between the birth weight of the child (in grams) and the covariates: the age of the mother (Mage), the age of the father (Fage) and the total number of prenatal care visits (Nprevist). All variables are log-transformed before analysis. We apply model (5) with $\tau=0.3,0.5,0.7$ to the dataset. Tables 3-4 present the estimates of the regression coefficients by TQE, SEF and EFF, which are defined as in Section 3. In Tables 3-4, Est denotes the parameter estimate, Esd is the estimated standard deviation of Est obtained from $1000$ bootstrap resamples, and the $P$ -value is computed as $1-\Phi(|Est/Esd|)$ , where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution.
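The $P$-value formula above is straightforward to evaluate. The sketch below implements it exactly as written, $1-\Phi(|Est/Esd|)$, which is a one-sided tail area; doubling it would give the usual two-sided value. The function name and the input numbers are hypothetical and are not taken from Tables 3-4.

```python
from statistics import NormalDist


def p_value(est, esd):
    """P-value 1 - Phi(|est / esd|), following the formula in the text."""
    z = abs(est / esd)
    return 1.0 - NormalDist().cdf(z)


# Hypothetical estimate and bootstrap standard deviation, for illustration only.
est, esd = -0.05, 0.02
print(p_value(est, esd))
```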

It can be seen that, at the nominal significance level 0.05, all three methods detect Nprevist at all quantiles and detect the ages of the parents at $\tau=0.3$ and 0.5. At $\tau=0.7$ , all three methods identify the age of the father in the female children data. One notable finding is that, at $\tau=0.7$ , Fage and Mage do not have significantly nonzero coefficients in the male children data, whereas in the female children data Mage is detected only by EFF, with a significantly nonzero coefficient; TQE and SEF do not detect it. Overall, Tables 3-4 report that Nprevist has positive coefficients and the ages of the parents have negative coefficients, which suggests that birth weights are higher when mothers are younger and have more prenatal care visits. In addition, the effects of the three covariates on birth weight are more significant at the lower quantile ( $\tau=0.3$ ) than at the higher quantile ( $\tau=0.7$ ).
