On confidence intervals centered on bootstrap smoothed estimators

Paul Kabaila; Christeen Wijethunga

arXiv:1903.06552·stat.ME·July 11, 2019


TL;DR

This paper evaluates the confidence interval centered on a bootstrap smoothed estimator, extending earlier known-error-variance work to the case of unknown error variance, and finds circumstances in which this interval has attractive expected length properties.

Contribution

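It derives a computationally convenient exact formula for the ideal delta method approximation to the standard deviation of the bootstrap smoothed estimator when the error variance is unknown, and uses it to assess the coverage probability and expected length of the confidence interval centered on this estimator.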

Findings

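1. The confidence interval centered on the bootstrap smoothed estimator can have attractive expected length properties when the error variance is unknown.

2. These properties are found for a small number of error degrees of freedom and a large magnitude of the correlation between the estimators of the parameter of interest and of the parameter that specifies the simpler model; they weaken as the error degrees of freedom increase.

3. The assessment extends earlier results for the known error variance case to the unknown error variance case.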



Equations

\bm{y}=\bm{X}\bm{\beta}+\bm{\varepsilon}

\widehat{\theta}_{\textsc{\tiny PMS}}=\begin{cases}\widehat{\theta}-\dfrac{v_{\theta\tau}}{v_{\tau}}\,\widehat{\tau} & \text{if } |\widehat{\gamma}|\leq t_{m}(\widetilde{\alpha}),\\ \widehat{\theta} & \text{otherwise}.\end{cases}

\frac{1}{B}\sum_{i=1}^{B}g\big{(}\widehat{\bm{\beta}}_{i}^{*},\widehat{\sigma}_{i}^{*}\big{)}.

k_{m}(\gamma)=\int_{0}^{\infty}\Big(\phi(d_{m}w+\gamma)-\phi(d_{m}w-\gamma)+\gamma\big(\Phi(d_{m}w-\gamma)-\Phi(-d_{m}w-\gamma)\big)\Big)\,f_{W}(w)\,dw,

\widetilde{\theta}=\widehat{\theta}-\rho\,\widehat{\sigma}\,v_{\theta}^{1/2}\,k_{m}(\widehat{\gamma}).

r_{\,\rm delta}(\gamma)=\Bigg{(}\frac{\rho^{2}}{2n}\Big{(}k_{m}(\gamma)+h_{m}(\gamma)-\gamma\,q_{m}(\gamma)\Big{)}^{2}+1-2\rho^{2}q_{m}(\gamma)+\rho^{2}q_{m}^{2}(\gamma)\Bigg{)}^{1/2}.

q_{m}(\gamma)=\int_{0}^{\infty}\big(-d_{m}w\,\phi(d_{m}w+\gamma)-d_{m}w\,\phi(d_{m}w-\gamma)+\Phi(d_{m}w-\gamma)-\Phi(-d_{m}w-\gamma)\big)\,f_{W}(w)\,dw

h_{m}(\gamma)=\int_{0}^{\infty}\Big{(}(d_{m}w)^{2}\phi(d_{m}w+\gamma)-(d_{m}w)^{2}\phi(d_{m}w-\gamma)\Big{)}\,f_{W}(w)\,dw,

J_{\rm delta}=\Big[\widetilde{\theta}-t_{m}(\alpha)\,\widehat{\sigma}\,v_{\theta}^{1/2}\,r_{\rm delta}(\widehat{\gamma}),\,\widetilde{\theta}+t_{m}(\alpha)\,\widehat{\sigma}\,v_{\theta}^{1/2}\,r_{\rm delta}(\widehat{\gamma})\Big],

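\ell(h,w,\rho)=w\Big(\rho\,k_{m}\Big(\frac{h}{w}\Big)-t_{m}(\alpha)\,r_{\rm delta}\Big(\frac{h}{w}\Big)\Big)

u(h,w,\rho)=w\Big(\rho\,k_{m}\Big(\frac{h}{w}\Big)+t_{m}(\alpha)\,r_{\rm delta}\Big(\frac{h}{w}\Big)\Big)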

{\rm CP}_{\rm delta}(\gamma,\rho)=\int_{0}^{\infty}\int_{-\infty}^{\infty}\Psi\Big(\ell(y+\gamma,w,\rho),u(y+\gamma,w,\rho);\rho\,y,1-\rho^{2}\Big)\phi(y)\,dy\,f_{W}(w)\,dw,

{\rm SEL}_{\rm delta}(\gamma,\rho)=\frac{t_{m}(\alpha)}{t_{m}(1-c_{\rm min})}\Big(\frac{m}{2}\Big)^{1/2}\frac{\Gamma(m/2)}{\Gamma((m+1)/2)}\int_{0}^{\infty}\int_{-\infty}^{\infty}w\,r_{\rm delta}\Big(\frac{y+\gamma}{w}\Big)\phi(y)\,dy\,f_{W}(w)\,dw.

\displaystyle\widehat{\bm{s}}=\left[\begin{array}[]{c}\bm{y}^{\top}\bm{y}\\ \widehat{\bm{\beta}}\end{array}\right],\quad\bm{\eta}=\left[\begin{array}[]{c}-1/(2\sigma^{2})\\ \bm{X}^{\top}\bm{X}\bm{\beta}/\sigma^{2}\end{array}\right]\quad\text{and}\quad\psi(\bm{\eta})=\frac{\bm{\beta}^{\top}\bm{X}^{\top}\bm{X}\bm{\beta}}{2\sigma^{2}}+\frac{n}{2}\log(\sigma^{2}).
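As a numerical sanity check of the exponential family form above (an illustration only, not part of the paper), the following Python sketch uses only the standard library, a hypothetical design matrix with $p=2$ columns and made-up data, and compares the Gaussian log-likelihood of $\bm{y}$ with $\bm{\eta}^{\top}\widehat{\bm{s}}-\psi(\bm{\eta})-(n/2)\log(2\pi)$. The two numbers should agree up to rounding error, because $\bm{\beta}^{\top}\bm{X}^{\top}\bm{X}\widehat{\bm{\beta}}=\bm{\beta}^{\top}\bm{X}^{\top}\bm{y}$. The function name `check_exponential_family` is hypothetical.

```python
import math


def check_exponential_family(x_rows, y, beta, sigma):
    """Return (direct log-likelihood, eta^T s_hat - psi(eta) - (n/2) log(2 pi)) for p = 2."""
    n = len(y)
    a11 = sum(r[0] * r[0] for r in x_rows)
    a12 = sum(r[0] * r[1] for r in x_rows)
    a22 = sum(r[1] * r[1] for r in x_rows)
    b1 = sum(r[0] * yi for r, yi in zip(x_rows, y))
    b2 = sum(r[1] * yi for r, yi in zip(x_rows, y))
    # Direct Gaussian log-likelihood of y ~ N(X beta, sigma^2 I).
    resid = [yi - (r[0] * beta[0] + r[1] * beta[1]) for r, yi in zip(x_rows, y)]
    loglik = (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
              - sum(e * e for e in resid) / (2 * sigma ** 2))
    # Least squares estimate beta_hat = (X^T X)^{-1} X^T y via the 2 x 2 inverse.
    det = a11 * a22 - a12 * a12
    beta_hat = [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]
    # s_hat = (y^T y, beta_hat) and eta = (-1/(2 sigma^2), X^T X beta / sigma^2).
    yty = sum(yi * yi for yi in y)
    xtx_beta = [a11 * beta[0] + a12 * beta[1], a12 * beta[0] + a22 * beta[1]]
    eta_s = (-yty / (2 * sigma ** 2)
             + (xtx_beta[0] * beta_hat[0] + xtx_beta[1] * beta_hat[1]) / sigma ** 2)
    psi = ((beta[0] * xtx_beta[0] + beta[1] * xtx_beta[1]) / (2 * sigma ** 2)
           + 0.5 * n * math.log(sigma ** 2))
    return loglik, eta_s - psi - 0.5 * n * math.log(2 * math.pi)


X = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.1], [1.0, 0.7], [1.0, -1.3]]
y = [0.9, 0.1, 2.0, 1.4, -0.8]
print(check_exponential_family(X, y, beta=[0.5, 1.0], sigma=0.8))
```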

{\bf sd}_{\rm delta}=\Big{(}\big{(}\mbox{$\textrm{{cov}}$}_{*}(\bm{\eta})\big{)}^{\top}\,\big{(}V(\bm{\eta})\big{)}^{-1}\,\mbox{$\textrm{{cov}}$}_{*}(\bm{\eta})\Big{)}^{1/2},

\mbox{$\textrm{{var}}$}(\bm{y}^{\top}\bm{y})=4\sigma^{2}\bm{\beta}^{\top}\bm{X}^{\top}\bm{X}\bm{\beta}+2n\sigma^{4}

V(\bm{\eta})=\sigma^{2}\left[\begin{array}{cc}4\bm{\beta}^{\top}\bm{X}^{\top}\bm{X}\bm{\beta}+2n\sigma^{2} & 2\bm{\beta}^{\top}\\ 2\bm{\beta} & (\bm{X}^{\top}\bm{X})^{-1}\end{array}\right]

\big(V(\bm{\eta})\big)^{-1}=\frac{1}{\sigma^{2}}\left[\begin{array}{cc}\dfrac{1}{2n\sigma^{2}} & -\dfrac{\bm{\beta}^{\top}\bm{X}^{\top}\bm{X}}{n\sigma^{2}}\\ -\dfrac{\bm{X}^{\top}\bm{X}\bm{\beta}}{n\sigma^{2}} & \bm{X}^{\top}\bm{X}+\dfrac{2\,\bm{X}^{\top}\bm{X}\bm{\beta}\bm{\beta}^{\top}\bm{X}^{\top}\bm{X}}{n\sigma^{2}}\end{array}\right].

E\Big(\big(\widehat{\bm{s}}-\bm{s}\big)\big(\widehat{\theta}_{\textsc{\tiny PMS}}-\theta\big)\Big)=E\Big(\big(\widehat{\bm{s}}-\bm{s}\big)\big(\widehat{\theta}-\theta\big)\Big)-\rho\,\sigma\,v_{\theta}^{1/2}\int_{0}^{\infty}\int_{-d_{m}w}^{d_{m}w}z\,E\big(\widehat{\bm{s}}-\bm{s}\,\big|\,\widetilde{\gamma}=z\big)\phi(z-\gamma)\,dz\,f_{W}(w)\,dw

\displaystyle E\Big{(}\big{(}\widehat{\bm{s}}-\bm{s}\big{)}\big{(}\widehat{\theta}-\theta\big{)}\Big{)}=\sigma^{2}\left[\begin{array}[]{c}2\,\theta\\ v_{\theta}\\ \rho\,v_{\theta}^{1/2}\,v_{\tau}^{1/2}\\ \bm{0}\end{array}\right]

E\big(\widehat{\bm{s}}-\bm{s}\,\big|\,\widetilde{\gamma}=z\big)=\left[\begin{array}{c}\sigma^{2}\Big(2\gamma(z-\gamma)+(z-\gamma)^{2}-1\Big)\\ \sigma(z-\gamma)\left[\begin{array}{c}\rho\,v_{\theta}^{1/2}\\ v_{\tau}^{1/2}\\ \bm{0}\end{array}\right]\end{array}\right].

\int_{0}^{\infty}\int_{-d_{m}w}^{d_{m}w}z\,E\big(\widehat{\bm{s}}-\bm{s}\,\big|\,\widetilde{\gamma}=z\big)\phi(z-\gamma)\,dz\,f_{W}(w)\,dw=\sigma\left[\begin{array}{c}\sigma\Big(\gamma\,q_{m}(\gamma)+k_{m}(\gamma)+h_{m}(\gamma)\Big)\\ \rho\,v_{\theta}^{1/2}q_{m}(\gamma)\\ v_{\tau}^{1/2}q_{m}(\gamma)\\ \bm{0}\end{array}\right],

\mbox{$\textrm{{cov}}$}_{*}(\bm{\eta})=\sigma^{2}\left[\begin{array}[]{c}2\theta-\rho\,\sigma v_{\theta}^{1/2}\Big{(}\gamma\,q_{m}(\gamma)+k_{m}(\gamma)+h_{m}(\gamma)\Big{)}\\ v_{\theta}\,\big{(}1-\rho^{2}\,q_{m}(\gamma)\big{)}\\ \rho\,v_{\theta}^{1/2}v_{\tau}^{1/2}\big{(}1-q_{m}(\gamma)\big{)}\\ \bm{0}\end{array}\right].
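The linear algebra that turns ${\rm cov}_{*}(\bm{\eta})$ into $r_{\rm delta}(\gamma)$ can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the authors' code: it takes $p=2$ (so there is no $\bm{\lambda}$ and the $\bm{0}$ blocks disappear), treats $k_{m}(\gamma)$, $q_{m}(\gamma)$ and $h_{m}(\gamma)$ as arbitrary numbers, and assumes that $V(\bm{\eta})$, the covariance matrix of $\widehat{\bm{s}}=(\bm{y}^{\top}\bm{y},\widehat{\theta},\widehat{\tau})$, has ${\rm var}(\bm{y}^{\top}\bm{y})=4\sigma^{2}\bm{\beta}^{\top}\bm{X}^{\top}\bm{X}\bm{\beta}+2n\sigma^{4}$, ${\rm cov}(\bm{y}^{\top}\bm{y},\widehat{\bm{\beta}})=2\sigma^{2}\bm{\beta}$ and ${\rm cov}(\widehat{\bm{\beta}})=\sigma^{2}(\bm{X}^{\top}\bm{X})^{-1}$. It compares ${\rm cov}_{*}(\bm{\eta})^{\top}V(\bm{\eta})^{-1}{\rm cov}_{*}(\bm{\eta})$ with $\sigma^{2}v_{\theta}\,r_{\rm delta}^{2}(\gamma)$. The design matrix, parameter values and the helper `solve` are hypothetical.

```python
import math


def solve(a, b):
    """Solve the dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def check_theorem1(x_rows, theta, tau, sigma, k, q, h):
    """Return (cov_*^T V^{-1} cov_*, sigma^2 v_theta r_delta^2) for p = 2."""
    n = len(x_rows)
    a11 = sum(r[0] * r[0] for r in x_rows)
    a12 = sum(r[0] * r[1] for r in x_rows)
    a22 = sum(r[1] * r[1] for r in x_rows)
    det = a11 * a22 - a12 * a12
    v_theta, v_tau, v_tt = a22 / det, a11 / det, -a12 / det  # entries of (X^T X)^{-1}
    rho = v_tt / math.sqrt(v_theta * v_tau)
    gamma = tau / (sigma * math.sqrt(v_tau))
    s2 = sigma * sigma
    bxxb = a11 * theta ** 2 + 2 * a12 * theta * tau + a22 * tau ** 2  # beta^T X^T X beta
    # Assumed V(eta): covariance matrix of s_hat = (y^T y, theta_hat, tau_hat).
    V = [[s2 * (4 * bxxb + 2 * n * s2), 2 * s2 * theta, 2 * s2 * tau],
         [2 * s2 * theta, s2 * v_theta, s2 * v_tt],
         [2 * s2 * tau, s2 * v_tt, s2 * v_tau]]
    # cov_*(eta) as displayed, with k, q, h treated as free numbers.
    cov = [s2 * (2 * theta - rho * sigma * math.sqrt(v_theta) * (gamma * q + k + h)),
           s2 * v_theta * (1 - rho ** 2 * q),
           s2 * rho * math.sqrt(v_theta * v_tau) * (1 - q)]
    quad = sum(c * xi for c, xi in zip(cov, solve(V, cov)))
    r2 = (rho ** 2 / (2 * n) * (k + h - gamma * q) ** 2
          + 1 - 2 * rho ** 2 * q + rho ** 2 * q ** 2)
    return quad, s2 * v_theta * r2


X = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.1], [1.0, 0.7], [1.0, -1.3], [1.0, 0.4]]
print(check_theorem1(X, theta=0.8, tau=0.3, sigma=1.2, k=0.15, q=0.6, h=-0.05))
```

Because the identity holds for any values of $k_{m}$, $q_{m}$ and $h_{m}$, agreement here checks only the quadratic-form algebra, not the integrals that define these functions.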

P\Big(\widetilde{\theta}-t_{m}(\alpha)\,{\bf sd}_{\rm delta}(\widehat{\gamma},\widehat{\sigma})\leq\theta\leq\widetilde{\theta}+t_{m}(\alpha)\,{\bf sd}_{\rm delta}(\widehat{\gamma},\widehat{\sigma})\Big)

=P\Bigg(-t_{m}(\alpha)\,\frac{{\bf sd}_{\rm delta}(\widehat{\gamma},\widehat{\sigma})}{\widehat{\sigma}\,v_{\theta}^{1/2}}\leq\frac{\widehat{\theta}-\theta}{\widehat{\sigma}\,v_{\theta}^{1/2}}-\rho\,k_{m}(\widehat{\gamma})\leq t_{m}(\alpha)\,\frac{{\bf sd}_{\rm delta}(\widehat{\gamma},\widehat{\sigma})}{\widehat{\sigma}\,v_{\theta}^{1/2}}\Bigg)

=P\Big(-t_{m}(\alpha)\,r_{\rm delta}(\widehat{\gamma})\leq\frac{G}{W}-\rho\,k_{m}(\widehat{\gamma})\leq t_{m}(\alpha)\,r_{\rm delta}(\widehat{\gamma})\Big)

=P\Bigg(-t_{m}(\alpha)\,r_{\rm delta}\left(\frac{\widetilde{\gamma}}{W}\right)\leq\frac{G}{W}-\rho\,k_{m}\left(\frac{\widetilde{\gamma}}{W}\right)\leq t_{m}(\alpha)\,r_{\rm delta}\left(\frac{\widetilde{\gamma}}{W}\right)\Bigg)

=\int_{0}^{\infty}\int_{-\infty}^{\infty}P\Bigg(-t_{m}(\alpha)\,r_{\rm delta}\left(\frac{\widetilde{\gamma}}{W}\right)\leq\frac{G}{W}-\rho\,k_{m}\left(\frac{\widetilde{\gamma}}{W}\right)\leq t_{m}(\alpha)\,r_{\rm delta}\left(\frac{\widetilde{\gamma}}{W}\right)\,\Bigg|\,\widetilde{\gamma}=h,\,W=w\Bigg)\phi(h-\gamma)\,dh\,f_{W}(w)\,dw.

\displaystyle\int_{0}^{\infty}\int_{-\infty}^{\infty}P\Bigg{(}-t_{m}(\alpha)\,r_{\rm delta}\left(\frac{h}{w}\right)\leq\frac{G}{w}-\rho\,k_{m}\left(\frac{h}{w}\right)\leq t_{m}(\alpha)\,r_{\rm delta}\left(\frac{h}{w}\right)\Bigg{|}\widetilde{\gamma}=h\Bigg{)}


Full text

On confidence intervals centered on bootstrap smoothed estimators

Paul Kabaila∗ and Christeen Wijethunga

Department of Mathematics and Statistics, La Trobe University, Australia

ABSTRACT

Bootstrap smoothed (bagged) estimators have been proposed as an improvement on estimators found after preliminary data-based model selection. Efron, 2014, derived a widely applicable formula for a delta method approximation to the standard deviation of the bootstrap smoothed estimator. He also considered a confidence interval centered on the bootstrap smoothed estimator, with width proportional to the estimate of this standard deviation. Kabaila and Wijethunga, 2019, assessed the performance of this confidence interval in the scenario of two nested linear regression models, the full model and a simpler model, for the case of known error variance and preliminary model selection using a hypothesis test. They found that the performance of this confidence interval was not substantially better than the usual confidence interval based on the full model, with the same minimum coverage. We extend this assessment to the case of unknown error variance by deriving a computationally convenient exact formula for the ideal (i.e. in the limit as the number of bootstrap replications diverges to infinity) delta method approximation to the standard deviation of the bootstrap smoothed estimator. Our results show that, unlike the known error variance case, there are circumstances in which this confidence interval has attractive properties.

Keywords: Bootstrap smoothed estimator, coverage probability, confidence interval, expected length, model selection

1. Introduction

In applied statistics there is usually some uncertainty as to which explanatory variables should be included in the model. The first attempt to deal with this ‘model uncertainty’ was to use preliminary data-based model selection employing either hypothesis tests or minimizing a criterion such as the Akaike Information Criterion (Akaike, 1974). This model selection was followed by the statistical inference of interest, based on the assumption that the selected model had been given to us a priori, as the true model. This assumption is false and typically leads to incorrect and misleading inference (see e.g. Kabaila, 2009 and Leeb and Pötscher, 2005).

Bootstrap smoothed (or bagged; Breiman, 1996) estimators have been proposed as an improvement on estimators found after preliminary data-based model selection (post-model-selection estimators). Bootstrap smoothed estimators are smoothed versions of the post-model-selection estimator. The key result of Efron (2014) is a formula for a delta method approximation, ${\bf sd}_{\rm delta}$ , to the standard deviation of the bootstrap smoothed estimator. This formula is valid for any exponential family of models and has the attractive feature that it simply re-uses the parametric bootstrap replications that were employed to find this estimator. It also has the attractive feature that it is applicable in the context of complicated data-based model selection. Kabaila and Wijethunga (2019) consider a confidence interval (CI) centered on the bootstrap smoothed estimator, with nominal coverage $1-\alpha$ , and half-width equal to the $1-\alpha/2$ quantile of the standard normal distribution multiplied by the estimate of ${\bf sd}_{\rm delta}\,$ . We call this interval the ${\bf sd}_{\rm delta}\,{\bf interval}$ .

This CI has similarities with the frequentist model averaged CIs proposed by Buckland et al. (1997), Fletcher and Turek (2011) and Turek and Fletcher (2012). All of these CIs need to have their performances, in terms of coverage probability and expected length, carefully assessed before they can be recommended for general use by applied statisticians. We believe that such assessments are best carried out through a sequence of increasingly complicated ‘test scenarios’.

The simplest test scenario consists of two nested linear regression models, where the simpler model is given by a specified linear combination of the regression parameters being set to zero. In this test scenario, the scalar parameter of interest is a distinct linear combination of the regression parameters and we assume independent and identically distributed normal errors, with error variance assumed known. Kabaila and Wijethunga (2019) provide a detailed assessment of the performance of the ${\bf sd}_{\rm delta}\,{\bf interval}$ in this test scenario if the simpler model is selected when a preliminary hypothesis test accepts the null hypothesis that this simpler model is correct. They found that, while this CI performed much better than the post-model-selection confidence interval in terms of minimum coverage probability, its performance in terms of expected length was not substantially better than the usual CI based on the full model, with the same minimum coverage.

The next simplest test scenario is the same, but with unknown error variance. Kabaila et al. (2016) and Kabaila et al. (2017) used this test scenario to provide a detailed assessment of the performance of the CIs proposed by Fletcher and Turek (2011) and Turek and Fletcher (2012). Our aim is to extend the assessment made by Kabaila and Wijethunga (2019) of the performance of the ${\bf sd}_{\rm delta}\,{\bf interval}$ to this test scenario.

We apply Theorem 2 of Efron (2014) to derive a computationally convenient exact formula for the ideal (i.e. in the limit as the number of bootstrap replications diverges to infinity) delta method approximation to the standard deviation of the bootstrap smoothed estimator. An outline of this derivation, which is quite complicated, is provided in Appendix A.1. Our computed results show that, unlike the case that the error variance is assumed known, there are circumstances in which the expected length properties of the ${\bf sd}_{\rm delta}\,{\bf interval}$ are quite attractive.

2. The two nested regression models and the post-model-selection estimator

We consider two nested linear regression models: the full model ${\cal M}_{2}$ and the simpler model ${\cal M}_{1}$ . Suppose that the full model ${\cal M}_{2}$ is given by

$$\bm{y}=\bm{X}\bm{\beta}+\bm{\varepsilon},$$

where $\bm{y}$ is a random $n$ -vector of responses, $\bm{X}$ is a known $n\times p$ matrix with linearly independent columns ( $p<n$ ), $\bm{\beta}$ is an unknown $p$ -vector of parameters and $\bm{\varepsilon}\sim N(\bm{0},\sigma^{2}\bm{I})$ , with $\sigma^{2}$ an unknown positive parameter. Suppose that $\bm{\beta}=[\theta,\tau,\bm{\lambda}^{\top}]^{\top}$ , where $\theta$ is the scalar parameter of interest, $\tau$ is a scalar parameter used in specifying the model ${\cal M}_{1}$ and $\bm{\lambda}$ is a ( $p-2$ )-dimensional parameter vector. The model ${\cal M}_{1}$ is ${\cal M}_{2}$ with $\tau=0$ . As shown in Appendix A of Kabaila and Wijethunga (2019), this scenario can be obtained by a change of parametrization from a more general scenario. Let $m=n-p$ .

Let $\widehat{\bm{\beta}}$ denote the least squares estimator of $\bm{\beta}$ , so that $\widehat{\bm{\beta}}=(\bm{X}^{\top}\bm{X})^{-1}\bm{X}^{\top}\mbox{$ \bm{y} $}$ , and $\widehat{\sigma}^{2}=(\mbox{$ \bm{y} $}-\bm{X}\widehat{\bm{\beta}})^{\top}(\mbox{$ \bm{y} $}-\bm{X}\widehat{\bm{\beta}})/m$ . Also let $\widehat{\theta}$ and $\widehat{\tau}$ denote the first and second components of $\widehat{\bm{\beta}}$ , respectively. Now let $v_{\theta}=\mbox{$ \textrm{{var}} $}(\widehat{\theta})/\sigma^{2}$ , $v_{\tau}=\mbox{$ \textrm{{var}} $}(\widehat{\tau})/\sigma^{2}$ and $\rho=\mbox{$ \textrm{{corr}} $}(\widehat{\theta},\widehat{\tau})=v_{\theta\tau}/(v_{\theta}v_{\tau})^{1/2}$ , where $v_{\theta\tau}=\mbox{$ \textrm{{cov}} $}(\widehat{\theta},\widehat{\tau})/\sigma^{2}$ . Note that $v_{\theta}$ , $v_{\tau}$ , $v_{\theta\tau}$ and $\rho$ are known. Let $\gamma=\tau/\big{(}\sigma v_{\tau}^{1/2}\big{)}$ , which is an unknown parameter, and $\widehat{\gamma}=\widehat{\tau}/(\widehat{\sigma}{v_{\tau}}^{1/2})$ .

Suppose that we carry out a preliminary test of the null hypothesis $\tau=0$ against the alternative hypothesis $\tau\neq 0$ and that we choose the model ${\cal M}_{1}$ if this null hypothesis is accepted; otherwise we choose the model ${\cal M}_{2}$ . Let $t_{m}(a)$ be defined by $P(T\leq t_{m}(a))=1-a/2$ for $T\sim t_{m}$ . Suppose that we accept the null hypothesis when $|\widehat{\gamma}|\leq t_{m}(\widetilde{\alpha})$ ; otherwise we reject the null hypothesis. The size of this preliminary test is $\widetilde{\alpha}$ . Therefore the post-model-selection estimator of $\theta$ is equal to

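$$\widehat{\theta}_{\textsc{\tiny PMS}}=\begin{cases}\widehat{\theta}-\dfrac{v_{\theta\tau}}{v_{\tau}}\,\widehat{\tau} & \text{if } |\widehat{\gamma}|\leq t_{m}(\widetilde{\alpha}),\\ \widehat{\theta} & \text{otherwise}.\end{cases}$$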

Henceforth, suppose that $1-\alpha$ and $\widetilde{\alpha}$ are given.
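The preliminary test and the post-model-selection estimator can be sketched in Python using only the standard library. This is an illustrative sketch, not the authors' R code: because the standard library has no Student $t$ quantile function, the hypothetical helper `t_quantile` finds $t_{m}(a)$ by bisection on a Simpson-rule approximation to the $t_{m}$ cdf (a statistics library would normally be used instead), and `pms_estimate` uses $v_{\theta\tau}/v_{\tau}=\rho\,(v_{\theta}/v_{\tau})^{1/2}$. All function names and numbers are assumptions chosen for illustration.

```python
import math


def t_density(s, m):
    """Density of Student's t distribution with m degrees of freedom."""
    log_c = (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
             - 0.5 * math.log(m * math.pi))
    return math.exp(log_c - 0.5 * (m + 1) * math.log1p(s * s / m))


def t_cdf(x, m, n=2000):
    """P(T <= x) for T ~ t_m and x >= 0, via Simpson's rule on [0, x]."""
    h = x / n
    total = t_density(0.0, m) + t_density(x, m)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_density(i * h, m)
    return 0.5 + total * h / 3


def t_quantile(a, m):
    """t_m(a), defined by P(T <= t_m(a)) = 1 - a/2 for T ~ t_m."""
    target = 1.0 - a / 2.0
    lo, hi = 0.0, 1.0
    while t_cdf(hi, m) < target:        # expand the bracket by doubling
        lo, hi = hi, 2.0 * hi
    for _ in range(50):                  # then bisect
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, m) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def pms_estimate(theta_hat, tau_hat, sigma_hat, v_theta, v_tau, rho, d_m):
    """Post-model-selection estimate of theta after the preliminary t test."""
    gamma_hat = tau_hat / (sigma_hat * math.sqrt(v_tau))
    if abs(gamma_hat) <= d_m:            # accept tau = 0, i.e. choose M1
        # v_{theta tau} / v_tau = rho * (v_theta / v_tau)^{1/2}
        return theta_hat - rho * math.sqrt(v_theta / v_tau) * tau_hat
    return theta_hat                     # reject tau = 0, i.e. choose M2


d_m = t_quantile(0.1, 20)                # preliminary test of size 0.1 with m = 20
print(round(d_m, 4))                     # 1.7247
print(pms_estimate(1.0, 0.5, 1.0, 1.0, 1.0, 0.6, d_m))
```

The doubling step brackets the quantile on a short interval before bisecting, which keeps the fixed Simpson grid fine enough for the cdf evaluations to be accurate.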

3. Computationally convenient exact formulas for the ideal bootstrap smoothed estimate and the delta method approximation to its standard deviation

The parametric bootstrap smoothed estimate of $\theta$ is obtained as follows. Note that $\widehat{\bm{\beta}}\sim N\big{(}\bm{\beta},\sigma^{2}(\bm{X}^{\top}\bm{X})^{-1}\big{)}$ and, independently, $m^{1/2}\widehat{\sigma}/\sigma\sim\chi_{m}$ (if $Q\sim\chi_{m}^{2}$ then $Q^{1/2}$ is said to have a $\chi_{m}$ distribution). To make the dependence of $\widehat{\theta}_{\textsc{\tiny PMS}}$ on $(\widehat{\bm{\beta}},\widehat{\sigma})$ explicit, write $\widehat{\theta}_{\textsc{\tiny PMS}}=g(\widehat{\bm{\beta}},\widehat{\sigma})$ . For the estimate $(\widehat{\bm{\beta}},\widehat{\sigma})$ treated as the true parameter value, suppose that $\widehat{\bm{\beta}}^{*}\sim N\big{(}\widehat{\bm{\beta}},\widehat{\sigma}^{2}(\bm{X}^{\top}\bm{X})^{-1}\big{)}$ and, independently, $m^{1/2}\widehat{\sigma}^{*}/\widehat{\sigma}\sim\chi_{m}$ . A parametric bootstrap sample of size $B$ consists of independent observations $\big{(}\widehat{\bm{\beta}}_{1}^{*},\widehat{\sigma}_{1}^{*}\big{)},\big{(}\widehat{\bm{\beta}}_{2}^{*},\widehat{\sigma}_{2}^{*}\big{)},\dots,\big{(}\widehat{\bm{\beta}}_{B}^{*},\widehat{\sigma}_{B}^{*}\big{)}$ of the random vector $\big{(}\widehat{\bm{\beta}}^{*},\widehat{\sigma}^{*}\big{)}$ . The parametric bootstrap smoothed estimate of $\theta$ is defined to be

$$\frac{1}{B}\sum_{i=1}^{B}g\big(\widehat{\bm{\beta}}_{i}^{*},\widehat{\sigma}_{i}^{*}\big).$$

Efron (2014) calls the limit of this quantity as the number of bootstrap replications $B\rightarrow\infty$ the ideal bootstrap smoothed estimate of $\theta$ . We denote this ideal bootstrap smoothed estimate by $\widetilde{\theta}$ and observe that it may be obtained as follows. Let $E_{\bm{\beta},\sigma}(\widehat{\theta}_{\textsc{\tiny PMS}})$ denote the expected value of $\widehat{\theta}_{\textsc{\tiny PMS}}$ , for true parameter value $(\bm{\beta},\sigma)$ . The ideal bootstrap smoothed estimate $\widetilde{\theta}$ is obtained by first evaluating $E_{\bm{\beta},\sigma}(\widehat{\theta}_{\textsc{\tiny PMS}})$ and then replacing $(\bm{\beta},\sigma)$ by $\big{(}\widehat{\bm{\beta}},\widehat{\sigma}\big{)}$ .
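A finite-$B$ version of this construction can be sketched as follows. Because $\widehat{\theta}_{\textsc{\tiny PMS}}$ depends on $(\widehat{\bm{\beta}},\widehat{\sigma})$ only through $(\widehat{\theta},\widehat{\tau},\widehat{\sigma})$, the sketch draws only $(\widehat{\theta}^{*},\widehat{\tau}^{*})$, from the bivariate normal distribution with variances $\widehat{\sigma}^{2}v_{\theta}$, $\widehat{\sigma}^{2}v_{\tau}$ and correlation $\rho$, together with an independent $\widehat{\sigma}^{*}$ satisfying $m^{1/2}\widehat{\sigma}^{*}/\widehat{\sigma}\sim\chi_{m}$. The critical value `d_m` is passed in rather than computed (1.8125 is approximately $t_{10}(0.1)$), and the function names, seed and inputs are hypothetical.

```python
import math
import random


def pms_estimate(theta_hat, tau_hat, sigma_hat, v_theta, v_tau, rho, d_m):
    """g(beta_hat, sigma_hat), which depends only on (theta_hat, tau_hat, sigma_hat)."""
    if abs(tau_hat / (sigma_hat * math.sqrt(v_tau))) <= d_m:
        return theta_hat - rho * math.sqrt(v_theta / v_tau) * tau_hat
    return theta_hat


def bootstrap_smoothed(theta_hat, tau_hat, sigma_hat, v_theta, v_tau, rho,
                       m, d_m, B=20000, seed=0):
    """Average of g over B parametric bootstrap replications."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(B):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # (theta*, tau*): bivariate normal centered at (theta_hat, tau_hat),
        # variances sigma_hat^2 v_theta and sigma_hat^2 v_tau, correlation rho.
        theta_star = theta_hat + sigma_hat * math.sqrt(v_theta) * z1
        tau_star = tau_hat + sigma_hat * math.sqrt(v_tau) * (
            rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        # m (sigma*/sigma_hat)^2 ~ chi^2_m, i.e. Gamma(shape m/2, scale 2).
        sigma_star = sigma_hat * math.sqrt(rng.gammavariate(m / 2.0, 2.0) / m)
        total += pms_estimate(theta_star, tau_star, sigma_star,
                              v_theta, v_tau, rho, d_m)
    return total / B


print(bootstrap_smoothed(1.0, 0.4, 1.0, 1.0, 1.0, 0.7, m=10, d_m=1.8125))
```

As $B$ increases, the returned average should settle down to the ideal bootstrap smoothed estimate $\widetilde{\theta}$. With $\rho=0$ we have $v_{\theta\tau}=0$, so $g$ returns $\widehat{\theta}^{*}$ whichever model is selected and the average simply estimates $\widehat{\theta}$.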

Let $W=\widehat{\sigma}/\sigma$ and define $k_{m}(\gamma)$ to be

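$$\int_{0}^{\infty}\Big(\phi(d_{m}w+\gamma)-\phi(d_{m}w-\gamma)+\gamma\big(\Phi(d_{m}w-\gamma)-\Phi(-d_{m}w-\gamma)\big)\Big)\,f_{W}(w)\,dw,$$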

where $\phi$ and $\Phi$ denote the $N(0,1)$ pdf and cdf, respectively, $d_{m}=t_{m}(\widetilde{\alpha})$ and $f_{W}$ denotes the probability density function of $W$ . As proved in Appendix B of Kabaila and Wijethunga (2019), $E_{\bm{\beta},\sigma}(\widehat{\theta}_{\textsc{\tiny PMS}})=\theta-\rho\,\sigma\,v_{\theta}^{1/2}\,k_{m}(\gamma)$ . Therefore

$$\widetilde{\theta}=\widehat{\theta}-\rho\,\widehat{\sigma}\,v_{\theta}^{1/2}\,k_{m}(\widehat{\gamma}).$$
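The function $k_{m}(\gamma)$ and the ideal estimate $\widetilde{\theta}=\widehat{\theta}-\rho\,\widehat{\sigma}\,v_{\theta}^{1/2}\,k_{m}(\widehat{\gamma})$ can be computed with one-dimensional quadrature, using $f_{W}(w)=m^{1/2}f_{\chi_{m}}(m^{1/2}w)$. As a cross-check, the Python sketch below also uses the identity $k_{m}(\gamma)=E\big(\widetilde{\gamma}\,I(|\widetilde{\gamma}|\leq d_{m}W)\big)$, where $\widetilde{\gamma}=\widehat{\tau}/(\sigma v_{\tau}^{1/2})\sim N(\gamma,1)$ is independent of $W$; this follows from the definition of $\widehat{\theta}_{\textsc{\tiny PMS}}$ and the formula for its expectation, but it is our derivation rather than a statement in the paper. The Simpson grid, the truncation of the $w$ range at $1+12/m^{1/2}$, the value $d=2$ and all function names are ad hoc, hypothetical choices.

```python
import math
import random


def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def f_w(w, m):
    """Density of W = sigma_hat / sigma, where m^{1/2} W ~ chi_m."""
    if w <= 0.0:
        return math.sqrt(2.0 / math.pi) if m == 1 else 0.0
    x = math.sqrt(m) * w
    log_chi = ((m - 1) * math.log(x) - 0.5 * x * x
               - (0.5 * m - 1.0) * math.log(2.0) - math.lgamma(0.5 * m))
    return math.sqrt(m) * math.exp(log_chi)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def k_m(gamma, m, d):
    """Integral over w of [phi(dw + gamma) - phi(dw - gamma)
    + gamma (Phi(dw - gamma) - Phi(-dw - gamma))] f_W(w), with d = d_m."""
    def integrand(w):
        c = d * w
        return (phi(c + gamma) - phi(c - gamma)
                + gamma * (Phi(c - gamma) - Phi(-c - gamma))) * f_w(w, m)
    return simpson(integrand, 0.0, 1.0 + 12.0 / math.sqrt(m))


def k_m_monte_carlo(gamma, m, d, n_sim=100000, seed=1):
    """Simulation estimate of E(gamma_tilde I(|gamma_tilde| <= d W))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        g = rng.gauss(gamma, 1.0)
        w = math.sqrt(rng.gammavariate(m / 2.0, 2.0) / m)
        if abs(g) <= d * w:
            total += g
    return total / n_sim


def ideal_smoothed_estimate(theta_hat, tau_hat, sigma_hat, v_theta, v_tau, rho, m, d):
    """theta_tilde = theta_hat - rho sigma_hat v_theta^{1/2} k_m(gamma_hat)."""
    gamma_hat = tau_hat / (sigma_hat * math.sqrt(v_tau))
    return theta_hat - rho * sigma_hat * math.sqrt(v_theta) * k_m(gamma_hat, m, d)


print(k_m(1.0, 5, 2.0), k_m_monte_carlo(1.0, 5, 2.0))
```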

An outline of the proof of the following new theorem is given in Appendix A.1.

Theorem 1.

An application of Theorem 2 of Efron (2014) leads to the ideal (i.e. in the limit as the number of bootstrap replications $B\rightarrow\infty$ ) delta method approximation to the standard deviation of $\widetilde{\theta}$ , denoted by ${\bf sd}_{\rm delta}(\gamma,\sigma)$ , which is $\sigma v_{\theta}^{1/2}\,r_{\,\rm delta}(\gamma)$ , where

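$$r_{\rm delta}(\gamma)=\Bigg(\frac{\rho^{2}}{2n}\Big(k_{m}(\gamma)+h_{m}(\gamma)-\gamma\,q_{m}(\gamma)\Big)^{2}+1-2\rho^{2}q_{m}(\gamma)+\rho^{2}q_{m}^{2}(\gamma)\Bigg)^{1/2}.$$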

Here $q_{m}(\gamma)$ is defined to be

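$$\int_{0}^{\infty}\big(-d_{m}w\,\phi(d_{m}w+\gamma)-d_{m}w\,\phi(d_{m}w-\gamma)+\Phi(d_{m}w-\gamma)-\Phi(-d_{m}w-\gamma)\big)\,f_{W}(w)\,dw$$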

and

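$$h_{m}(\gamma)=\int_{0}^{\infty}\Big((d_{m}w)^{2}\phi(d_{m}w+\gamma)-(d_{m}w)^{2}\phi(d_{m}w-\gamma)\Big)\,f_{W}(w)\,dw,$$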

where, as before, $d_{m}=t_{m}(\widetilde{\alpha})$ .
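Theorem 1 translates directly into a numerical recipe. The Python sketch below (standard library only; an illustration, not the authors' R programs) evaluates $k_{m}$, $q_{m}$ and $h_{m}$ by Simpson's rule over $w\in[0,1+12/m^{1/2}]$, an ad hoc truncation, and combines them into $r_{\rm delta}(\gamma)$. The value $d=2$, the sample size $n=25$ and the function names are hypothetical. Differentiating the integrand of $k_{m}$ with respect to $\gamma$ shows that $q_{m}(\gamma)=k_{m}'(\gamma)$ (our observation, not a claim of the paper), which gives a convenient finite-difference check. Since $1-2\rho^{2}q_{m}+\rho^{2}q_{m}^{2}=(1-\rho^{2})+\rho^{2}(1-q_{m})^{2}$, the quantity under the square root is positive whenever $|\rho|<1$.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def f_w(w, m):
    """Density of W = sigma_hat / sigma, where m^{1/2} W ~ chi_m."""
    if w <= 0.0:
        return math.sqrt(2.0 / math.pi) if m == 1 else 0.0
    x = math.sqrt(m) * w
    log_chi = ((m - 1) * math.log(x) - 0.5 * x * x
               - (0.5 * m - 1.0) * math.log(2.0) - math.lgamma(0.5 * m))
    return math.sqrt(m) * math.exp(log_chi)


def integrate_over_w(g, m, n=2000):
    """Simpson approximation to the integral of g(w) f_W(w) over [0, 1 + 12/sqrt(m)]."""
    b = 1.0 + 12.0 / math.sqrt(m)
    h = b / n
    total = g(0.0) * f_w(0.0, m) + g(b) * f_w(b, m)
    for i in range(1, n):
        w = i * h
        total += (4 if i % 2 else 2) * g(w) * f_w(w, m)
    return total * h / 3.0


def k_m(gamma, m, d):
    return integrate_over_w(lambda w: phi(d * w + gamma) - phi(d * w - gamma)
                            + gamma * (Phi(d * w - gamma) - Phi(-d * w - gamma)), m)


def q_m(gamma, m, d):
    return integrate_over_w(lambda w: -d * w * phi(d * w + gamma) - d * w * phi(d * w - gamma)
                            + Phi(d * w - gamma) - Phi(-d * w - gamma), m)


def h_m(gamma, m, d):
    return integrate_over_w(lambda w: (d * w) ** 2 * (phi(d * w + gamma) - phi(d * w - gamma)), m)


def r_delta(gamma, rho, n, m, d):
    """r_delta(gamma) from Theorem 1; n = number of observations, d = d_m."""
    k, q, h = k_m(gamma, m, d), q_m(gamma, m, d), h_m(gamma, m, d)
    inside = (rho ** 2 / (2 * n) * (k + h - gamma * q) ** 2
              + 1 - 2 * rho ** 2 * q + rho ** 2 * q ** 2)
    return math.sqrt(inside)


for g in (0.0, 1.0, 2.0, 4.0):
    print(g, r_delta(g, rho=0.7, n=25, m=5, d=2.0))
```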

We expect, intuitively, that the results obtained for the case that $\sigma^{2}$ is unknown (so that it must be estimated from the data) and $m\rightarrow\infty$ should be the same as for the case that $\sigma^{2}$ is known. Suppose that $p$ is fixed and $n\rightarrow\infty$ , so that $m=n-p$ also diverges to $\infty$ . As expected, the ideal delta method approximation to the standard deviation of $\widetilde{\theta}$ given by Theorem 1 converges to the corresponding quantity given by Theorem 2 of Kabaila and Wijethunga (2019), which deals with the case that $\sigma^{2}$ is known.
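The concentration that drives this limit can be illustrated numerically. As $m\rightarrow\infty$, $W=\widehat{\sigma}/\sigma$ converges to 1, so each integral against $f_{W}$ in Theorem 1 tends to its integrand evaluated at $w=1$, and the term $\rho^{2}/(2n)$ vanishes as $n\rightarrow\infty$. The Python sketch below shows this for $q_{m}(\gamma)$, holding the critical value fixed at a hypothetical $d=1.96$ for simplicity (in the paper $d_{m}=t_{m}(\widetilde{\alpha})$ also changes with $m$); the function names and grid are illustrative assumptions.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def f_w(w, m):
    """Density of W = sigma_hat / sigma, where m^{1/2} W ~ chi_m."""
    if w <= 0.0:
        return math.sqrt(2.0 / math.pi) if m == 1 else 0.0
    x = math.sqrt(m) * w
    log_chi = ((m - 1) * math.log(x) - 0.5 * x * x
               - (0.5 * m - 1.0) * math.log(2.0) - math.lgamma(0.5 * m))
    return math.sqrt(m) * math.exp(log_chi)


def q_integrand(w, gamma, d):
    """Integrand of q_m(gamma), excluding the factor f_W(w)."""
    c = d * w
    return -c * phi(c + gamma) - c * phi(c - gamma) + Phi(c - gamma) - Phi(-c - gamma)


def q_m(gamma, m, d, n=2000):
    """Simpson approximation to q_m(gamma) on the truncated range [0, 1 + 12/sqrt(m)]."""
    b = 1.0 + 12.0 / math.sqrt(m)
    h = b / n

    def g(w):
        return q_integrand(w, gamma, d) * f_w(w, m)

    total = g(0.0) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3.0


for m in (1, 10, 100, 4000):
    print(m, q_m(0.8, m, 1.96))
print("integrand at w = 1:", q_integrand(1.0, 0.8, 1.96))
```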

4. Computationally convenient exact formula for the coverage probability of the confidence interval centered on the bootstrap smoothed estimator

Consider the CI for $\theta$ centered on the bootstrap smoothed estimator $\widetilde{\theta}$ , with nominal coverage $1-\alpha$ ,

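$$J_{\rm delta}=\Big[\widetilde{\theta}-t_{m}(\alpha)\,\widehat{\sigma}\,v_{\theta}^{1/2}\,r_{\rm delta}(\widehat{\gamma}),\,\widetilde{\theta}+t_{m}(\alpha)\,\widehat{\sigma}\,v_{\theta}^{1/2}\,r_{\rm delta}(\widehat{\gamma})\Big],$$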

which we call the ${\bf sd}_{\rm delta}$ interval. Note that when $\rho=0$ , this CI is identical to the usual CI, with actual coverage $1-\alpha$ , based on the full model ${\cal M}_{2}$ . It may be shown that the coverage probability $P(\theta\in J_{\rm delta})$ is a function of $(\gamma,\rho)$ . We therefore denote this coverage probability by ${\rm CP}_{\rm delta}(\gamma,\rho)$ . The following theorem is proved in Appendix A.2.

Theorem 2.

Let

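$$\ell(h,w,\rho)=w\Big(\rho\,k_{m}\Big(\frac{h}{w}\Big)-t_{m}(\alpha)\,r_{\rm delta}\Big(\frac{h}{w}\Big)\Big)\quad\text{and}\quad u(h,w,\rho)=w\Big(\rho\,k_{m}\Big(\frac{h}{w}\Big)+t_{m}(\alpha)\,r_{\rm delta}\Big(\frac{h}{w}\Big)\Big).$$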

Then ${\rm CP}_{\rm delta}(\gamma,\rho)$ is given by

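$$\int_{0}^{\infty}\int_{-\infty}^{\infty}\Psi\Big(\ell(y+\gamma,w,\rho),\,u(y+\gamma,w,\rho);\,\rho\,y,\,1-\rho^{2}\Big)\,\phi(y)\,dy\,f_{W}(w)\,dw,$$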

where $\Psi\big{(}\ell,u;\mu,v\big{)}=P\big{(}\ell\leq Z\leq u\big{)}$ for $Z\sim N(\mu,v)$ .
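The role of $\Psi$ can be illustrated in isolation. Writing $y=\widetilde{\gamma}-\gamma\sim N(0,1)$ and $G=(\widehat{\theta}-\theta)/(\sigma v_{\theta}^{1/2})$, the pair $(y,G)$ is standard bivariate normal with correlation $\rho$, so $G$ given $y$ is $N(\rho y,1-\rho^{2})$, and averaging $\Psi(\ell(y),u(y);\rho y,1-\rho^{2})$ against $\phi(y)$ gives $P(\ell(Y)\leq G\leq u(Y))$. The Python sketch below checks this conditioning step with generic limit functions (hypothetical stand-ins for $\ell(y+\gamma,w,\rho)$ and $u(y+\gamma,w,\rho)$ at a fixed $w$) against a Monte Carlo estimate; it does not evaluate ${\rm CP}_{\rm delta}$ itself, and all names are illustrative.

```python
import math
import random
from statistics import NormalDist


def psi(lo, hi, mu, var):
    """Psi(l, u; mu, v) = P(l <= Z <= u) for Z ~ N(mu, v); zero if u < l."""
    z = NormalDist(mu, math.sqrt(var))
    return max(0.0, z.cdf(hi) - z.cdf(lo))


def averaged_psi(lo_fn, hi_fn, rho, n=2000, y_max=10.0):
    """Simpson approximation to the integral of Psi(lo(y), hi(y); rho y, 1 - rho^2) phi(y) dy."""
    std = NormalDist()

    def integrand(y):
        return psi(lo_fn(y), hi_fn(y), rho * y, 1.0 - rho * rho) * std.pdf(y)

    h = 2.0 * y_max / n
    total = integrand(-y_max) + integrand(y_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(-y_max + i * h)
    return total * h / 3.0


def mc_probability(lo_fn, hi_fn, rho, n_sim=100000, seed=1):
    """Monte Carlo estimate of P(lo(Y) <= G <= hi(Y)) for standard bivariate normal (Y, G)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        y = rng.gauss(0.0, 1.0)
        g = rho * y + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if lo_fn(y) <= g <= hi_fn(y):
            hits += 1
    return hits / n_sim


def lower(y):
    return 0.3 * y - 1.0   # hypothetical lower limit


def upper(y):
    return 0.3 * y + 1.5   # hypothetical upper limit


print(averaged_psi(lower, upper, 0.6), mc_probability(lower, upper, 0.6))
```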

Since $n$ enters ${\rm CP}_{\rm delta}(\gamma,\rho)$ only through the term $\rho^{2}/(2n)$ in $r_{\rm delta}$ , the expression (2) suggests that, for all sufficiently large $n$ , ${\rm CP}_{\rm delta}(\gamma,\rho)$ is determined by $m$ , for any given $(\gamma,\rho)$ . Computational results for $n=25$ (described later in this section) and $n=100$ (not described either here or in the Supporting Material) suggest that, for all $n\geq 25$ , ${\rm CP}_{\rm delta}(\gamma,\rho)$ is, for practical purposes, determined by $m$ , for any given $(\gamma,\rho)$ . It may be shown that ${\rm CP}_{\rm delta}(\gamma,\rho)$ is (a) an even function of $\gamma$ for each $\rho$ and (b) an even function of $\rho$ for each $\gamma$ . It follows that, for given $n$ and $m$ , we are able to encapsulate the coverage probability of the ${\bf sd}_{\rm delta}\,{\bf interval}$ , for all possible choices of design matrix, parameter of interest $\theta$ and parameter $\tau$ that specifies the simpler model, using only the parameters $|\rho|$ and $|\gamma|$ .

Figure 1 is the graph of coverage probability of the confidence interval $J_{\rm delta}$ centered on the bootstrap smoothed estimator, which is based on the post-model-selection estimator obtained after a preliminary hypothesis test, with size $\widetilde{\alpha}=0.1$ , of the null hypothesis that the simpler model is correct. We consider the case that the nominal coverage is 0.95, $n=25$ , $m=1$ and $|\rho|=0.2,0.5,0.7$ and 0.9. All of the computations reported in this paper were carried out using programs written in R. The minimum coverage probability of this CI is a continuous decreasing function of $|\rho|$ which equals the nominal coverage when $\rho=0$ . Graphs of the coverage probability of $J_{\rm delta}$ for the same values of nominal coverage, size of the preliminary hypothesis test, $n$ and $|\rho|$ are provided in the Supporting Material for $m=2,3$ and 10. Further extensive numerical investigations, not reported either here or in the Supporting Material, show that the ${\bf sd}_{\rm delta}\,{\bf interval}$ outperforms the post-model-selection CI, with the same nominal coverage and based on the same preliminary test, in terms of coverage probability.
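Once $k_{m}$ and $r_{\rm delta}$ can be evaluated, computing the endpoints of $J_{\rm delta}=\big[\widetilde{\theta}-t_{m}(\alpha)\,\widehat{\sigma}\,v_{\theta}^{1/2}\,r_{\rm delta}(\widehat{\gamma}),\,\widetilde{\theta}+t_{m}(\alpha)\,\widehat{\sigma}\,v_{\theta}^{1/2}\,r_{\rm delta}(\widehat{\gamma})\big]$ from data is straightforward. In the Python sketch below, `k_fn` and `r_fn` are hypothetical callables standing in for $k_{m}(\cdot)$ and $r_{\rm delta}(\cdot)$ (for example, numerical quadrature implementations), and `t_alpha` stands for $t_{m}(\alpha)$; none of these names come from the paper.

```python
import math
from typing import Callable, Tuple


def sd_delta_interval(theta_hat: float, tau_hat: float, sigma_hat: float,
                      v_theta: float, v_tau: float, rho: float, t_alpha: float,
                      k_fn: Callable[[float], float],
                      r_fn: Callable[[float], float]) -> Tuple[float, float]:
    """Endpoints of J_delta, given callables for k_m(.) and r_delta(.)."""
    gamma_hat = tau_hat / (sigma_hat * math.sqrt(v_tau))
    scale = sigma_hat * math.sqrt(v_theta)
    center = theta_hat - rho * scale * k_fn(gamma_hat)   # ideal smoothed estimate
    half_width = t_alpha * scale * r_fn(gamma_hat)
    return center - half_width, center + half_width


# With rho = 0, r_delta is identically 1 and the k_m term drops out.
print(sd_delta_interval(1.0, 0.5, 2.0, 0.25, 1.0, 0.0, 2.0,
                        k_fn=lambda g: 0.3 * g, r_fn=lambda g: 1.0))
```

With $\rho=0$ the sketch returns the usual full-model interval whatever `k_fn` is, in line with the remark following the definition of $J_{\rm delta}$.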

5. Computationally convenient exact formula for the scaled expected length of the confidence interval centered on the bootstrap smoothed estimator

We define the scaled expected length of $J_{\rm delta}$ , with nominal coverage $1-\alpha$ , to be the expected length of $J_{\rm delta}$ divided by the expected length of the usual CI, based on the full model, with the same coverage as the minimum coverage probability of $J_{\rm delta}$ . Let $c_{\rm min}$ denote this minimum coverage probability. Now let $I(c)$ denote the usual CI for $\theta$ , with coverage probability $c$ , based on the full model. In other words, $I(c)=\Big{[}\widehat{\theta}-t_{m}(1-c)\,\widehat{\sigma}\,v_{\theta}^{1/2},\,\widehat{\theta}+t_{m}(1-c)\,\widehat{\sigma}\,v_{\theta}^{1/2}\Big{]}$ . It may be shown that the scaled expected length of $J_{\rm delta}$ is a function of $(\gamma,\rho)$ . We therefore denote this scaled expected length by ${\rm SEL}_{\rm delta}(\gamma,\rho)$ . The following theorem is proved in Appendix A.3.

Theorem 3.

Let $c_{\rm min}$ denote the minimum coverage probability of the confidence interval $J_{\rm delta}$ , with nominal coverage $1-\alpha$ . Then ${\rm SEL}_{\rm delta}(\gamma,\rho)$ is given by

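$$\frac{t_{m}(\alpha)}{t_{m}(1-c_{\rm min})}\Big(\frac{m}{2}\Big)^{1/2}\frac{\Gamma(m/2)}{\Gamma((m+1)/2)}\int_{0}^{\infty}\int_{-\infty}^{\infty}w\,r_{\rm delta}\Big(\frac{y+\gamma}{w}\Big)\phi(y)\,dy\,f_{W}(w)\,dw.$$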
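The factor $(m/2)^{1/2}\Gamma(m/2)/\Gamma((m+1)/2)$ in Theorem 3 is $1/E(W)$, since $m^{1/2}W\sim\chi_{m}$ and $E(\chi_{m})=2^{1/2}\Gamma((m+1)/2)/\Gamma(m/2)$. The Python sketch below evaluates the double integral of Theorem 3 with nested Simpson rules (ad hoc grid and truncation), taking `r_fn` as a hypothetical stand-in for $r_{\rm delta}(\cdot)$ and `t_ratio` for $t_{m}(\alpha)/t_{m}(1-c_{\rm min})$. With `r_fn` identically 1 and `t_ratio` equal to 1, the integral reduces to $E(W)$ and the scaled expected length is 1, which is a useful check.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def f_w(w, m):
    """Density of W = sigma_hat / sigma, where m^{1/2} W ~ chi_m."""
    if w <= 0.0:
        return math.sqrt(2.0 / math.pi) if m == 1 else 0.0
    x = math.sqrt(m) * w
    log_chi = ((m - 1) * math.log(x) - 0.5 * x * x
               - (0.5 * m - 1.0) * math.log(2.0) - math.lgamma(0.5 * m))
    return math.sqrt(m) * math.exp(log_chi)


def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def inv_mean_w(m):
    """1 / E(W) = (m/2)^{1/2} Gamma(m/2) / Gamma((m+1)/2)."""
    return math.sqrt(m / 2.0) * math.exp(math.lgamma(m / 2.0) - math.lgamma((m + 1) / 2.0))


def sel_delta(gamma, m, t_ratio, r_fn, n=400):
    """Scaled expected length from Theorem 3, with r_fn in place of r_delta."""
    def outer(w):
        if w <= 0.0:
            return 0.0  # the integrand carries a factor w, which vanishes at w = 0
        inner = simpson(lambda y: r_fn((y + gamma) / w) * phi(y), -10.0, 10.0, n)
        return w * inner * f_w(w, m)
    b = 1.0 + 12.0 / math.sqrt(m)
    return t_ratio * inv_mean_w(m) * simpson(outer, 0.0, b, n)


print(sel_delta(0.5, 5, 1.0, lambda g: 1.0))  # close to 1: E(W) / E(W)
```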

Since $n$ enters ${\rm SEL}_{\rm delta}(\gamma,\rho)$ only through the term $\rho^{2}/(2n)$ in $r_{\rm delta}$ , the expression (2) suggests that, for all sufficiently large $n$ , ${\rm SEL}_{\rm delta}(\gamma,\rho)$ is determined by $m$ , for any given $(\gamma,\rho)$ . Computational results for $n=25$ (described later in this section) and $n=100$ (not described either here or in the Supporting Material) suggest that, for all $n\geq 25$ , ${\rm SEL}_{\rm delta}(\gamma,\rho)$ is, for practical purposes, determined by $m$ , for any given $(\gamma,\rho)$ . It may be shown that ${\rm SEL}_{\rm delta}(\gamma,\rho)$ is (a) an even function of $\gamma$ for each $\rho$ and (b) an even function of $\rho$ for each $\gamma$ . It follows that, for given $n$ and $m$ , we are able to encapsulate the scaled expected length of the ${\bf sd}_{\rm delta}\,{\bf interval}$ , for all possible choices of design matrix, parameter of interest $\theta$ and parameter $\tau$ that specifies the simpler model, using only the parameters $|\rho|$ and $|\gamma|$ .

The bootstrap smoothed estimator is obtained by smoothing the post-model-selection estimator that results from a preliminary test of the null hypothesis that the simpler model is correct, i.e. that $\gamma=0$ . This post-model-selection estimator is usually motivated by a desire for good performance when the simpler model is correct. Therefore, ideally, the ${\bf sd}_{\rm delta}\,{\bf interval}$ should have a scaled expected length that is substantially less than 1 when $\gamma=0$ . In addition, ideally, this confidence interval should have a scaled expected length that (a) has maximum value that is not too much larger than 1 and (b) approaches 1 as $|\gamma|$ approaches infinity.

Figure 2 is the graph of scaled expected length of the confidence interval centered on the bootstrap smoothed estimator, which is based on the post-model-selection estimator obtained after a preliminary hypothesis test, with size $\widetilde{\alpha}=0.1$ , of the null hypothesis that the simpler model is correct. We consider the case that the nominal coverage is 0.95, $n=25$ , $m=1$ and $|\rho|=0.2,0.5,0.7$ and 0.9. For $|\rho|=0.5,0.7$ and 0.9, the scaled expected length is substantially less than 1 when $\gamma=0$ . In addition, the scaled expected length (a) has maximum value that is not too much larger than 1 and (b) approaches 1 as $|\gamma|$ approaches infinity. This shows that for $m=1$ and $|\rho|\geq 0.5$ the scaled expected length of the ${\bf sd}_{\rm delta}$ interval has the desired properties. This finding is similar to that reported in Kabaila and Giri (2013) concerning the performance of the CIs constructed by Kabaila and Giri (2009) to have the desired coverage probability and these desired scaled expected length properties. Namely, the performance of this CI improves as $|\rho|$ increases and $m$ decreases.

By contrast, for the case that $\sigma^{2}$ is assumed known, examined by Kabaila and Wijethunga (2019), the scaled expected length of the CI centered on the bootstrap smoothed estimator (a) is either greater than 1 or only slightly less than 1 at $\gamma=0$ and (b) has maximum value that is an increasing function of $|\rho|$ that can be much larger than 1 for large $|\rho|$ . As noted earlier, we expect that as $m$ increases (which implies that $n$ also increases), the results obtained in the present paper will approach the corresponding results obtained by Kabaila and Wijethunga (2019). Therefore we expect that as $m$ increases the ${\bf sd}_{\rm delta}$ interval will get further and further away from possessing the desired scaled expected length properties. This is confirmed by the graphs of the scaled expected length of $J_{\rm delta}$ for nominal coverage 0.95, size $\widetilde{\alpha}=0.1$ of the preliminary hypothesis test, $n=25$ and $|\rho|\in\{0.2,0.5,0.7,0.9\}$ that are provided in the Supporting Material for $m=2,3$ and 10.

6. Discussion

For the test scenario of two nested linear regression models and error variance assumed known, Kabaila and Wijethunga (2019) found that the ${\bf sd}_{\rm delta}$ interval does not perform any better in terms of expected length than the usual confidence interval, with the same minimum coverage probability and based on the full model. Intuitively, the case that the error variance is assumed to be known corresponds to the case that the error variance is unknown (so that it must be estimated) and the number of degrees of freedom $m$ for the estimation of the error variance is large.

In the present paper, we deal with the case that the error variance is unknown. We find that, for small $m$ and large magnitude of correlation between the least squares estimators of the parameter of interest and the parameter that is set to zero to specify the simpler model, the expected length of the ${\bf sd}_{\rm delta}$ interval possesses some attractive features.

Acknowledgement

This work was supported by an Australian Government Research Training Program Scholarship.

Bibliography

Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723.
Barndorff-Nielsen, O.E., Cox, D.R., 1989. Asymptotic Techniques for Use in Statistics. Chapman & Hall, London.
Barndorff-Nielsen, O.E., Cox, D.R., 1994. Inference and Asymptotics. Chapman & Hall, London.
Breiman, L., 1996. Bagging predictors. Machine Learning 24, 123–140.
Buckland, S.T., Burnham, K.P., Augustin, N.H., 1997. Model selection: an integral part of inference. Biometrics 53, 603–618.
Efron, B., 2014. Estimation and accuracy after model selection. Journal of the American Statistical Association 109, 991–1007.
Fletcher, D., Turek, D., 2011. Model-averaged profile likelihood intervals. Journal of Agricultural, Biological, and Environmental Statistics 17, 38–51.
Kabaila, P., 2009. The coverage properties of confidence regions after model selection. International Statistical Review 77, 405–414.
