Bootstrapping for multivariate linear regression models

Daniel J. Eck

arXiv:1704.07040·math.ST·September 13, 2017


TL;DR

This paper extends bootstrap methods to multivariate linear regression models, enabling inference on the regression coefficient matrix with theoretical validation and practical examples.

Contribution

It introduces multivariate bootstrap techniques for regression inference, formalizing and proving an extension of Freedman's univariate methods that earlier work had stated without proof, and validates them through simulations and real data.

Findings

01. Residual and case-resampling bootstrap methods are valid for inference on multivariate regression coefficients.

02. Simulation studies support the theoretical results.

03. A real data example demonstrates practical applicability.

Abstract

The multivariate linear regression model is an important tool for investigating relationships between several response variables and several predictor variables. The primary interest is in inference about the unknown regression coefficient matrix. We propose multivariate bootstrap techniques as a means for making inferences about the unknown regression coefficient matrix. These bootstrapping techniques are extensions of those developed in Freedman (1981), which are only appropriate for univariate responses. Extensions to the multivariate linear regression model have previously been stated without proof. We formalize this extension and prove its validity. We provide a real data example and two simulated data examples that offer some finite-sample verification of our theoretical results.

Tables

Table 1: Comparison of the 95% bootstrap percentile interval and a 95% confidence interval for the first two components of $\text{vec}(\beta)$ (fixed design). The number of bootstrap samples is $B=4n$ for each dataset.

sample size	component	bootstrap percentile interval	confidence interval
$n = 100$	$\text{vec}(\beta)_{1}$	(-0.062, 0.958)	(-0.092, 0.997)
	$\text{vec}(\beta)_{2}$	(-0.873, 0.330)	(-0.922, 0.342)
$n = 500$	$\text{vec}(\beta)_{1}$	(0.279, 0.826)	(0.256, 0.823)
	$\text{vec}(\beta)_{2}$	(0.070, 0.655)	(0.074, 0.658)
$n = 1000$	$\text{vec}(\beta)_{1}$	(0.415, 0.771)	(0.410, 0.768)
	$\text{vec}(\beta)_{2}$	(-0.010, 0.364)	(-0.020, 0.350)
$n = 5000$	$\text{vec}(\beta)_{1}$	(0.509, 0.684)	(0.509, 0.684)
	$\text{vec}(\beta)_{2}$	(-0.031, 0.143)	(-0.030, 0.143)

Table 2: Comparison of the 95% bootstrap percentile interval and a 95% confidence interval for the first two components of $\text{vec}(\beta)$ (random design). The number of bootstrap samples is $B=4n$ for each dataset.

sample size	component	bootstrap percentile interval	confidence interval
$n = 100$	$\text{vec}(\beta)_{1}$	(-0.013, 1.617)	(0.205, 1.391)
	$\text{vec}(\beta)_{2}$	(0.232, 1.438)	(0.296, 1.366)
$n = 500$	$\text{vec}(\beta)_{1}$	(0.638, 1.208)	(0.646, 1.198)
	$\text{vec}(\beta)_{2}$	(0.329, 0.912)	(0.369, 0.868)
$n = 1000$	$\text{vec}(\beta)_{1}$	(0.937, 1.323)	(0.952, 1.304)
	$\text{vec}(\beta)_{2}$	(0.646, 0.987)	(0.659, 0.982)
$n = 5000$	$\text{vec}(\beta)_{1}$	(0.995, 1.161)	(0.997, 1.160)
	$\text{vec}(\beta)_{2}$	(0.608, 0.771)	(0.616, 0.764)

Table 3: Comparison of the 95% bootstrap percentile interval and a 95% confidence interval for the first five components of $\text{vec}(\beta)$ in the cars data.

component	bootstrap percentile interval	confidence interval
$\text{vec}(\beta)_{1}$	(2.734, 7.027)	(2.286, 7.136)
$\text{vec}(\beta)_{2}$	(-3.693, 0.630)	(-3.806, 0.916)
$\text{vec}(\beta)_{3}$	(-6.823, -4.173)	(-6.900, -3.812)
$\text{vec}(\beta)_{4}$	(0.326, 5.745)	(0.181, 4.939)
$\text{vec}(\beta)_{5}$	(-134.667, -52.921)	(-134.408, -56.787)

Equations

$$Y_{i} = \beta X_{i} + \varepsilon_{i}, \qquad (i = 1, \ldots, n),$$

$$\mathbb{Y}^{*} = \mathbb{X}\hat{\beta}^{T} + \varepsilon^{*},$$

$$\operatorname{var}^{*}\{\text{vec}(\hat{\beta})\} = (B-1)^{-1}\sum_{b=1}^{B}\{\text{vec}(\hat{\beta}_{b}^{*}) - \text{vec}(\bar{\beta}^{*})\}\{\text{vec}(\hat{\beta}_{b}^{*}) - \text{vec}(\bar{\beta}^{*})\}^{T}$$

$$\widehat{\Sigma} = n^{-1}\sum_{i=1}^{n}\widehat{\varepsilon}_{i}\widehat{\varepsilon}_{i}^{T} - \hat{\mu}^{2}, \qquad \hat{\mu}^{2} = \Big(n^{-1}\sum_{i=1}^{n}\widehat{\varepsilon}_{i}\Big)\Big(n^{-1}\sum_{i=1}^{n}\widehat{\varepsilon}_{i}\Big)^{T}.$$

$$\widehat{\Sigma}^{*} = n^{-1}\sum_{i=1}^{n}\widehat{\varepsilon}_{i}^{*}\widehat{\varepsilon}_{i}^{*T} - \hat{\mu}^{*2}, \qquad \hat{\mu}^{*2} = \Big(n^{-1}\sum_{i=1}^{n}\widehat{\varepsilon}_{i}^{*}\Big)\Big(n^{-1}\sum_{i=1}^{n}\widehat{\varepsilon}_{i}^{*}\Big)^{T}.$$

$$\sqrt{n}\,[f\{\text{vec}(\hat{\beta}^{*})\} - f\{\text{vec}(\hat{\beta})\}] = \nabla f\{\text{vec}(\hat{\beta})\}\,\sqrt{n}\,\{\text{vec}(\hat{\beta}^{*}) - \text{vec}(\hat{\beta})\} + O_{p}(n^{-1/2}).$$

$$\nabla f\{\text{vec}(\beta)\}\,(\Sigma_{X}^{-1}\otimes\Sigma)\,\nabla^{T} f\{\text{vec}(\beta)\}$$

$$\operatorname{var}^{*}\{\text{vec}(\hat{\beta})\} = (B-1)^{-1}\sum_{b=1}^{B}\{\text{vec}(\hat{\beta}_{b}^{*}) - \text{vec}(\bar{\beta}^{*})\}\{\text{vec}(\hat{\beta}_{b}^{*}) - \text{vec}(\bar{\beta}^{*})\}^{T}$$

$$\sqrt{n}\,\text{vec}(\hat{\beta} - \beta) = \text{vec}\{n^{-1/2}\varepsilon^{T}\mathbb{X}(n^{-1}\mathbb{X}^{T}\mathbb{X})^{-1}\} = \{(n^{-1}\mathbb{X}^{T}\mathbb{X})^{-1}\otimes I_{r}\}\,\text{vec}(n^{-1/2}\varepsilon^{T}\mathbb{X}) \to N(0, \Delta).$$

$$\left(\begin{array}[]{c}X_{i}\\ \varepsilon_{i}\end{array}\right)\sim N\left\{\left(\begin{array}[]{c}0\\ 0\end{array}\right),\left(\begin{array}[]{cc}\Sigma_{X}&\Sigma_{X\varepsilon}\\ \Sigma_{\varepsilon X}&\Sigma\end{array}\right)\right\},$$

$$d_{l}^{p}(\mu,\nu) = \inf_{U\sim\mu,\,V\sim\nu} E^{1/l}(\|U - V\|^{l}).$$

$$
\begin{aligned}
[d_{2}^{rp}\{\Psi_{n}(F),\Psi_{n}(G)\}]^{2} &= (d_{2}^{rp}[\sqrt{n}\,\text{vec}\{\varepsilon_{n}^{T}(F)A\}, \sqrt{n}\,\text{vec}\{\varepsilon_{n}^{T}(G)A\}])^{2}\\
&= (d_{2}^{rp}[\sqrt{n}\,(A^{T}\otimes I_{r})\,\text{vec}\{\varepsilon_{n}^{T}(F)\}, \sqrt{n}\,(A^{T}\otimes I_{r})\,\text{vec}\{\varepsilon_{n}^{T}(G)\}])^{2}\\
&\leq n\,\text{tr}\{(A^{T}\otimes I_{r})(A^{T}\otimes I_{r})^{T}\}\{d_{2}^{r}(F,G)\}^{2} = n\,\text{tr}\{(A^{T}\otimes I_{r})(A\otimes I_{r})\}\{d_{2}^{r}(F,G)\}^{2}\\
&= n\,\text{tr}(A^{T}A\otimes I_{r})\{d_{2}^{r}(F,G)\}^{2} = n\,\text{tr}\{(\mathbb{X}^{T}\mathbb{X})^{-1}\otimes I_{r}\}\{d_{2}^{r}(F,G)\}^{2}\\
&= nr\,\text{tr}\{(\mathbb{X}^{T}\mathbb{X})^{-1}\}\{d_{2}^{r}(F,G)\}^{2},
\end{aligned}
$$

$$\{d_{2}^{r}(\widetilde{F}_{n},F_{n})\}^{2} \leq n^{-1}\sum_{i=1}^{n}\|\widehat{\varepsilon}_{i}-\varepsilon_{i}\|^{2} = n^{-1}\text{tr}(\varepsilon^{T}\mathcal{P}\varepsilon).$$

$$E\{\text{tr}(\varepsilon^{T}\mathcal{P}\varepsilon)\} = \text{tr}\{E(\varepsilon^{T}\mathcal{P}\varepsilon)\} = \text{tr}\{\mathcal{P}E(\varepsilon\varepsilon^{T})\} \leq \text{tr}(\mathcal{P})\,\text{tr}(\Sigma) = p\,\text{tr}(\Sigma)$$

$$
\begin{aligned}
d_{2}^{r}(\widehat{F}_{n},F_{n})^{2} &= d_{2}^{r}(\widetilde{F}_{n},F_{n})^{2} - \|E(\widetilde{F}_{n}) - E(F_{n})\|^{2} + \|E(F_{n})\|^{2}\\
&\leq d_{2}^{r}(\widetilde{F}_{n},F_{n})^{2} + \Big\|n^{-1}\sum_{i=1}^{n}\varepsilon_{i}\Big\|^{2}
\end{aligned}
$$

$$E\Big(\Big\|n^{-1}\sum_{i=1}^{n}\varepsilon_{i}\Big\|^{2}\Big) = n^{-2}E\Big\{\sum_{i=1}^{n}\varepsilon_{i}^{T}\varepsilon_{i} + \sum_{i\neq j}\varepsilon_{i}^{T}\varepsilon_{j}\Big\} = n^{-1}E(\varepsilon_{1}^{T}\varepsilon_{1}) = n^{-1}\text{tr}(\Sigma).$$

$$[d_{2}^{rp}\{\Psi_{n}(\widehat{F}_{n}),\Psi_{n}(F)\}]^{2} \leq nr\,\text{tr}\{(\mathbb{X}^{T}\mathbb{X})^{-1}\}\,d_{2}^{r}(\widehat{F}_{n},F)^{2}$$

$$\tfrac{1}{2}\,d_{2}^{r}(\widehat{F}_{n},F)^{2} \leq d_{2}^{r}(\widehat{F}_{n},F_{n})^{2} + d_{2}^{r}(F_{n},F)^{2}$$

$$n^{-1}\text{tr}\{(\widehat{\varepsilon}-\varepsilon)^{T}(\widehat{\varepsilon}-\varepsilon)\} = \text{tr}\{(n^{-1}\varepsilon^{T}\mathbb{X})(n^{-1}\mathbb{X}^{T}\mathbb{X})^{-1}(n^{-1}\mathbb{X}^{T}\varepsilon)\}.$$

$$
\begin{aligned}
d_{2}^{r}(\widehat{F}_{n},F_{n})^{2} &= \Big\|n^{-1}\sum_{i=1}^{n}\varepsilon_{i}\Big\|^{2} - \Big\|n^{-1}\sum_{i=1}^{n}(\widehat{\varepsilon}_{i}-\varepsilon_{i})\Big\|^{2} + d_{2}^{r}(\widetilde{F}_{n},F_{n})^{2}\\
&\leq \Big\|n^{-1}\sum_{i=1}^{n}\varepsilon_{i}\Big\|^{2} + n^{-1}\text{tr}\{(\widehat{\varepsilon}-\varepsilon)^{T}(\widehat{\varepsilon}-\varepsilon)\}
\end{aligned}
$$

$$\tfrac{1}{2}\,d_{2}^{r}(\widehat{F}_{n},F)^{2} \leq d_{2}^{r}(\widehat{F}_{n},F_{n})^{2} + d_{2}^{r}(F_{n},F)^{2}.$$

$$\bar{u} = n^{-1}\sum_{i=1}^{n}u_{i}, \quad\text{and}\quad s_{u}^{2} = n^{-1}\sum_{i=1}^{n}(u_{i}-\bar{u})(u_{i}-\bar{u})^{T}$$

$$\|s_{u}^{2} - s_{v}^{2}\|_{F}^{2} \leq \Big\|n^{-1}\sum_{i=1}^{n}(u_{i}-v_{i})(u_{i}-v_{i})^{T}\Big\|_{F}^{2}$$

$$
\begin{aligned}
\|s_{u}^{2}-s_{v}^{2}\|_{F}^{2} &\leq \sum_{j=1}^{r}\sum_{k=1}^{r}\Big|n^{-1}\sum_{i=1}^{n}(u_{i}-v_{i})_{j}(u_{i}-v_{i})_{k}\Big|^{2}\\
&= \Big\|n^{-1}\sum_{i=1}^{n}(u_{i}-v_{i})(u_{i}-v_{i})^{T}\Big\|_{F}^{2},
\end{aligned}
$$


Full text


Department of Biostatistics, Yale School of Public Health.


Key Words: Multivariate Bootstrap; Multivariate Linear Regression Model; Residual Bootstrap

1 Introduction

The linear regression model is an important and useful tool in many statistical analyses for studying the relationship among variables. Regression analysis is primarily used for predicting values of the response variable at interesting values of the predictor variables, discovering the predictors that are associated with the response variable, and estimating how changes in the predictor variables affect the response variable (Weisberg, 2005). The standard linear regression methodology assumes that the response variable is a scalar. However, one may be interested in investigating multiple response variables simultaneously. One could perform a separate regression analysis for each response in this setting, but such an analysis would fail to detect associations between responses. Regression settings in which associations among multiple responses are of interest require a multivariate linear regression model.

Bootstrapping techniques are well understood for the linear regression model with a univariate response (Bickel and Freedman, 1981; Freedman, 1981). In particular, Freedman (1981) developed the theoretical justification for the residual bootstrap as a way to estimate the variability of the ordinary least squares (OLS) estimator of the regression coefficient vector in this model. Theoretical extensions of residual bootstrap techniques to the multivariate linear regression model have not been formally introduced; the existence of such an extension is stated only implicitly, and without proof, in subsequent works (Freedman and Peters, 1984; Diaconis and Efron, 1983). In this article we show that the bootstrap procedures in Freedman (1981) provide consistent estimates of the variability of the OLS estimator of the regression coefficient matrix in the multivariate linear regression model. Our proof follows the same logic as Freedman (1981), and the generality of the bootstrap theory developed in Bickel and Freedman (1981) provides the tools required for the extension.

2 Bootstrap for the multivariate linear regression model

The multivariate linear regression model is

$$Y_{i} = \beta X_{i} + \varepsilon_{i}, \qquad (i = 1, \ldots, n), \tag{1}$$

where $Y_{i}\in\mathbb{R}^{r}$ with $r>1$ (so that the problem is genuinely multivariate), $\beta\in\mathbb{R}^{r\times p}$, $X_{i}\in\mathbb{R}^{p}$, and the errors $\varepsilon_{i}\in\mathbb{R}^{r}$ have mean zero and variance-covariance matrix $\Sigma>0$. It is assumed that separate realizations from model (1) are independent and that $n>p$. We further define $\mathbb{X}\in\mathbb{R}^{n\times p}$ as the design matrix with rows $X_{i}^{T}$, $\mathbb{Y}\in\mathbb{R}^{n\times r}$ as the matrix of responses with rows $Y_{i}^{T}$, and $\varepsilon\in\mathbb{R}^{n\times r}$ as the matrix of errors with rows $\varepsilon_{i}^{T}$. The OLS estimator of $\beta$ in model (1) is $\hat{\beta}=\mathbb{Y}^{T}\mathbb{X}(\mathbb{X}^{T}\mathbb{X})^{-1}$. We let $\widehat{\varepsilon}\in\mathbb{R}^{n\times r}$ denote the matrix of residuals with rows $\widehat{\varepsilon}_{i}^{T}=(Y_{i}-\hat{\beta}X_{i})^{T}$. The multivariate linear regression model assumed here differs slightly from the traditional multivariate linear regression model, which makes the additional assumptions that the errors are normally distributed and that the design matrix $\mathbb{X}$ is fixed.

We consider two bootstrap procedures that consistently estimate the asymptotic variability of $\text{vec}(\hat{\beta})$ under different assumptions on model (1), where the vec operator stacks the columns of a matrix so that $\text{vec}(\hat{\beta})\in\mathbb{R}^{rp\times 1}$. The first procedure is appropriate when the design matrix $\mathbb{X}$ is fixed and the errors are homoskedastic; in this setup, residuals are resampled. The second procedure is appropriate when $(X_{i}^{T},\varepsilon_{i}^{T})^{T}$ are realizations from a joint distribution; in this setup, cases $(X_{i}^{T},Y_{i}^{T})^{T}$ are resampled. It is known that bootstrapping under these setups provides a consistent estimator of $\operatorname{var}(\hat{\beta})$ in model (1) when $r=1$ (Freedman, 1981). We now provide the needed extensions.
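As a quick illustration of the vec convention and of the OLS estimator $\hat{\beta}=\mathbb{Y}^{T}\mathbb{X}(\mathbb{X}^{T}\mathbb{X})^{-1}$, the NumPy sketch below (our own illustration, not code from the paper) implements vec as column-major flattening, checks the identity $\text{vec}(\hat\beta)=\{(\mathbb{X}^T\mathbb{X})^{-1}\mathbb{X}^T\otimes I_r\}\text{vec}(\mathbb{Y}^T)$, and confirms that each row of $\hat\beta$ coincides with a separate univariate OLS fit. The helper name `vec` and the random inputs are hypothetical.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")

# vec stacks columns: for a 2 x 3 matrix the result is (M11, M21, M12, M22, M13, M23).
print(vec(np.array([[1, 2, 3], [4, 5, 6]])))  # [1 4 2 5 3 6]

rng = np.random.default_rng(0)
n, p, r = 10, 3, 2
X = rng.standard_normal((n, p))           # design matrix, rows X_i^T
Y = rng.standard_normal((n, r))           # response matrix, rows Y_i^T
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = Y.T @ X @ XtX_inv              # r x p OLS estimator, as in the text

# vec(Y^T X (X^T X)^{-1}) = {(X^T X)^{-1} X^T kron I_r} vec(Y^T)
kron_ok = np.allclose(vec(beta_hat), np.kron(XtX_inv @ X.T, np.eye(r)) @ vec(Y.T))

# Row j of beta_hat equals the univariate OLS fit of column j of Y on X.
rowwise_ok = np.allclose(beta_hat, np.linalg.lstsq(X, Y, rcond=None)[0].T)
print(kron_ok, rowwise_ok)  # True True
```

The row-wise check shows that the point estimate is the same as fitting each response separately; what the multivariate model adds is the joint variability of all $rp$ entries of $\text{vec}(\hat\beta)$, which is what the bootstrap procedures target.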

2.1 Fixed design

We first extend the residual bootstrap of Freedman (1981) to the case where $\mathbb{X}$ is a fixed design matrix. Resampled, or starred, data are generated by the model

$$\mathbb{Y}^{*} = \mathbb{X}\hat{\beta}^{T} + \varepsilon^{*}, \tag{2}$$

where $\varepsilon^{*}\in\mathbb{R}^{n\times r}$ is a matrix of errors with independent rows. The rows of $\varepsilon^{*}$ have common distribution $\widehat{F}_{n}$, the empirical distribution of the residuals from the original dataset, centered at their mean. Then $\hat{\beta}^{*}=\mathbb{Y}^{*T}\mathbb{X}(\mathbb{X}^{T}\mathbb{X})^{-1}$ is the OLS estimator of $\beta$ from the starred data. This process is repeated $B$ times, with a new estimator $\hat{\beta}^{*}$ computed from (2) at each iteration. We then estimate the variability of $\text{vec}(\hat{\beta})$ with

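$$\operatorname{var}^{*}\{\text{vec}(\hat{\beta})\} = (B-1)^{-1}\sum_{b=1}^{B}\{\text{vec}(\hat{\beta}_{b}^{*}) - \text{vec}(\bar{\beta}^{*})\}\{\text{vec}(\hat{\beta}_{b}^{*}) - \text{vec}(\bar{\beta}^{*})\}^{T},$$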

where $\hat{\beta}^{\textstyle{*}}_{b}$ is the residual bootstrap estimator of $\beta$ at iteration $b$ and $\bar{\beta}^{\textstyle{*}}=B^{-1}\sum_{b=1}^{B}\hat{\beta}^{\textstyle{*}}_{b}$ . We summarize this bootstrap procedure in Algorithm 1.

Algorithm 1. Bootstrap procedure with fixed design matrix.

Step 1.

Set $B$ and initialize $b=1$ .

Step 2.

Draw $n$ rows independently from $\widehat{F}_{n}$ (that is, sample the centered residuals with replacement) and compute $\mathbb{Y}^{\textstyle{*}}$ as in (2).

Step 3.

Compute $\hat{\beta}^{\textstyle{*}}_{b}=\mathbb{Y}^{\textstyle{*}^{T}}\mathbb{X}(\mathbb{X}^{T}\mathbb{X})^{-1}$ , store $\text{vec}(\hat{\beta}^{\textstyle{*}}_{b})$ , and let $b=b+1$ .

Step 4.

Repeat Steps 2-3 until $B$ estimates have been stored.

Step 5.

Once $B$ estimates have been stored, compute $\operatorname{var}^{\textstyle{*}}\left\{\text{vec}(\hat{\beta})\right\}$.
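A minimal NumPy sketch of Algorithm 1 follows. It is our illustration, not the author's code, and the function name `residual_bootstrap` and its arguments are hypothetical. It centers the residuals so that resampled rows follow $\widehat{F}_{n}$, resamples $n$ rows with replacement, refits OLS with the same fixed $\mathbb{X}$, stores $\text{vec}(\hat\beta^*_b)$ in column-major order, and forms $\operatorname{var}^*$ with divisor $B-1$ via `np.cov`. The simulation parameters in the usage lines are made up for illustration.

```python
import numpy as np

def residual_bootstrap(X, Y, B, seed=None):
    """Fixed-design residual bootstrap (sketch of Algorithm 1).

    X : (n, p) fixed design matrix; Y : (n, r) responses.
    Returns beta_hat (r x p), the (B, r*p) array of vec(beta*_b), and
    var*{vec(beta_hat)}, the (r*p, r*p) bootstrap covariance.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = Y.T @ X @ XtX_inv                 # OLS estimate, r x p
    fitted = X @ beta_hat.T                      # n x r fitted values
    resid = Y - fitted                           # n x r residuals
    resid_c = resid - resid.mean(axis=0)         # centre: rows now follow F_n-hat
    draws = np.empty((B, beta_hat.size))
    for b in range(B):
        idx = rng.integers(0, n, size=n)         # n draws with replacement
        Y_star = fitted + resid_c[idx]           # model (2)
        beta_star = Y_star.T @ X @ XtX_inv       # OLS on starred data, same X
        draws[b] = beta_star.reshape(-1, order="F")  # vec(): stack columns
    var_star = np.cov(draws, rowvar=False)       # divisor B - 1, centred at the mean draw
    return beta_hat, draws, var_star

# Usage with simulated data (hypothetical parameters), B = 4n as in the examples.
rng = np.random.default_rng(0)
n, p, r = 200, 2, 3
X = rng.standard_normal((n, p))
beta = np.array([[1.0, 0.5], [0.0, -1.0], [2.0, 0.3]])
Y = X @ beta.T + rng.standard_normal((n, r))
beta_hat, draws, var_star = residual_bootstrap(X, Y, B=4 * n, seed=1)
```

As a sanity check, the exact (infinite-$B$) bootstrap covariance of $\text{vec}(\hat\beta^*)$ given the data is $(\mathbb{X}^T\mathbb{X})^{-1}\otimes\widetilde\Sigma$, where $\widetilde\Sigma$ is the covariance matrix of $\widehat{F}_{n}$ (divisor $n$), so `var_star` should be close to that matrix for large $B$.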

Before giving the formal justification of the residual bootstrap, we state some important quantities. The residuals from the regression (2) are $\widehat{\varepsilon}^{*}=\mathbb{Y}^{*}-\mathbb{X}\hat{\beta}^{*T}$. The variance-covariance matrix $\Sigma$ in model (1) is estimated by

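$$\widehat{\Sigma} = n^{-1}\sum_{i=1}^{n}\widehat{\varepsilon}_{i}\widehat{\varepsilon}_{i}^{T} - \hat{\mu}^{2}, \qquad \hat{\mu}^{2} = \Big(n^{-1}\sum_{i=1}^{n}\widehat{\varepsilon}_{i}\Big)\Big(n^{-1}\sum_{i=1}^{n}\widehat{\varepsilon}_{i}\Big)^{T}.$$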

Likewise, the variance-covariance estimate from the starred data is

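$$\widehat{\Sigma}^{*} = n^{-1}\sum_{i=1}^{n}\widehat{\varepsilon}_{i}^{*}\widehat{\varepsilon}_{i}^{*T} - \hat{\mu}^{*2}, \qquad \hat{\mu}^{*2} = \Big(n^{-1}\sum_{i=1}^{n}\widehat{\varepsilon}_{i}^{*}\Big)\Big(n^{-1}\sum_{i=1}^{n}\widehat{\varepsilon}_{i}^{*}\Big)^{T}.$$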

Let $I_{k}$ denote the $k\times k$ identity matrix. Theorem 1 provides bootstrap asymptotics for the regression model (1). It extends Theorem 2.2 of Freedman (1981) to the multivariate setting.

Theorem 1.

Assume the regression model (1) where the errors have finite fourth moments. Suppose that $n^{-1}\mathbb{X}^{T}\mathbb{X}\to\Sigma_{X}>0$ . Then, conditional on almost all sample paths $Y_{1},...,Y_{n}$ , as $n\to\infty$ ,

a)

$\surd{n}\left\{\text{vec}(\hat{\beta}^{\textstyle{*}})-\text{vec}(\hat{\beta})\right\}\to^{d}N(0,\Sigma_{X}^{-1}\otimes\Sigma)$ ,

b)

$\widehat{\Sigma}^{\textstyle{*}}\to_{p}\Sigma$ , and

c)

$\left\{(\mathbb{X}^{T}\mathbb{X})^{1/2}\otimes\widehat{\Sigma}^{\textstyle{*}^{-1/2}}\right\}\left\{\text{vec}(\hat{\beta}^{\textstyle{*}})-\text{vec}(\hat{\beta})\right\}\to^{d}N(0,I_{rp})$.

The proof of Theorem 1, along with several necessary lemmas and theorems, is given in the theoretical details section. Theorem 1 establishes the multivariate analogue of the residual bootstrap result: at the $\sqrt{n}$ scale, the bootstrap distribution of $\text{vec}(\hat{\beta}^{*})-\text{vec}(\hat{\beta})$ has the same normal limit as the sampling distribution of $\text{vec}(\hat{\beta})-\text{vec}(\beta)$, so bootstrap standard errors for the entries of $\hat{\beta}$ are consistent. Now let $f:\mathbb{R}^{rp}\to\mathbb{R}^{k}$ be a differentiable function. The conclusions of Theorem 1 can then be used to establish a multivariate delta method based on estimates obtained via the residual bootstrap. A first-order Taylor expansion gives

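$$\sqrt{n}\,[f\{\text{vec}(\hat{\beta}^{*})\} - f\{\text{vec}(\hat{\beta})\}] = \nabla f\{\text{vec}(\hat{\beta})\}\,\sqrt{n}\,\{\text{vec}(\hat{\beta}^{*}) - \text{vec}(\hat{\beta})\} + O_{p}(n^{-1/2}). \tag{3}$$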

Therefore the left-hand side of (3) converges in distribution to a normal distribution with mean zero and variance given by

$$\nabla f\{\text{vec}(\beta)\}\,(\Sigma_{X}^{-1}\otimes\Sigma)\,\nabla^{T} f\{\text{vec}(\beta)\}$$

as $n\to\infty$ .
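To make the delta-method variance concrete, the sketch below (our illustration; the helper names `numerical_gradient` and `delta_method_var` are hypothetical) approximates $\nabla f$ by central differences at $\text{vec}(\hat\beta)$ and forms the plug-in variance $\nabla f\,V\,\nabla^{T} f$, where $V$ could be taken to be $\operatorname{var}^*\{\text{vec}(\hat\beta)\}$ from Algorithm 1. It handles scalar $f$ ($k=1$) only, and central differences are our choice rather than something the text prescribes.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at the 1-D point x."""
    x = np.asarray(x, dtype=float)
    grad = np.empty_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        grad[j] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

def delta_method_var(f, vec_beta_hat, V):
    """Plug-in delta-method variance grad f (V) grad f^T for scalar f."""
    g = numerical_gradient(f, vec_beta_hat)
    return float(g @ V @ g)

# Example: f(vec(beta)) = a^T vec(beta) is linear, so the delta method is exact.
a = np.array([1.0, -2.0, 0.5])
V = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.1],
              [0.0, 0.1, 0.5]])
v = delta_method_var(lambda x: a @ x, np.array([0.2, 0.4, 1.0]), V)
print(round(v, 6))  # 4.725
```

An alternative that avoids derivatives altogether is to apply $f$ to each stored bootstrap draw $\text{vec}(\hat\beta^*_b)$ and take the sample variance of the results; for smooth $f$ and large $n$ the two approaches should roughly agree.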

2.2 Random design and heteroskedasticity

In this section we assume that the $X_{i}$s in model (1) are realizations of a random vector $X$. The regression coefficient matrix now takes the form $\beta=E(YX^{T})\Sigma_{X}^{-1}$, where $\Sigma_{X}=E(XX^{T})$ is assumed to be positive definite. Because $X$ is now stochastic, the errors $\varepsilon$ may depend on $X$; in particular, they may be heteroskedastic. This possibility means that the bootstrap procedure of the previous section must be altered in order to consistently estimate the variability of $\text{vec}(\hat{\beta})$.

It is assumed that the data vectors $(X_{i}^{T},Y_{i}^{T})^{T}\in\mathbb{R}^{p+r}$ are independent with a common distribution $\mu$ and $E(\|(X_{i}^{T},Y_{i}^{T})^{T}\|^{4})<\infty$, where $\|\cdot\|$ is the Euclidean norm. Unlike in the fixed design setting, the data vectors $(X_{i}^{T},Y_{i}^{T})^{T}$ are resampled with replacement to form the starred data $(X^{*T}_{i},Y^{*T}_{i})^{T}$, $i=1,\ldots,n$. Given the original sample $(X_{i}^{T},Y_{i}^{T})^{T}$, $i=1,\ldots,n$, the resampled vectors are independent with distribution $\mu_{n}$, the empirical distribution of the original sample. Denote by $\mathbb{X}^{*}\in\mathbb{R}^{n\times p}$ and $\mathbb{Y}^{*}\in\mathbb{R}^{n\times r}$ the matrices with rows $X^{*T}_{i}$ and $Y^{*T}_{i}$, respectively. The starred estimator of $\beta$ is then $\hat{\beta}^{*}=\mathbb{Y}^{*T}\mathbb{X}^{*}(\mathbb{X}^{*T}\mathbb{X}^{*})^{-1}$. For every $n$ there is a small but positive probability that $\mathbb{X}^{*T}\mathbb{X}^{*}$ is singular, and this probability decreases exponentially in $n$. We assume that displayed equation (1.17) in Chatterjee and Bose (2000) holds in order to circumvent singularity in our bootstrap procedure.

The bootstrap is performed a total of $B$ times with a new estimator $\hat{\beta}^{\textstyle{*}}$ computed at each iteration. We then estimate the variability of $\text{vec}(\hat{\beta})$ with

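$$\operatorname{var}^{*}\{\text{vec}(\hat{\beta})\} = (B-1)^{-1}\sum_{b=1}^{B}\{\text{vec}(\hat{\beta}_{b}^{*}) - \text{vec}(\bar{\beta}^{*})\}\{\text{vec}(\hat{\beta}_{b}^{*}) - \text{vec}(\bar{\beta}^{*})\}^{T},$$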

where $\hat{\beta}^{\textstyle{*}}_{b}$ is the bootstrap estimator of $\beta$ at iteration $b$ and $\bar{\beta}^{\textstyle{*}}=B^{-1}\sum_{b=1}^{B}\hat{\beta}^{\textstyle{*}}_{b}$ . We summarize this bootstrap procedure in Algorithm 2.

Algorithm 2. Bootstrap procedure with random design matrix.

Step 1.

Set $B$ and initialize $b=1$ .

Step 2.

Resample the $n$ pairs $(X_{i}^{T},Y_{i}^{T})^{T}$ with replacement to form $\mathbb{X}^{\textstyle{*}}$ and $\mathbb{Y}^{\textstyle{*}}$.

Step 3.

Compute $\hat{\beta}^{\textstyle{*}}_{b}=\mathbb{Y}^{\textstyle{*}^{T}}\mathbb{X}^{\textstyle{*}}(\mathbb{X}^{\textstyle{*}^{T}}\mathbb{X}^{\textstyle{*}})^{-1}$, store $\text{vec}(\hat{\beta}^{\textstyle{*}}_{b})$, and let $b=b+1$.

Step 4.

Repeat Steps 2-3 until $B$ estimates have been stored.

Step 5.

Once $B$ estimates have been stored, compute $\operatorname{var}^{\textstyle{*}}\left\{\text{vec}(\hat{\beta})\right\}$.
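The following NumPy sketch of Algorithm 2 is our own illustration, not the author's code; the name `pairs_bootstrap` and its arguments are hypothetical. It resamples the rows $(X_i^T,Y_i^T)$ jointly and refits OLS on each resample. The paper handles singular $\mathbb{X}^{*T}\mathbb{X}^{*}$ through an assumption borrowed from Chatterjee and Bose (2000); here we simply redraw singular resamples, which is our own workaround. The heteroskedastic data in the usage lines are made up.

```python
import numpy as np

def pairs_bootstrap(X, Y, B, seed=None, max_attempts=None):
    """Random-design case-resampling bootstrap (sketch of Algorithm 2).

    Returns the (B, r*p) array of vec(beta*_b) and their covariance (divisor B - 1).
    Singular resamples are redrawn (our workaround, not the paper's condition).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    r = Y.shape[1]
    max_attempts = 10 * B if max_attempts is None else max_attempts
    draws = np.empty((B, r * p))
    b = attempts = 0
    while b < B:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError("too many singular resamples")
        idx = rng.integers(0, n, size=n)          # resample cases with replacement
        Xs, Ys = X[idx], Y[idx]
        XtX = Xs.T @ Xs
        if np.linalg.matrix_rank(XtX) < p:        # X*^T X* singular: redraw
            continue
        beta_star = np.linalg.solve(XtX, Xs.T @ Ys).T   # Y*^T X* (X*^T X*)^{-1}, r x p
        draws[b] = beta_star.reshape(-1, order="F")     # vec(beta*_b)
        b += 1
    return draws, np.cov(draws, rowvar=False)

# Usage with hypothetical heteroskedastic data: error spread grows with |X_1|.
rng = np.random.default_rng(0)
n, p, r = 400, 2, 3
X = rng.standard_normal((n, p))
scale = 1.0 + 0.5 * np.abs(X[:, [0]])
Y = X @ np.array([[1.0, 0.5], [0.0, -1.0], [2.0, 0.3]]).T + scale * rng.standard_normal((n, r))
draws, var_star = pairs_bootstrap(X, Y, B=2000, seed=1)
```

For large $n$ and $B$, `var_star` should be close to the plug-in version of $\Delta/n$, that is $\{(\mathbb{X}^T\mathbb{X})^{-1}\otimes I_r\}\{\sum_i (X_iX_i^T)\otimes(\widehat\varepsilon_i\widehat\varepsilon_i^T)\}\{(\mathbb{X}^T\mathbb{X})^{-1}\otimes I_r\}$, rather than to the homoskedastic form $(\mathbb{X}^T\mathbb{X})^{-1}\otimes\widehat\Sigma$.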

We now show that the variability of $\text{vec}(\hat{\beta})$ is estimated consistently by our multivariate bootstrap procedure, which resamples cases. Let $M$ be the non-negative definite matrix with entries $M_{jk}=E\left\{\text{vec}(\varepsilon_{i}X_{i}^{T})_{j}\,\text{vec}(\varepsilon_{i}X_{i}^{T})_{k}\right\}$ for $j,k=1,\ldots,rp$, and define $\Delta=\left(\Sigma_{X}^{-1}\otimes I_{r}\right)M\left(\Sigma_{X}^{-1}\otimes I_{r}\right)$, where $n^{-1}\mathbb{X}^{T}\mathbb{X}\to\Sigma_{X}$ a.s. as $n\to\infty$. (The ordering $\text{vec}(\varepsilon_{i}X_{i}^{T})=X_{i}\otimes\varepsilon_{i}$ matches $\text{vec}(\varepsilon^{T}\mathbb{X})=\sum_{i}X_{i}\otimes\varepsilon_{i}$ in (4).) Then

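$$\sqrt{n}\,\text{vec}(\hat{\beta} - \beta) = \text{vec}\{n^{-1/2}\varepsilon^{T}\mathbb{X}(n^{-1}\mathbb{X}^{T}\mathbb{X})^{-1}\} = \{(n^{-1}\mathbb{X}^{T}\mathbb{X})^{-1}\otimes I_{r}\}\,\text{vec}(n^{-1/2}\varepsilon^{T}\mathbb{X}) \to N(0, \Delta). \tag{4}$$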

The next theorem states that, conditional on the data, $\sqrt{n}\,\text{vec}\left(\hat{\beta}^{*}-\hat{\beta}\right)$ has the same limiting distribution as (4). This is an extension of Theorems 3.1 and 3.2 of Freedman (1981) to the multivariate linear regression setting.

Theorem 2.

Assume that $(X_{i}^{T},Y_{i}^{T})^{T}\in\mathbb{R}^{p+r}$ are independent, with a common distribution $\mu$ , $E(\|(X_{i}^{T},Y_{i}^{T})^{T}\|^{4})<\infty$ , and $\Sigma_{X}=E(XX^{T})$ is positive definite. Then, conditional on almost all sample paths, $(X_{i}^{T},Y_{i}^{T})^{T}$ , $i=1,...,n$ , as $n\to\infty$ ,

a)

$n^{-1}\left(\mathbb{X}^{\textstyle{*}^{T}}\mathbb{X}^{\textstyle{*}}\right)\to_{p}\Sigma_{X}$ ,

b)

$\sqrt{n}\left\{\text{vec}(\hat{\beta}^{\textstyle{*}})-\text{vec}(\hat{\beta})\right\}\to^{d}N(0,\Delta)$ , and

c)

the sequence $\widehat{\Sigma}^{\textstyle{*}}\to_{p}\Sigma$ .

The proof of Theorem 2, along with the necessary lemmas, is included in the theoretical details section.

3 Examples

3.1 Simulations

In this section we provide two simulated examples that support our multivariate bootstrap procedures.

3.1.1 Fixed design

This example illustrates Theorem 1. We generated data according to the multivariate linear regression model (1) with $Y_{i}\in\mathbb{R}^{3}$, $X_{i}\in\mathbb{R}^{2}$, and prespecified $\beta$ and $\Sigma$. Our goal is to make inference about $\text{vec}(\beta)$ using confidence intervals. For each component of $\text{vec}(\beta)$, a 95% percentile interval computed with the residual bootstrap in Algorithm 1 is compared with a 95% confidence interval that assumes model (1) is correct. Four datasets were generated at different sample sizes to assess the performance of the multivariate residual bootstrap, and the bootstrap was performed $B=4n$ times for each dataset. The results are displayed in Table 1. For the first two components of $\text{vec}(\beta)$, the intervals obtained from the two methods are close to each other, and the difference between them shrinks as $n$ increases. Similar results are obtained for the other components of $\text{vec}(\beta)$.
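The comparison in Table 1 needs two kinds of interval per component. The sketch below is our illustration: `percentile_intervals` takes the empirical 2.5% and 97.5% quantiles of each column of a $B\times k$ array of bootstrap draws, and `wald_intervals` forms estimate $\pm z\cdot$SE. The text does not say exactly how its 95% confidence interval was computed, so the Wald form is an assumption, and both function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def percentile_intervals(draws, level=0.95):
    """Bootstrap percentile intervals, one row [lower, upper] per column of draws (B x k)."""
    alpha = 1.0 - level
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)
    return np.column_stack([lower, upper])

def wald_intervals(estimate, se, level=0.95):
    """Normal-theory intervals estimate +/- z * se (assumed comparison interval)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for level = 0.95
    estimate = np.asarray(estimate, dtype=float)
    se = np.asarray(se, dtype=float)
    return np.column_stack([estimate - z * se, estimate + z * se])

# Toy "bootstrap draws" for a single component, evenly spread on [0, 1].
toy_draws = np.linspace(0.0, 1.0, 1001)[:, None]
print(percentile_intervals(toy_draws))   # lower and upper 2.5% / 97.5% quantiles
print(wald_intervals([0.0], [1.0]))      # symmetric interval around 0
```

With draws from Algorithm 1, the percentile interval for component $j$ would be `percentile_intervals(draws)[j]`, and a Wald interval could use `np.sqrt(np.diag(var_star))` as the standard errors.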

3.1.2 Random design and heteroskedasticity

This example supports Theorem 2. We generated data according to the multivariate linear regression model (1) with $Y_{i}\in\mathbb{R}^{3}$, $X_{i}\in\mathbb{R}^{2}$, and prespecified $\beta$ and $\Sigma$. The predictors and errors are generated according to

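$$\left(\begin{array}[]{c}X_{i}\\ \varepsilon_{i}\end{array}\right)\sim N\left\{\left(\begin{array}[]{c}0\\ 0\end{array}\right),\left(\begin{array}[]{cc}\Sigma_{X}&\Sigma_{X\varepsilon}\\ \Sigma_{\varepsilon X}&\Sigma\end{array}\right)\right\},$$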

for $i=1,\ldots,n$. Our goal is to make inference about $\text{vec}(\beta)$ using the multivariate bootstrap procedure for the random design case. For each component of $\text{vec}(\beta)$, a 95% percentile interval computed with the case-resampling bootstrap in Algorithm 2 is compared with a 95% confidence interval that assumes model (1) with heteroskedasticity is correct. Four datasets were generated at different sample sizes to assess the performance of the multivariate bootstrap, and the bootstrap was performed $B=4n$ times for each dataset. The results are displayed in Table 2. For the first two components of $\text{vec}(\beta)$, the intervals obtained from the two methods are close to each other, and the difference between them shrinks as $n$ increases. Similar results are obtained for the other components of $\text{vec}(\beta)$.
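The joint-normal design above can be simulated by drawing the stacked vector $(X_i^T,\varepsilon_i^T)^T$ from a single multivariate normal and then setting $Y_i=\beta X_i+\varepsilon_i$. The sketch below is our illustration; the function name `simulate_random_design` is hypothetical, and the values of $\beta$ and of the block covariance in the usage lines are made up, since the paper does not report the prespecified values (here $\Sigma_{X\varepsilon}=0$ for simplicity).

```python
import numpy as np

def simulate_random_design(n, beta, cov_joint, seed=None):
    """Draw (X_i, eps_i) jointly normal with mean zero and block covariance
    cov_joint = [[Sigma_X, Sigma_Xe], [Sigma_eX, Sigma]], then set Y_i = beta X_i + eps_i."""
    rng = np.random.default_rng(seed)
    r, p = beta.shape
    Z = rng.multivariate_normal(np.zeros(p + r), cov_joint, size=n)  # n x (p + r)
    X, eps = Z[:, :p], Z[:, p:]
    Y = X @ beta.T + eps                                              # rows Y_i^T
    return X, Y, eps

# Hypothetical parameters with r = 3 responses and p = 2 predictors.
beta = np.array([[1.0, 0.5], [0.0, 1.0], [0.5, -0.5]])
cov_joint = np.array([
    [1.0, 0.3, 0.0, 0.0, 0.0],
    [0.3, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.2, 0.1],
    [0.0, 0.0, 0.2, 1.0, 0.2],
    [0.0, 0.0, 0.1, 0.2, 1.0],
])
X, Y, eps = simulate_random_design(20000, beta, cov_joint, seed=0)
```

The first $p$ coordinates play the role of $X_i$ and the last $r$ the role of $\varepsilon_i$; a nonzero off-diagonal block would introduce the $\Sigma_{X\varepsilon}$ term in the display.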

3.2 Cars data

The data in this example, analyzed in Henderson and Velleman (1981), were extracted from the 1974 Motor Trend US magazine. The objective of that study was to relate aspects of automobile design to performance and fuel consumption for 32 automobiles (1973-74 models). In this analysis, we fit the multivariate model (1) with miles per gallon, displacement, and horsepower as response variables and number of cylinders and transmission type as predictors. Both predictors are factor variables: the automobiles have 4, 6, or 8 cylinders, and their transmission type is either automatic or manual.
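Because both predictors are factors, the design matrix $\mathbb{X}$ consists of indicator columns. The sketch below shows one possible coding; it is our assumption, since the text does not state the coding used: an intercept, indicators for 6 and 8 cylinders (4 cylinders as baseline), and an indicator for manual transmission, using the convention of R's `mtcars` data that `am` is 0 for automatic and 1 for manual. The name `cars_design` is hypothetical.

```python
import numpy as np

def cars_design(cyl, am):
    """Hypothetical treatment coding for the cars example.

    Columns: intercept, 1{cyl == 6}, 1{cyl == 8}, 1{am == 1} (manual).
    Baseline category: 4-cylinder automatic.
    """
    cyl = np.asarray(cyl)
    am = np.asarray(am)
    if not np.isin(cyl, [4, 6, 8]).all():
        raise ValueError("cyl must take values 4, 6 or 8")
    return np.column_stack([
        np.ones(cyl.shape[0]),
        (cyl == 6).astype(float),
        (cyl == 8).astype(float),
        (am == 1).astype(float),
    ])

# Three illustrative cars: 4-cyl automatic, 6-cyl manual, 8-cyl manual.
print(cars_design([4, 6, 8], [0, 1, 1]))
```

Under this coding $p=4$, and with $r=3$ responses $\text{vec}(\beta)$ has $rp=12$ components; a different coding would change the meaning of each component reported in Table 3.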

For inference about $\beta$, we compare a 95% bootstrap percentile interval computed with the fixed design bootstrap in Algorithm 1 with a 95% confidence interval. The number of bootstrap resamples is set at $B=4n$. The results are shown in Table 3; inferences about $\beta$ are fairly similar for both methods.

4 Theoretical details

Before presenting the proofs of Theorems 1 and 2, we introduce the Mallows metric, the central tool in our proof technique. The Mallows metric for probability measures on $\mathbb{R}^{p}$, relative to the Euclidean norm, was the key tool used to establish the validity of the residual bootstrap approximation for univariate regression (Bickel and Freedman, 1981; Freedman, 1981). For two probability measures $\mu,\nu$ on $\mathbb{R}^{p}$, the Mallows metric relative to the Euclidean norm, denoted $d_{l}^{p}(\mu,\nu)$, is

$$d_{l}^{p}(\mu,\nu) = \inf_{U\sim\mu,\,V\sim\nu} E^{1/l}(\|U - V\|^{l}).$$

Bickel and Freedman (1981) develop properties of the Mallows metric for random variables taking values in separable Banach spaces. Since $\mathbb{R}^{k}$ is a separable Banach space for every natural number $k$, their theory applies to our setting. In this article, we use the Mallows metric with $r>1$ to prove that the residual bootstrap consistently estimates the variability of $\text{vec}(\hat{\beta})$.
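For two empirical distributions that each place mass $1/n$ on $n$ points, an optimal coupling in the definition of $d_2$ can be taken to be a permutation of the points, so $d_2^2$ reduces to an assignment problem. The sketch below is our illustration (the name `mallows_d2_empirical` is hypothetical); it solves that assignment with SciPy's `linear_sum_assignment` on the matrix of squared Euclidean distances.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mallows_d2_empirical(U, V):
    """Mallows d_2 between the empirical laws of the rows of U and V.

    Assumes U and V have the same shape (n, k), each row carrying mass 1/n,
    so that d_2^2 = min over permutations pi of n^{-1} sum_i ||U_i - V_pi(i)||^2.
    """
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    if U.shape != V.shape:
        raise ValueError("U and V must have the same shape (n, k)")
    cost = ((U[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)  # n x n squared distances
    rows, cols = linear_sum_assignment(cost)                      # optimal permutation
    return float(np.sqrt(cost[rows, cols].mean()))

rng = np.random.default_rng(0)
U = rng.standard_normal((6, 2))
V = U[rng.permutation(6)] + np.array([3.0, 4.0])  # permute and shift every atom by (3, 4)
print(mallows_d2_empirical(U, V))  # close to 5.0, the length of the shift
```

The shifted example reflects the decomposition used later from Lemma 8.8 of Bickel and Freedman (1981): a pure location shift contributes exactly the squared norm of the mean difference.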

4.1 Fixed design

Let $\Psi_{n}(F)$ be the distribution of $\sqrt{n}\left\{\text{vec}(\hat{\beta})-\text{vec}(\beta)\right\}$ when the errors have law $F$, so that $\Psi_{n}(F)$ is a probability measure on $\mathbb{R}^{rp}$. Let $G$ be an alternative law for the errors, assumed to have mean zero and finite variance $\Sigma_{G}>0$. In applications, $G$ will be the centered empirical distribution of the residuals.

Theorem 3.

$\left[d_{2}^{rp}\left\{\Psi_{n}(F),\Psi_{n}(G)\right\}\right]^{2}\leq nr\operatorname{tr}\left\{(\mathbb{X}^{T}\mathbb{X})^{-1}\right\}\left\{d_{2}^{r}(F,G)\right\}^{2}$ .

Proof.

Let $A=\mathbb{X}(\mathbb{X}^{T}\mathbb{X})^{-1}$. Then $\Psi_{n}(F)$ is the law of $\sqrt{n}\,\text{vec}\{\varepsilon_{n}^{T}(F)A\}$, where $\varepsilon_{n}(F)$ is an $n\times r$ matrix whose rows are independent random vectors with common law $F$; $\Psi_{n}(G)$ is defined analogously. Observe that $A^{T}A=(\mathbb{X}^{T}\mathbb{X})^{-1}$. Then, from Lemma 8.9 in Bickel and Freedman (1981), we see that

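$$
\begin{aligned}
[d_{2}^{rp}\{\Psi_{n}(F),\Psi_{n}(G)\}]^{2} &= (d_{2}^{rp}[\sqrt{n}\,\text{vec}\{\varepsilon_{n}^{T}(F)A\}, \sqrt{n}\,\text{vec}\{\varepsilon_{n}^{T}(G)A\}])^{2}\\
&= (d_{2}^{rp}[\sqrt{n}\,(A^{T}\otimes I_{r})\,\text{vec}\{\varepsilon_{n}^{T}(F)\}, \sqrt{n}\,(A^{T}\otimes I_{r})\,\text{vec}\{\varepsilon_{n}^{T}(G)\}])^{2}\\
&\leq n\,\text{tr}\{(A^{T}\otimes I_{r})(A^{T}\otimes I_{r})^{T}\}\{d_{2}^{r}(F,G)\}^{2} = n\,\text{tr}\{(A^{T}\otimes I_{r})(A\otimes I_{r})\}\{d_{2}^{r}(F,G)\}^{2}\\
&= n\,\text{tr}(A^{T}A\otimes I_{r})\{d_{2}^{r}(F,G)\}^{2} = n\,\text{tr}\{(\mathbb{X}^{T}\mathbb{X})^{-1}\otimes I_{r}\}\{d_{2}^{r}(F,G)\}^{2}\\
&= nr\,\text{tr}\{(\mathbb{X}^{T}\mathbb{X})^{-1}\}\{d_{2}^{r}(F,G)\}^{2},
\end{aligned}
$$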

which is our desired conclusion. ∎
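The matrix identities used in the proof of Theorem 3, namely $\text{vec}(\varepsilon^{T}A)=(A^{T}\otimes I_{r})\text{vec}(\varepsilon^{T})$, $(A^{T}\otimes I_{r})(A^{T}\otimes I_{r})^{T}=A^{T}A\otimes I_{r}$, $A^{T}A=(\mathbb{X}^{T}\mathbb{X})^{-1}$ and $\text{tr}\{(\mathbb{X}^{T}\mathbb{X})^{-1}\otimes I_{r}\}=r\,\text{tr}\{(\mathbb{X}^{T}\mathbb{X})^{-1}\}$, can be spot-checked numerically. The NumPy snippet below is our illustration on random inputs, not part of the proof; the helper `vec` is defined locally.

```python
import numpy as np

def vec(M):
    """Stack the columns of M (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")

rng = np.random.default_rng(1)
n, p, r = 20, 3, 2
X = rng.standard_normal((n, p))
XtX_inv = np.linalg.inv(X.T @ X)
A = X @ XtX_inv                          # A = X (X^T X)^{-1}
eps = rng.standard_normal((n, r))        # any n x r error matrix
I_r = np.eye(r)
K = np.kron(A.T, I_r)                    # A^T kron I_r, size (p r) x (n r)

vec_ok = np.allclose(vec(eps.T @ A), K @ vec(eps.T))      # vec(eps^T A) = (A^T kron I_r) vec(eps^T)
kk_ok = np.allclose(K @ K.T, np.kron(A.T @ A, I_r))       # (A^T kron I)(A^T kron I)^T = A^T A kron I
ata_ok = np.allclose(A.T @ A, XtX_inv)                    # A^T A = (X^T X)^{-1}
tr_ok = bool(np.isclose(np.trace(np.kron(XtX_inv, I_r)), r * np.trace(XtX_inv)))
print(vec_ok, kk_ok, ata_ok, tr_ok)  # True True True True
```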

Theorem 3 bounds the distance between the sampling distributions $\Psi_{n}(F)$ and $\Psi_{n}(G)$ by the distance between the underlying error laws. As in Freedman (1981), let $F_{n}$ be the empirical distribution of $\varepsilon_{1},\ldots,\varepsilon_{n}$. Let $\widetilde{F}_{n}$ be the empirical distribution of the residuals $\widehat{\varepsilon}_{1},\ldots,\widehat{\varepsilon}_{n}$ from the original regression, and let $\widehat{F}_{n}$ be $\widetilde{F}_{n}$ centered at its mean $\hat{\mu}=n^{-1}\sum_{i=1}^{n}\widehat{\varepsilon}_{i}$. Since $\widehat{\varepsilon}=\mathbb{Y}-\mathbb{X}\hat{\beta}^{T}=(I_{n}-\mathcal{P})\varepsilon$, we have $\widehat{\varepsilon}-\varepsilon=-\mathcal{P}\varepsilon$, where $\mathcal{P}=\mathbb{X}(\mathbb{X}^{T}\mathbb{X})^{-1}\mathbb{X}^{T}$ is the projection onto the column space of $\mathbb{X}$.

Lemma 1.

$E^{2}\left\{d_{2}^{r}(\widetilde{F}_{n},F_{n})\right\}\leq p\operatorname{tr}(\Sigma)/n$ .

Proof.

From the definition of the Mallows metric we have

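$$\{d_{2}^{r}(\widetilde{F}_{n},F_{n})\}^{2} \leq n^{-1}\sum_{i=1}^{n}\|\widehat{\varepsilon}_{i}-\varepsilon_{i}\|^{2} = n^{-1}\text{tr}(\varepsilon^{T}\mathcal{P}\varepsilon).$$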

From linearity of the expectation with respect to the trace operator,

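$$E\{\text{tr}(\varepsilon^{T}\mathcal{P}\varepsilon)\} = \text{tr}\{E(\varepsilon^{T}\mathcal{P}\varepsilon)\} = \text{tr}\{\mathcal{P}E(\varepsilon\varepsilon^{T})\} \leq \text{tr}(\mathcal{P})\,\text{tr}(\Sigma) = p\,\text{tr}(\Sigma),$$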

and this completes the proof. ∎
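The two ingredients of Lemma 1 can be checked by simulation. With i.i.d. rows, $E(\varepsilon\varepsilon^{T})=\text{tr}(\Sigma)I_{n}$, so the bound $p\,\text{tr}(\Sigma)$ on $E\{\text{tr}(\varepsilon^{T}\mathcal{P}\varepsilon)\}$ is in fact attained. The sketch below is our illustration with made-up $\mathbb{X}$, $\beta$ and Gaussian errors: it verifies $\widehat{\varepsilon}-\varepsilon=-\mathcal{P}\varepsilon$ exactly and compares a Monte Carlo mean of $n^{-1}\text{tr}(\varepsilon^{T}\mathcal{P}\varepsilon)$ with $p\,\text{tr}(\Sigma)/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 50, 3, 2
X = rng.standard_normal((n, p))                     # fixed design (hypothetical)
beta = rng.standard_normal((r, p))
Sigma = np.array([[1.0, 0.4], [0.4, 2.0]])          # error covariance (hypothetical)
L = np.linalg.cholesky(Sigma)
P = X @ np.linalg.solve(X.T @ X, X.T)              # projection onto col(X)

def draw_errors():
    """n x r error matrix with i.i.d. N(0, Sigma) rows."""
    return rng.standard_normal((n, r)) @ L.T

# Identity used in the proof: residuals minus true errors equal -P eps.
eps = draw_errors()
Y = X @ beta.T + eps
beta_hat = Y.T @ X @ np.linalg.inv(X.T @ X)
resid = Y - X @ beta_hat.T
identity_ok = np.allclose(resid - eps, -P @ eps)

# Monte Carlo mean of n^{-1} tr(eps^T P eps) versus the bound p tr(Sigma) / n.
reps = 4000
vals = [np.trace(e.T @ P @ e) / n for e in (draw_errors() for _ in range(reps))]
mc_mean = float(np.mean(vals))
bound = p * np.trace(Sigma) / n
print(identity_ok, mc_mean, bound)
```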

Lemma 2.

$E^{2}\left\{d_{2}^{r}(\widehat{F}_{n},F_{n})\right\}\leq(p+1)\operatorname{tr}(\Sigma)/n$ .

Proof.

From Lemma 8.8 in Bickel and Freedman (1981) we have

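$$
\begin{aligned}
d_{2}^{r}(\widehat{F}_{n},F_{n})^{2} &= d_{2}^{r}(\widetilde{F}_{n},F_{n})^{2} - \|E(\widetilde{F}_{n}) - E(F_{n})\|^{2} + \|E(F_{n})\|^{2}\\
&\leq d_{2}^{r}(\widetilde{F}_{n},F_{n})^{2} + \Big\|n^{-1}\sum_{i=1}^{n}\varepsilon_{i}\Big\|^{2},
\end{aligned}
$$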

with the empirical distribution functions $F_{n}$ , $\widetilde{F}_{n}$ , and $\widehat{F}_{n}$ used as random variables in the application of Lemma 8.8 in Bickel and Freedman (1981). We see that

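$$E\Big(\Big\|n^{-1}\sum_{i=1}^{n}\varepsilon_{i}\Big\|^{2}\Big) = n^{-2}E\Big\{\sum_{i=1}^{n}\varepsilon_{i}^{T}\varepsilon_{i} + \sum_{i\neq j}\varepsilon_{i}^{T}\varepsilon_{j}\Big\} = n^{-1}E(\varepsilon_{1}^{T}\varepsilon_{1}) = n^{-1}\text{tr}(\Sigma).$$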

Our conclusion follows from Lemma 1. ∎

These results imply the validity of the bootstrap approximation for the model (1) if we assume that $n^{-1}\mathbb{X}^{T}\mathbb{X}\to\Sigma_{X}>0$ . From Theorem 3,

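$$[d_{2}^{rp}\{\Psi_{n}(\widehat{F}_{n}),\Psi_{n}(F)\}]^{2} \leq nr\,\text{tr}\{(\mathbb{X}^{T}\mathbb{X})^{-1}\}\,d_{2}^{r}(\widehat{F}_{n},F)^{2},$$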

and because of the metric properties of $d_{2}^{r}(\cdot,\cdot)$

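$$\tfrac{1}{2}\,d_{2}^{r}(\widehat{F}_{n},F)^{2} \leq d_{2}^{r}(\widehat{F}_{n},F_{n})^{2} + d_{2}^{r}(F_{n},F)^{2},$$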

where Lemma 2 shows that $d_{2}^{r}(\widehat{F}_{n},F_{n})^{2}\to_{p}0$, and Lemma 8.4 of Bickel and Freedman (1981), with the separable Banach space taken to be $\mathbb{R}^{r}$, implies that $d_{2}^{r}(F_{n},F)^{2}\to_{p}0$. The next results are special cases of Lai et al. (1979), adapted from Freedman (1981) to the multivariate setting. We let $\varepsilon_{j}$, $j=1,\ldots,r$, denote the column of $\varepsilon$ corresponding to the errors of response $Y_{j}$.

Lemma 3.

$n^{-1}\mathbb{X}^{T}\varepsilon\to 0$ a.s. and $\hat{\beta}\to\beta$ a.s.

Proof.

Let $\varepsilon_{j}$ be the $j$th column of $\varepsilon$, as above. Then $n^{-1}\mathbb{X}^{T}\varepsilon\in\mathbb{R}^{p\times r}$ has columns $n^{-1}\mathbb{X}^{T}\varepsilon_{j}$, $j=1,\ldots,r$. Lemma 2.3 of Freedman (1981) states that $n^{-1}\mathbb{X}^{T}\varepsilon_{j}\to 0$ a.s. for each $j=1,\ldots,r$. Since there are finitely many columns, $n^{-1}\mathbb{X}^{T}\varepsilon\to 0$ a.s. A similar argument verifies our second result. ∎

Lemma 4.

$n^{-1}\operatorname{tr}\left\{(\widehat{\varepsilon}-\varepsilon)^{T}(\widehat{\varepsilon}-\varepsilon)\right\}\to 0$ a.s.

Proof.

A similar argument to that of Lemma 2.4 in Freedman (1981) gives

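$$n^{-1}\text{tr}\{(\widehat{\varepsilon}-\varepsilon)^{T}(\widehat{\varepsilon}-\varepsilon)\} = \text{tr}\{(n^{-1}\varepsilon^{T}\mathbb{X})(n^{-1}\mathbb{X}^{T}\mathbb{X})^{-1}(n^{-1}\mathbb{X}^{T}\varepsilon)\}.$$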

The center term converges to $\Sigma_{X}^{-1}$, and the left and right terms converge to 0 a.s. by Lemma 3. Our result follows. ∎

Lemma 5.

$d_{2}^{r}(\widehat{F}_{n},F_{n})\to 0$ a.s. and $d_{2}^{r}(\widehat{F}_{n},F)\to 0$ a.s.

Proof.

From the arguments in the proofs of Lemmas 1 and 2 we have that

[TABLE]

which converges to 0 a.s. by Lemma 4. Therefore the first convergence result holds. From the metric properties of the Mallows metric we have that

[TABLE]

Our second convergence result follows from the first convergence result and Lemma 8.4 of Bickel and Freedman (1981). ∎
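For two empirical distributions on $\mathbb{R}^{r}$ that each put mass $1/n$ on $n$ points, the infimum in the Mallows metric $d_{2}^{r}$ is attained by a permutation coupling, and the identity pairing $\widehat{\varepsilon}_{i}\leftrightarrow\varepsilon_{i}$ is one admissible coupling. This is why a bound of the form $d_{2}^{r}(\widehat{F}_{n},F_{n})^{2}\leq n^{-1}\operatorname{tr}\{(\widehat{\varepsilon}-\varepsilon)^{T}(\widehat{\varepsilon}-\varepsilon)\}$ is available (any centering of residuals used in the paper is ignored here). The brute-force function below, `mallows_d2_sq`, is a hypothetical helper intended only for small $n$:

```python
import itertools
import numpy as np

def mallows_d2_sq(U, V):
    """Squared Mallows (Wasserstein-2) distance between the empirical
    distributions of the rows of U and V (both n x r, weights 1/n).
    Exact, by minimizing over all permutation couplings; small n only."""
    n = U.shape[0]
    cost = ((U[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)   # cost[i, j] = ||U_i - V_j||^2
    best = min(cost[np.arange(n), list(perm)].sum()
               for perm in itertools.permutations(range(n)))
    return best / n

rng = np.random.default_rng(1)
eps = rng.normal(size=(6, 2))                     # stands in for the true errors
eps_hat = eps + 0.1 * rng.normal(size=(6, 2))     # stands in for the residuals
d2 = mallows_d2_sq(eps_hat, eps)
identity_bound = np.trace((eps_hat - eps).T @ (eps_hat - eps)) / 6
```

For realistic $n$ one would solve the same assignment problem with a dedicated solver rather than enumerating permutations.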

Lemma 6.

Let $u_{i}$ and $v_{i}$ , $i=1,...,n$ , be $r\times 1$ vectors. Let

[TABLE]

and similarly for $v$ . Then

[TABLE]

where $\|\cdot\|_{F}$ is the Frobenius norm.

Proof.

We have

[TABLE]

where the inequality follows from (Freedman, 1981, Lemma 2.7). ∎

We now give the proof of Theorem 1. First, define the operator $\text{vech}(A)\in\mathbb{R}^{p(p+1)/2\times 1}$ that stacks the $p(p+1)/2$ unique elements of a symmetric matrix $A\in\mathbb{R}^{p\times p}$.
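The text fixes which elements $\text{vech}$ keeps but not their order. A minimal sketch, assuming the usual convention of stacking the lower triangle column by column:

```python
import numpy as np

def vech(A):
    """Stack the p(p+1)/2 unique elements of a symmetric p x p matrix,
    taking the lower triangle column by column (assumed ordering)."""
    p = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(p)])

A = np.array([[1, 2, 3],
              [2, 4, 5],
              [3, 5, 6]])
v = vech(A)
```

Here `v` is `[1, 2, 3, 4, 5, 6]`, of length $3\cdot 4/2=6$. Any fixed ordering works for the proof, since only norms of $\text{vech}(\cdot)$ enter.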

Proof.

Take $G=\widehat{F}_{n}$ in Theorem 3 and observe that

[TABLE]

From Lemma 5 we know that $d_{2}^{r}(F,\widehat{F}_{n})^{2}\to 0$ almost everywhere. Our result for part a) follows since $F$ is mean-zero normal with variance $\Sigma_{X}^{-1}\otimes\Sigma$ . We now show that part b) holds. First, we need to establish that $\widehat{\Sigma}\to\Sigma$ almost everywhere. To see this, introduce

[TABLE]

Clearly, $\Sigma_{n}\to\Sigma$ a.s. Let $C_{n}=n^{-1}\sum_{i=1}^{n}\left(\hat{\varepsilon}_{i}-\varepsilon_{i}\right)\left(\hat{\varepsilon}_{i}-\varepsilon_{i}\right)^{T}$ . We have,

[TABLE]

a.s., where the first inequality follows from Lemma 6 with $\widehat{\Sigma}_{n}$ and $\Sigma_{n}$ taking the place of $s_{u}^{2}$ and $s_{v}^{2}$, respectively, the second inequality follows from the fact that $C_{n}$ is positive semidefinite, and the convergence follows from Lemma 4.
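A fact of the kind that typically drives such a step: for a positive semidefinite matrix $C$ with eigenvalues $\lambda_{1},\dots,\lambda_{r}\geq 0$, $\|C\|_{F}=(\sum_{i}\lambda_{i}^{2})^{1/2}\leq\sum_{i}\lambda_{i}=\operatorname{tr}(C)$, so a trace converging to 0 forces the Frobenius norm to 0. Whether this is exactly the displayed inequality is our reading; the NumPy check below builds $C_{n}$ from random stand-ins for the rows $\hat{\varepsilon}_{i}-\varepsilon_{i}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 50, 4
diff = rng.normal(size=(n, r))          # stand-in rows for eps_hat_i - eps_i
C = diff.T @ diff / n                   # C_n = n^{-1} sum_i (eps_hat_i - eps_i)(eps_hat_i - eps_i)^T
eigvals = np.linalg.eigvalsh(C)         # real eigenvalues of the symmetric matrix C
fro = np.linalg.norm(C, "fro")          # sqrt of the sum of squared eigenvalues
tr = np.trace(C)                        # sum of eigenvalues
print(eigvals.min() >= -1e-12, fro <= tr)
```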

Let $D_{n}=E\left(\|\widehat{\Sigma}^{*}_{n}-\Sigma_{n}^{*}\|_{F}\mid Y_{1},...,Y_{n}\right)$. From Lemma 6 and the proof of Lemma 1 we see that

[TABLE]

where the last inequality follows from applying the argument that proves Lemma 1 to the starred data, and $p\operatorname{tr}(\widehat{\Sigma})/n\to 0$ a.s. It remains to show that $\widehat{\Sigma}^{*}_{n}$ converges to $\Sigma$. Conditional on $Y_{1},...,Y_{n}$,

[TABLE]

by Lemma 8.6 in Bickel and Freedman (1981). Now $\varepsilon^{*}$ has conditional distribution $\widehat{F}_{n}$, $\varepsilon$ has law $F$, and Lemma 5 gives $d_{2}^{r}(\widehat{F}_{n},F)\to 0$ almost everywhere. We now use Lemma 8.5 of Bickel and Freedman (1981) with $\phi(x)=\text{vech}(xx^{T})$, $x\in\mathbb{R}^{r}$, to show that $d_{1}\{\text{vech}(\varepsilon_{1}^{*}\varepsilon_{1}^{*T}),\text{vech}(\varepsilon_{1}\varepsilon_{1}^{T})\}\to 0$ a.s. To do this, we show that $K$ can be chosen so that $\|\phi(x)\|_{1}\leq K(1+\|x\|_{2}^{2})$, where $\|\cdot\|_{1}$ and $\|\cdot\|_{2}$ are the $\mathcal{L}^{1}$ and $\mathcal{L}^{2}$ norms, respectively. By definition of the Euclidean norm, $\|x\|_{2}^{2}=\sum_{i=1}^{r}x_{i}^{2}$, and $x_{i}^{2}+x_{j}^{2}\geq 2|x_{i}x_{j}|$ for all $i,j=1,...,r$. Now pick $K={r\choose 2}+1$. We see that

[TABLE]

A similar argument shows that $n^{-1}\sum_{i=1}^{n}\varepsilon^{*}_{i}$ converges to 0. Part c) follows from parts a) and b). ∎
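The choice $K={r\choose 2}+1$ can be checked directly: $\text{vech}(xx^{T})$ has $r$ diagonal entries $x_{i}^{2}$, summing to $\|x\|_{2}^{2}$, and ${r\choose 2}$ off-diagonal entries $x_{i}x_{j}$, each with $|x_{i}x_{j}|\leq(x_{i}^{2}+x_{j}^{2})/2\leq\|x\|_{2}^{2}$. Hence $\|\phi(x)\|_{1}\leq\{1+{r\choose 2}\}\|x\|_{2}^{2}\leq K(1+\|x\|_{2}^{2})$. A quick randomized NumPy sanity check of the bound (the helper `worst_ratio` and the column-wise `vech` ordering are our own illustrative choices):

```python
import numpy as np
from math import comb

def vech(A):
    # lower triangle, column by column (assumed ordering; norms do not depend on it)
    return np.concatenate([A[j:, j] for j in range(A.shape[0])])

def worst_ratio(r, trials=1000, seed=0):
    """Largest observed ||vech(x x^T)||_1 / (K (1 + ||x||_2^2)) with K = C(r,2) + 1."""
    rng = np.random.default_rng(seed)
    K = comb(r, 2) + 1
    worst = 0.0
    for _ in range(trials):
        x = rng.normal(scale=10.0, size=r)
        lhs = np.abs(vech(np.outer(x, x))).sum()      # ||phi(x)||_1
        worst = max(worst, lhs / (K * (1.0 + x @ x)))
    return worst
```

A ratio at or below 1 for every $r$ tried is consistent with the bound; it is of course the inequality chain, not the simulation, that proves it.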

4.2 Random design and heteroskedasticity

In this section we prove Theorem 2, following the logic of (Freedman, 1981, Section 3). We first introduce several quantities and lemmas. Define

[TABLE]

The next two lemmas are needed to prove Theorem 2.

Lemma 7.

If $d_{4}^{p+r}(\mu_{n},\mu)\to 0$ as $n\to\infty$ , then

a)

$\Sigma(\mu_{n})\to\Sigma(\mu)$ and $\beta(\mu_{n})\to\beta(\mu)$,

b)

the $\mu_{n}$-law of $\text{vec}\{\varepsilon(\mu_{n},x,y)x^{T}\}$ converges to the $\mu$-law of $\text{vec}\{\varepsilon(\mu,x,y)x^{T}\}$ in $d_{2}^{rp}$,

c)

the $\mu_{n}$-law of $\|\varepsilon(\mu_{n},x,y)\|^{2}$ converges to the $\mu$-law of $\|\varepsilon(\mu,x,y)\|^{2}$ in $d_{1}$.

Proof.

Part a) immediately follows from (Bickel and Freedman, 1981, Lemma 8.3c).

We use (Bickel and Freedman, 1981, Lemma 8.3a) to verify part b). The weak convergence step is evident. Now,

[TABLE]

Let $z=(x^{T},y^{T})^{T}$. Part b) follows from integration with respect to $\mu_{n}$, part a), and (Bickel and Freedman, 1981, Lemma 8.5) with $\phi(z)=\text{vech}(zz^{T})$. The steps involving (Bickel and Freedman, 1981, Lemma 8.5) are similar to those in the proof of Theorem 1.

Part c) follows from the same argument used to prove part b). ∎

Lemma 8.

$d_{4}^{p+r}(\mu_{n},\mu)\to 0$ a.e. as $n\to\infty$.

Proof.

The steps are the same as those in (Freedman, 1981, Lemma 3.2). ∎

The proof of Theorem 2 is now given.

Proof.

We can write

[TABLE]

where $Z^{*}=n^{-1/2}\varepsilon^{*T}\mathbb{X}^{*}$ and $W^{*}=n^{-1}\mathbb{X}^{*T}\mathbb{X}^{*}$. By (Freedman, 1981, Theorem 3.1), conditional on $(X_{i},Y_{i})$, $i=1,...,n$, we have $W^{*}\to_{p}\Sigma_{X}$. This verifies part a).
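In the random-design setting the starred data are obtained by resampling the pairs $(X_{i},Y_{i})$ jointly. The following is a minimal NumPy sketch under our own assumptions (layout $Y=\mathbb{X}\beta+\varepsilon$ with $\beta\in\mathbb{R}^{p\times r}$, hypothetical function name `pairs_bootstrap`). It records $W^{*}=n^{-1}\mathbb{X}^{*T}\mathbb{X}^{*}$ so that part a) can be seen numerically: with $\Sigma_{X}=I_{p}$ in the simulation, each $W^{*}$ should be close to the identity.

```python
import numpy as np

def pairs_bootstrap(X, Y, B, rng):
    """Hypothetical sketch: resample rows (X_i, Y_i) jointly and refit by OLS.
    Returns beta_hat (p x r), B refits (B x p x r), and B matrices W* (B x p x p)."""
    n = X.shape[0]
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    betas, Ws = [], []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)          # same indices for X and Y: pairs stay together
        Xs, Ys = X[idx], Y[idx]
        b_star, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)
        betas.append(b_star)
        Ws.append(Xs.T @ Xs / n)                  # W* = n^{-1} X*^T X*
    return beta_hat, np.array(betas), np.array(Ws)

rng = np.random.default_rng(5)
n, p, r = 2000, 3, 2
X = rng.normal(size=(n, p))                       # random design with Sigma_X = I_p
noise = (1 + np.abs(X[:, :1])) * rng.normal(size=(n, r))   # heteroskedastic errors
Y = X @ rng.normal(size=(p, r)) + noise
beta_hat, betas, Ws = pairs_bootstrap(X, Y, B=50, rng=rng)
```

Resampling pairs rather than residuals is what lets this scheme tolerate error variances that depend on $X_{i}$, as in the simulated `noise`.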

We now verify part b). From (Bickel and Freedman, 1981, Lemma 8.7), we have

[TABLE]

where the right side goes to 0 a.e. as $n\to\infty$. Lemma 8 states that $\mu_{n}\to\mu$ a.e. in $d_{4}^{p+r}$ as $n\to\infty$, and part b) of Lemma 7 then implies that the distribution of $\text{vec}(Z^{*})$, conditional on $(X_{i},Y_{i})$, $i=1,...,n$, converges to the distribution of $\text{vec}(Z)$, which is normal with mean 0 and variance matrix $M$. Combining this with part a) shows that the conditional distribution of $(W^{*-1}\otimes I_{r})\text{vec}(Z^{*})$ converges to that of $(\Sigma_{X}^{-1}\otimes I_{r})\text{vec}(Z)$ as $n\to\infty$. This completes the proof of part b).
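Since $W^{*}$ is symmetric, the identity $\text{vec}(AXB)=(B^{T}\otimes A)\,\text{vec}(X)$ with $A=I_{r}$ and $B=W^{*-1}$ gives $(W^{*-1}\otimes I_{r})\text{vec}(Z^{*})=\text{vec}(Z^{*}W^{*-1})$, where vec stacks columns and $Z^{*}$ is $r\times p$. A quick NumPy check on random matrices (sizes and the positive definite $W$ are our own choices):

```python
import numpy as np

def vec(A):
    return A.reshape(-1, order="F")      # column-stacking vec operator

rng = np.random.default_rng(4)
r, p = 2, 3
Z = rng.normal(size=(r, p))              # plays the role of Z* (r x p)
G = rng.normal(size=(p, p))
W = G @ G.T + p * np.eye(p)              # symmetric positive definite, like W*
W_inv = np.linalg.inv(W)
lhs = np.kron(W_inv, np.eye(r)) @ vec(Z)  # (W^{-1} kron I_r) vec(Z)
rhs = vec(Z @ W_inv)                      # vec(Z W^{-1})
print(np.allclose(lhs, rhs))
```

The same computation with $\Sigma_{X}$ in place of $W^{*}$ describes the limit $(\Sigma_{X}^{-1}\otimes I_{r})\text{vec}(Z)$.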

Part c) follows from the same argument as in the proof of Theorem 1, where Lemma 8 and part c) of Lemma 7 combine to show that (5) converges to 0 as $n\to\infty$. Note that in this argument $\varepsilon_{1}^{*}=Y_{1}^{*}-\hat{\beta}X_{1}^{*}$. This completes the proof. ∎

5 Acknowledgments

The author would like to thank Karl Oskar Ekvall, Forrest Crawford, Snigdhansu Chatterjee, Dennis Cook, and two anonymous referees for providing valuable feedback which led to the strengthening of this article.

Bibliography


P. J. Bickel and D. A. Freedman. Some asymptotic theory for the bootstrap. Ann. Statist., 9:1196–1217, 1981.
S. Chatterjee and A. Bose. Variance estimation in high dimensional models. Statist. Sin., 10:497–515, 2000.
P. Diaconis and B. Efron. Computer intensive methods in statistics. Sci. Am., 248, 1983.
D. A. Freedman. Bootstrapping regression models. Ann. Statist., 9:1218–1228, 1981.
D. A. Freedman and S. C. Peters. Bootstrapping a regression equation: Some empirical results. J. Am. Statist. Assoc., 79:97–106, 1984.
H. V. Henderson and P. F. Velleman. Building multiple regression models interactively. Biometrics, 37:391–411, 1981.
T. Lai, H. Robbins, and V. Wei. Strong consistency of least squares estimates in multiple regression. J. Mult. Anal., 9:343–361, 1979.
S. Weisberg. Applied Linear Regression. Wiley, New Jersey, 2005.
