Bootstrapping for multivariate linear regression models
Daniel J. Eck

TL;DR
This paper extends bootstrap methods to multivariate linear regression models, enabling inference on the regression coefficient matrix with theoretical validation and practical examples.
Contribution
It introduces multivariate bootstrap techniques for regression inference, extending Freedman's univariate methods, proves the validity of an extension that earlier work stated without proof, and supports the theory with simulations and a real data example.
Findings
Bootstrap methods are valid for multivariate regression coefficients.
Simulation studies support theoretical results.
Real data example demonstrates practical applicability.
Abstract
The multivariate linear regression model is an important tool for investigating relationships between several response variables and several predictor variables. The primary interest is in inference about the unknown regression coefficient matrix. We propose multivariate bootstrap techniques as a means for making inferences about the unknown regression coefficient matrix. These bootstrapping techniques are extensions of those developed in Freedman (1981), which are only appropriate for univariate responses. Extensions to the multivariate linear regression model have previously been made without proof. We formalize this extension and prove its validity. A real data example and two simulated data examples are provided, which offer some finite sample verification of our theoretical results.
| component | 95% bootstrap percentile interval | 95% confidence interval |
|---|---|---|
| | (-0.062, 0.958) | (-0.092, 0.997) |
| | (-0.873, 0.330) | (-0.922, 0.342) |
| | (0.279, 0.826) | (0.256, 0.823) |
| | (0.070, 0.655) | (0.074, 0.658) |
| | (0.415, 0.771) | (0.410, 0.768) |
| | (-0.010, 0.364) | (-0.020, 0.350) |
| | (0.509, 0.684) | (0.509, 0.684) |
| | (-0.031, 0.143) | (-0.030, 0.143) |
| component | 95% bootstrap percentile interval | 95% confidence interval |
|---|---|---|
| | (-0.013, 1.617) | (0.205, 1.391) |
| | (0.232, 1.438) | (0.296, 1.366) |
| | (0.638, 1.208) | (0.646, 1.198) |
| | (0.329, 0.912) | (0.369, 0.868) |
| | (0.937, 1.323) | (0.952, 1.304) |
| | (0.646, 0.987) | (0.659, 0.982) |
| | (0.995, 1.161) | (0.997, 1.160) |
| | (0.608, 0.771) | (0.616, 0.764) |
| component | 95% bootstrap percentile interval | 95% confidence interval |
|---|---|---|
| | (2.734, 7.027) | (2.286, 7.136) |
| | (-3.693, 0.630) | (-3.806, 0.916) |
| | (-6.823, -4.173) | (-6.900, -3.812) |
| | (0.326, 5.745) | (0.181, 4.939) |
| | (-134.667, -52.921) | (-134.408, -56.787) |
Bootstrapping for multivariate linear regression models
Daniel J. Eck
Department of Biostatistics, Yale School of Public Health.
Key Words: Multivariate Bootstrap; Multivariate Linear Regression Model; Residual Bootstrap
1 Introduction
The linear regression model is an important and useful tool in many statistical analyses for studying the relationship among variables. Regression analysis is primarily used for predicting values of the response variable at interesting values of the predictor variables, discovering the predictors that are associated with the response variable, and estimating how changes in the predictor variables affect the response variable (Weisberg, 2005). The standard linear regression methodology assumes that the response variable is a scalar. However, it may be the case that one is interested in investigating multiple response variables simultaneously. One could perform a regression analysis on each response separately in this setting, but such an analysis would fail to detect associations between responses. Regression settings where associations among multiple responses are of interest require a multivariate linear regression model for analysis.
Bootstrapping techniques are well understood for the linear regression model with a univariate response (Bickel and Freedman, 1981; Freedman, 1981). In particular, theoretical justification for the residual bootstrap as a way to estimate the variability of the ordinary least squares (OLS) estimator of the regression coefficient vector in this model has been developed (Freedman, 1981). Theoretical extensions of residual bootstrap techniques appropriate for the multivariate linear regression model have not been formally introduced. The existence of such an extension is stated without proof, and rather implicitly, in subsequent works (Freedman and Peters, 1984; Diaconis and Efron, 1983). In this article we show that the bootstrap procedures in Freedman (1981) provide consistent estimates of the variability of the OLS estimator of the regression coefficient matrix in the multivariate linear regression model. Our proof technique follows logic similar to that of Freedman (1981). The generality of the bootstrap theory developed in Bickel and Freedman (1981) provides the tools required for our extension to the multivariate linear regression model.
2 Bootstrap for the multivariate linear regression model
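The multivariate linear regression model is
$$Y_i = \beta X_i + \varepsilon_i, \qquad i = 1, \ldots, n, \tag{1}$$
where $Y_i \in \mathbb{R}^r$ and $r > 1$ in order to have an interesting problem, $\beta \in \mathbb{R}^{r \times p}$, $X_i \in \mathbb{R}^p$, and the $\varepsilon_i$ are errors having mean zero and variance-covariance matrix $\Sigma$ where $\Sigma > 0$. It is assumed that separate realizations from the model (1) are independent and that $n > p$. We further define $\mathbb{X} \in \mathbb{R}^{n \times p}$ as the design matrix with rows $X_i^T$, $\mathbb{Y} \in \mathbb{R}^{n \times r}$ as the matrix of responses with rows $Y_i^T$, and $\varepsilon \in \mathbb{R}^{n \times r}$ as the matrix of all errors with rows $\varepsilon_i^T$. The OLS estimator of $\beta$ in model (1) is $\hat{\beta} = \mathbb{Y}^T \mathbb{X} (\mathbb{X}^T \mathbb{X})^{-1}$. We let $\hat{\varepsilon} \in \mathbb{R}^{n \times r}$ denote the matrix of residuals consisting of rows $\hat{\varepsilon}_i^T = (Y_i - \hat{\beta} X_i)^T$. The multivariate linear regression model assumed here is slightly different from the traditional multivariate linear regression model. The traditional model makes the additional assumptions that the errors are normally distributed and the design matrix is fixed.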
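As a minimal sketch of the OLS estimator in this setting, the NumPy code below assumes the convention that the coefficient matrix is r × p (one row per response) and that the design and response matrices store one observation per row. The function name `ols_coef`, the sizes, and the simulated data are illustrative assumptions and are not taken from the paper. The final check uses the standard fact that the multivariate OLS point estimate agrees, row by row, with separate univariate regressions; the multivariate treatment matters for the joint variability across responses, not for the point estimate.

```python
import numpy as np

def ols_coef(X, Y):
    """Return the r x p OLS coefficient matrix Y^T X (X^T X)^{-1}.

    X is n x p (rows are predictor vectors), Y is n x r (rows are responses).
    """
    # Solve (X^T X) A = X^T Y for the p x r matrix A, then transpose.
    return np.linalg.solve(X.T @ X, X.T @ Y).T

rng = np.random.default_rng(0)
n, p, r = 50, 3, 2                      # illustrative sizes (assumed)
X = rng.normal(size=(n, p))
beta = rng.normal(size=(r, p))          # hypothetical true coefficients
Y = X @ beta.T + rng.normal(size=(n, r))

beta_hat = ols_coef(X, Y)

# Row j of beta_hat should match the univariate OLS fit of response j on X.
separate = np.vstack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(r)]
)
print(np.allclose(beta_hat, separate))
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit inverse; for ill-conditioned designs a QR or least-squares solver would be a safer choice.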
We consider two bootstrap procedures that consistently estimate the asymptotic variability of $\mathrm{vec}(\hat{\beta})$ under different assumptions placed upon the model (1), where the vec operator stacks the columns of a matrix so that $\mathrm{vec}(\hat{\beta}) \in \mathbb{R}^{rp}$. The first bootstrap procedure is appropriate when the design matrix $\mathbb{X}$ is assumed to be fixed and the errors have constant variance. In this setup, residuals are resampled. The second bootstrap procedure is appropriate when the $(X_i^T, Y_i^T)^T$ are realizations from a joint distribution. In this setup, cases are resampled. It is known that bootstrapping under these setups provides a consistent estimator of the variability of $\hat{\beta}$ in model (1) when $r = 1$ (Freedman, 1981). We now provide the needed extensions.
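The vec operator described above can be illustrated with a short NumPy sketch. The helper name `vec` and the toy matrix are assumptions made for illustration; the only substantive point is that column stacking corresponds to column-major (Fortran) ordering, which also fixes where each coefficient lands in the stacked vector.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")

r, p = 2, 3
M = np.arange(r * p).reshape(r, p)   # [[0, 1, 2], [3, 4, 5]]
v = vec(M)                           # columns stacked: [0, 3, 1, 4, 2, 5]

# Entry (j, k) of an r x p matrix sits at position j + r * k of vec(M).
j, k = 1, 2
assert v[j + r * k] == M[j, k]
```

The index rule in the last lines is what lets one map an entry of a bootstrap covariance matrix for the stacked coefficients back to a pair of coefficients in the original matrix.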
2.1 Fixed design
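We first establish the residual bootstrap of Freedman (1981) when $\mathbb{X}$ is assumed to be a fixed design matrix. Resampled, starred, data is generated by the model
$$\mathbb{Y}^* = \mathbb{X}\hat{\beta}^T + \varepsilon^*, \tag{2}$$
where $\varepsilon^* \in \mathbb{R}^{n \times r}$ is the matrix of errors with rows being independent. The rows in $\varepsilon^*$ have common distribution $\hat{F}_n$, which is the empirical distribution of the residuals from the original dataset, centered at their mean. Now $\hat{\beta}^* = \mathbb{Y}^{*T}\mathbb{X}(\mathbb{X}^T\mathbb{X})^{-1}$ is the OLS estimator of $\beta$ from the starred data. This process is performed a total of $B$ times, with a new estimator $\hat{\beta}^*$ computed from (2) at each iteration. We then estimate the variability of $\mathrm{vec}(\hat{\beta})$ with
$$\widehat{\mathrm{var}}^*\{\mathrm{vec}(\hat{\beta}^*)\} = (B-1)^{-1}\sum_{b=1}^{B}\{\mathrm{vec}(\hat{\beta}^*_b) - \mathrm{vec}(\bar{\beta}^*)\}\{\mathrm{vec}(\hat{\beta}^*_b) - \mathrm{vec}(\bar{\beta}^*)\}^T,$$
where $\hat{\beta}^*_b$ is the residual bootstrap estimator of $\beta$ at iteration $b$ and $\bar{\beta}^* = B^{-1}\sum_{b=1}^{B}\hat{\beta}^*_b$. We summarize this bootstrap procedure in Algorithm 1.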
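Algorithm 1. Bootstrap procedure with fixed design matrix.
- Step 1.
Set $B$ and initialize $b = 1$.
- Step 2.
Sample residuals from $\hat{F}_n$, with replacement, and compute $\mathbb{Y}^*$ as in (2).
- Step 3.
Compute $\hat{\beta}^*_b = \mathbb{Y}^{*T}\mathbb{X}(\mathbb{X}^T\mathbb{X})^{-1}$ and store $\mathrm{vec}(\hat{\beta}^*_b)$.
- Step 4.
Repeat Steps 2-3, incrementing $b$ by one before returning to Step 2.
- Step 5.
When $b = B$, compute $\widehat{\mathrm{var}}^*\{\mathrm{vec}(\hat{\beta}^*)\}$.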
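The steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the author's implementation: the coefficient matrix is taken to be r × p, the data matrices store one observation per row, the covariance of the stored draws uses a B − 1 divisor, and the function names, default number of resamples, and simulated data are hypothetical.

```python
import numpy as np

def ols_coef(X, Y):
    """r x p OLS coefficient matrix Y^T X (X^T X)^{-1}."""
    return np.linalg.solve(X.T @ X, X.T @ Y).T

def residual_bootstrap(X, Y, B=1000, seed=0):
    """Residual bootstrap with the design matrix X held fixed.

    Returns the OLS estimate, the B x (r*p) array of stacked bootstrap
    estimates, and their sample covariance matrix.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    beta_hat = ols_coef(X, Y)
    fitted = X @ beta_hat.T                 # n x r fitted values
    resid = Y - fitted
    resid = resid - resid.mean(axis=0)      # center residuals at their mean
    draws = np.empty((B, beta_hat.size))
    for b in range(B):
        idx = rng.integers(0, n, size=n)    # resample whole residual rows
        Y_star = fitted + resid[idx]        # starred responses
        beta_star = ols_coef(X, Y_star)
        draws[b] = beta_star.reshape(-1, order="F")   # vec(beta_star)
    cov = np.cov(draws, rowvar=False)       # divisor B - 1
    return beta_hat, draws, cov

# Illustrative use on simulated data (all values are assumptions).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
beta = np.array([[1.0, 0.0, -1.0], [0.5, 2.0, 0.0]])
Y = X @ beta.T + 0.1 * rng.normal(size=(200, 2))
beta_hat, draws, cov = residual_bootstrap(X, Y, B=200, seed=2)
print(np.sqrt(np.diag(cov)))                # bootstrap standard errors
```

Resampling entire residual rows, rather than each response column separately, preserves the dependence among the responses within an observation, which is the feature that distinguishes the multivariate procedure from running the univariate bootstrap once per response.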
Before the theoretical justification of the residual bootstrap is formally given, some important quantities are stated. The residuals from the regression (2) are $\hat{\varepsilon}^* = \mathbb{Y}^* - \mathbb{X}\hat{\beta}^{*T}$. The variance-covariance matrix in model (1) is then estimated by
$$\hat{\Sigma} = n^{-1}\sum_{i=1}^{n}\hat{\varepsilon}_i\hat{\varepsilon}_i^T.$$
Likewise, the variance-covariance estimate from the starred data is
$$\hat{\Sigma}^* = n^{-1}\sum_{i=1}^{n}\hat{\varepsilon}^*_i\hat{\varepsilon}^{*T}_i.$$
Let $I_k$ denote the $k \times k$ identity matrix. Theorem 1 provides bootstrap asymptotics for the regression model (1). It extends Theorem 2.2 of Freedman (1981) to the multivariate setting.
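Theorem 1**.**
Assume the regression model (1) where the errors have finite fourth moments. Suppose that $n^{-1}\mathbb{X}^T\mathbb{X} \to \Sigma_X > 0$. Then, conditional on almost all sample paths $Y_1, \ldots, Y_n$, as $n \to \infty$,
- a)
$\sqrt{n}\{\mathrm{vec}(\hat{\beta}^*) - \mathrm{vec}(\hat{\beta})\}$ converges weakly to a normal distribution with mean zero and variance $\Sigma_X^{-1} \otimes \Sigma$,
- b)
$\hat{\Sigma}^* \to \Sigma$ in conditional probability, and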
- c)
**
The proof of Theorem 1, along with the details of several necessary lemmas and theorems, is included in the theoretical details section. Theorem 1 establishes the multivariate analogue for the residual bootstrap. This theorem shows that standard error estimation of the estimated matrix $\hat{\beta}$, obtained through bootstrapping, is $\sqrt{n}$-consistent. Now let $g$ be a differentiable function. Then the conclusions of Theorem 1 can be applied to establish a multivariate delta method based on estimates obtained via the residual bootstrap. This immediately follows from a first order Taylor expansion and some algebra arriving at
[TABLE]
Therefore (3) converges weakly to a normal distribution with mean zero and variance given by
[TABLE]
as .
2.2 Random design and heteroskedasticity
In this section we assume that the s in model (1) are realizations of a random variable . The regression coefficient matrix now takes the form where and it is assumed that . Now that is stochastic, there may be some association between and the errors . The possibility of heteroskedasticity means that we need to alter the bootstrap procedure outlined in the previous section in order to consistently estimate the variability of .
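It is assumed that the data vectors $(X_i^T, Y_i^T)^T$ are independent, with a common distribution $\mu$ and $E\{\|(X_i^T, Y_i^T)^T\|^4\} < \infty$, where $\|\cdot\|$ is the Euclidean norm. Unlike the fixed design setting, data pairs are resampled with replacement to form the starred data $(X_i^{*T}, Y_i^{*T})^T$, for $i = 1, \ldots, n$. Given the original sample, $(X_i^T, Y_i^T)^T$, $i = 1, \ldots, n$, the resampled vectors are independent, with distribution $\mu_n$. Denote $\mathbb{X}^*$ and $\mathbb{Y}^*$ as the matrices with rows $X_i^{*T}$ and $Y_i^{*T}$ respectively. The starred estimator of $\beta$ obtained from resampling is then $\hat{\beta}^* = \mathbb{Y}^{*T}\mathbb{X}^*(\mathbb{X}^{*T}\mathbb{X}^*)^{-1}$. For every $n$ there is positive probability, albeit low, that $\mathbb{X}^{*T}\mathbb{X}^*$ is singular, and the probability of singularity decreases exponentially in $n$. We assume that displayed equation (1.17) in Chatterjee and Bose (2000) holds in order to circumvent singularity in our bootstrap procedure.
The bootstrap is performed a total of $B$ times, with a new estimator $\hat{\beta}^*$ computed at each iteration. We then estimate the variability of $\mathrm{vec}(\hat{\beta})$ with
$$\widehat{\mathrm{var}}^*\{\mathrm{vec}(\hat{\beta}^*)\} = (B-1)^{-1}\sum_{b=1}^{B}\{\mathrm{vec}(\hat{\beta}^*_b) - \mathrm{vec}(\bar{\beta}^*)\}\{\mathrm{vec}(\hat{\beta}^*_b) - \mathrm{vec}(\bar{\beta}^*)\}^T,$$
where $\hat{\beta}^*_b$ is the bootstrap estimator of $\beta$ at iteration $b$ and $\bar{\beta}^* = B^{-1}\sum_{b=1}^{B}\hat{\beta}^*_b$. We summarize this bootstrap procedure in Algorithm 2.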
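Algorithm 2. Bootstrap procedure with random design matrix.
- Step 1.
Set $B$ and initialize $b = 1$.
- Step 2.
Resample the pairs $(X_i^T, Y_i^T)^T$, $i = 1, \ldots, n$, with replacement.
- Step 3.
Compute $\hat{\beta}^*_b = \mathbb{Y}^{*T}\mathbb{X}^*(\mathbb{X}^{*T}\mathbb{X}^*)^{-1}$ and store $\mathrm{vec}(\hat{\beta}^*_b)$.
- Step 4.
Repeat Steps 2-3, incrementing $b$ by one before returning to Step 2.
- Step 5.
When $b = B$, compute $\widehat{\mathrm{var}}^*\{\mathrm{vec}(\hat{\beta}^*)\}$.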
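Algorithm 2 can be sketched in the same way. The code below is a hedged illustration, not the author's implementation: it resamples whole cases (predictor row together with response row), assumes an r × p coefficient matrix and a B − 1 divisor, and simply redraws when a resampled Gram matrix is rank deficient. That redraw rule is a practical stand-in chosen for this sketch; the paper instead handles singularity through an assumption from Chatterjee and Bose (2000). The function name and simulated heteroskedastic data are hypothetical.

```python
import numpy as np

def pairs_bootstrap(X, Y, B=1000, seed=0):
    """Case-resampling bootstrap for multivariate OLS.

    Returns the B x (r*p) array of stacked bootstrap estimates and their
    sample covariance matrix.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    draws = []
    while len(draws) < B:
        idx = rng.integers(0, n, size=n)     # resample cases with replacement
        Xs, Ys = X[idx], Y[idx]
        gram = Xs.T @ Xs
        if np.linalg.matrix_rank(gram) < gram.shape[0]:
            continue                         # sketch-only rule: redraw if singular
        beta_star = np.linalg.solve(gram, Xs.T @ Ys).T   # r x p
        draws.append(beta_star.reshape(-1, order="F"))   # vec(beta_star)
    draws = np.asarray(draws)
    return draws, np.cov(draws, rowvar=False)

# Illustrative heteroskedastic data: error scale depends on the first predictor.
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 2))
beta = np.array([[1.0, -1.0], [0.5, 2.0]])
scale = 0.1 + 0.1 * np.abs(X[:, :1])
Y = X @ beta.T + scale * rng.normal(size=(150, 2))
draws, cov = pairs_bootstrap(X, Y, B=100, seed=4)
print(np.sqrt(np.diag(cov)))                 # bootstrap standard errors
```

Because each resampled case carries its own error along with its predictors, any dependence of the error variance on the predictors is retained in the starred data, which is why case resampling is the natural choice under heteroskedasticity.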
We now show that the variability of is estimated consistently by our multivariate bootstrap procedure which resamples cases. Let be a non-negative definite matrix with entries for and define where a.e. as . Then
[TABLE]
The next theorem states that is the same as (4). This is an extension of Theorems 3.1 and 3.2 of Freedman (1981) to the multivariate linear regression setting.
Theorem 2**.**
Assume that are independent, with a common distribution , , and is positive definite. Then, conditional on almost all sample paths, , , as ,
- a)
,
- b)
, and
- c)
the sequence .
The proof of Theorem 2, along with necessary lemmas, is included in the theoretical details section.
3 Examples
3.1 Simulations
In this section we provide two simulated examples which show support for our multivariate bootstrap procedures.
3.1.1 Fixed design
This example illustrates Theorem 1. We generated data according to the multivariate linear regression model (1) where , , and both and are prespecified. Our goal is to make inference about $\beta$ using confidence regions. For each component of $\beta$, a 95% percentile interval computed using the residual bootstrap in Algorithm 1 is compared with a 95% confidence interval that assumes model (1) is correct. Four data sets were generated at different sample sizes and the performance of the multivariate residual bootstrap is assessed. The bootstrap is performed times in each dataset. The results are displayed in Table 1. For the first two components of $\beta$, we see that the confidence regions obtained from both methods are close to each other and that the distance between the two shrinks as $n$ increases. Similar results are obtained for the other components of $\beta$.
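The percentile intervals compared in this example are computed componentwise from the stored bootstrap estimates. A minimal sketch, assuming the estimates are held in an array with one row per bootstrap iteration and one column per coefficient, is given below; the function name and the toy input are hypothetical, and no values from the paper's simulations are reproduced.

```python
import numpy as np

def percentile_intervals(draws, level=0.95):
    """Percentile intervals, one row (lower, upper) per column of `draws`."""
    alpha = 1.0 - level
    lower = np.quantile(draws, alpha / 2.0, axis=0)
    upper = np.quantile(draws, 1.0 - alpha / 2.0, axis=0)
    return np.column_stack([lower, upper])

# Toy input: 101 equally spaced "draws" for a single coefficient.
toy = np.arange(101, dtype=float).reshape(-1, 1)
print(percentile_intervals(toy))
```

In practice `draws` would be the array of stacked bootstrap coefficient estimates, so that each row of the result is the interval for one component of the coefficient matrix.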
3.1.2 Random design and heteroskedasticity
This example aims to show support for Theorem 2. We generated data according to the multivariate linear regression model (1) where , , and both and are prespecified. The predictors and errors are generated according to
[TABLE]
for $i = 1, \ldots, n$. Our goal is to make inference about $\beta$ using the multivariate bootstrap procedure in the random design case. For each component of $\beta$, a 95% percentile interval computed using the case-resampling bootstrap in Algorithm 2 is compared with a 95% confidence interval that assumes model (1) with heterogeneity is correct. Three data sets were generated at different sample sizes and the performance of the multivariate bootstrap is assessed. The bootstrap is performed a total of times in each dataset. The results are displayed in Table 2. For the first two components of $\beta$, we see that the confidence regions obtained from both methods are close to each other and that the distance between the two shrinks as $n$ increases. Similar results are obtained for the other components of $\beta$.
3.2 Cars data
The data in this example, analyzed in Henderson and Velleman (1981), were extracted from the 1974 Motor Trend US magazine. The objective of this study is to compare aspects of automobile design and performance with fuel consumption for 32 automobiles (1973-74 models). In this analysis, we assume the multivariate model (1) with miles per gallon, displacement, and horsepower as response variables and number of cylinders and transmission type as predictors. Number of cylinders and transmission type are both factor variables. The automobiles have either 4, 6, or 8 cylinders and their transmission type is either automatic or manual.
For inference for , we compare a 95% bootstrap percentile region using the fixed design bootstrap in Algorithm 1 with a 95% confidence interval. The number of bootstrap resamples is set at . The results are depicted in Table 3. We see that inferences about are fairly similar for both methods.
4 Theoretical details
Before we present our proof of Theorems 1 and 2, we motivate the Mallows metric as a central tool for our proof technique. The Mallows metric for probabilities in , relative to the Euclidean norm was the driving force needed to establish the validity of the residual bootstrap approximation in the context of univariate regression (Bickel and Freedman, 1981; Freedman, 1981). The Mallows metric, relative to the Euclidean norm, for two probability measures in , denoted , is
[TABLE]
Properties of the Mallows metric are developed for random variables on separable Banach spaces of finite dimension (Bickel and Freedman, 1981). Since is indeed a separable Banach space for a natural number , the theory in Bickel and Freedman (1981) applies to our case. In the present article, we use the Mallows metric when to prove that the residual bootstrap can be used to estimate the variability of consistently.
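To make the metric concrete, the sketch below computes the order-2 Mallows distance (relative to the Euclidean norm) between two empirical distributions that place equal mass on the same number of points. Under that assumption the infimum over couplings is attained at a permutation, so the problem reduces to an assignment problem, solved here with SciPy's `linear_sum_assignment`. The function name, the restriction to order 2 and equal sample sizes, and the test data are choices made for this illustration and are not taken from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mallows2_empirical(x, y):
    """Order-2 Mallows distance between two equal-size empirical distributions.

    x and y are n x k arrays; each row is an atom carrying mass 1/n.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("x and y must both be n x k arrays of the same shape")
    # cost[i, j] = squared Euclidean distance between x[i] and y[j]
    diff = x[:, None, :] - y[None, :, :]
    cost = np.sum(diff ** 2, axis=2)
    rows, cols = linear_sum_assignment(cost)   # optimal matching of atoms
    return np.sqrt(cost[rows, cols].mean())

# A sample and its translate are separated by the length of the shift.
rng = np.random.default_rng(0)
x = rng.normal(size=(40, 2))
shift = np.array([3.0, 4.0])
print(mallows2_empirical(x, x + shift))
```

The translation example is a convenient sanity check, since shifting every atom by the same vector cannot be improved upon by any other matching.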
4.1 Fixed design
Let be the distribution function of where is the law of the errors so that is a probability measure on . Let be an alternate law of the errors, where it is assumed that is mean-zero with finite variance . In applications, will be the centered empirical distribution of the residuals.
Theorem 3**.**
.
Proof.
Let . Then is the law of where is the matrix with rows of independent random variables , having common law . can be thought of similarly. Observe that . Then, from Lemma 8.9 in Bickel and Freedman (1981), we see that
[TABLE]
which is our desired conclusion. ∎
With Theorem 3 we can bound the distance between the sample dependent distribution functions and by the distance between their underlying laws. As in Freedman (1981), we proceed with as the empirical distribution function of . Let be the empirical distribution of the residuals from the original regression, and let be centered at its mean . Since , we have where is the projection into the column space of .
Lemma 1**.**
.
Proof.
From the definition of the Mallows metric we have
[TABLE]
From linearity of the expectation with respect to the trace operator,
[TABLE]
and this completes the proof. ∎
Lemma 2**.**
.
Proof.
From Lemma 8.8 in Bickel and Freedman (1981) we have
[TABLE]
with the empirical distribution functions ,, and used as random variables in the application of Lemma 8.8 in Bickel and Freedman (1981). We see that
[TABLE]
Our conclusion follows from Lemma 1. ∎
These results imply the validity of the bootstrap approximation for the model (1) if we assume that . From Theorem 3,
[TABLE]
and because of the metric properties of
[TABLE]
where Lemma 2 shows that and Lemma 8.4 of (Bickel and Freedman, 1981) implies that with the separable Banach space taken to be . The next results are special cases of Lai et al. (1979) which are adapted from Freedman (1981) to the multivariate setting. We let , , be the column of corresponding to the errors of response .
Lemma 3**.**
* a.s. and a.s.*
Proof.
Let be the th column of . Then with columns . Lemma 2.3 of Freedman (1981) states that a.s. for any particular . Therefore a.s. A similar argument verifies our second result. ∎
Lemma 4**.**
* a.s..*
Proof.
A similar argument to that of Lemma 2.4 in Freedman (1981) gives
[TABLE]
The center term converges to and the left and right terms converge to 0 a.s. by Lemma 3. Our result follows. ∎
Lemma 5**.**
* a.s. and a.s.*
Proof.
From the arguments in the proofs of Lemmas 1 and 2 we have that
[TABLE]
which converges to 0 a.s. by Lemma 4. Therefore the first convergence result holds. From the metric properties of the Mallows metric we have that
[TABLE]
Our second convergence result follows from the first convergence result and Lemma 8.4 of Bickel and Freedman (1981). ∎
Lemma 6**.**
Let and , , be vectors. Let
[TABLE]
and similarly for . Then
[TABLE]
where $\|\cdot\|_F$ is the Frobenius norm.
Proof.
We have
[TABLE]
where the inequality follows from (Freedman, 1981, Lemma 2.7). ∎
The proof of Theorem 1 is now given. Before we prove this theorem, define $\mathrm{vech}$ to be the operator that stacks the unique elements of a symmetric matrix into a vector.
Proof.
Exchange for in Theorem 3 and observe that
[TABLE]
From Lemma 5 we know that almost everywhere. Our result for part a) follows since is mean-zero normal with variance . We now show that part b) holds. First, we need to establish that almost everywhere. To see this, introduce
[TABLE]
Clearly, a.s. Let . We have,
[TABLE]
a.s. where the first inequality follows from Lemma 6 with and taking the place of and respectively, the second inequality follows from the fact that is positive definite a.s., and the convergence follows from Lemma 4.
Let . From Lemma 6 and the proof of Lemma 1 we see that,
[TABLE]
where the last inequality follows from the argument that proves Lemma 1 applied to the starred data, and a.s. It remains to show that converges to . Conditional on ,
[TABLE]
by Lemma 8.6 in Bickel and Freedman (1981). Now has conditional distribution and has law and Lemma 5 gives almost everywhere. We now show that a.s. by Lemma 8.5 of Bickel and Freedman (1981) with where . To do this, we show that can be chosen so that where and are the and norms respectively. From the definition of the Euclidean norm, we have . It is clear that for all . Now, pick . We see that
[TABLE]
A similar argument shows that converges to 0. Part c) follows from both a) and b). ∎
4.2 Random design and heteroskedasticity
In this section we provide the proof of Theorem 2. Several quantities and lemmas are introduced in order to prove Theorem 2. The logic follows that of (Freedman, 1981, Section 3). Define,
[TABLE]
The next two lemmas are needed to prove Theorem 2.
Lemma 7**.**
If as , then
- a)
* and ,*
- b)
the -law of converges to the -law of in ,
- c)
the -law of converges to the -law of in .
Proof.
Part a) immediately follows from (Bickel and Freedman, 1981, Lemma 8.3c).
We use (Bickel and Freedman, 1981, Lemma 8.3a) to verify part b). The weak convergence step is evident. Now,
[TABLE]
Let . Part b) follows from, integration with respect to , part a), and (Bickel and Freedman, 1981, Lemma 8.5) with . The steps involving (Bickel and Freedman, 1981, Lemma 8.5) are similar to those in the proof of Theorem 1.
Part c) follows from the same argument used to prove part b). ∎
Lemma 8**.**
* a.e. as .*
Proof.
The steps are the same as those in (Freedman, 1981, Lemma 3.2). ∎
The proof of Theorem 2 is now given.
Proof.
We can write
[TABLE]
where and . (Freedman, 1981, Theorem 3.1) implies that the conditional law, conditional on , , of . This verifies part a).
We now verify part b). From (Bickel and Freedman, 1981, Lemma 8.7), we have
[TABLE]
where the right side goes to 0 a.e. as . Lemma 8 states that a.e. in as and part b) of Lemma 7 implies that the distribution of , conditional on , , converges to . The random variable is normally distributed with mean 0 and variance matrix . Combining this with part a) verifies that the conditional distribution of converges to as . This completes the proof of part b).
Part c) follows from the same argument in the proof of Theorem 1 where Lemmas 8 and 7c combine to show that (5) converges to 0 as . Note that in this argument. This completes the proof. ∎
5 Acknowledgments
The author would like to thank Karl Oskar Ekvall, Forrest Crawford, Snigdhansu Chatterjee, Dennis Cook, and two anonymous referees for providing valuable feedback which led to the strengthening of this article.
References
- Bickel, P. J. and Freedman, D. A. (1981). Some asymptotic theory for the bootstrap. Ann. Statist., 9:1196–1217.
- Chatterjee, S. and Bose, A. (2000). Variance estimation in high dimensional models. Statist. Sin., 10:497–515.
- Diaconis, P. and Efron, B. (1983). Computer intensive methods in statistics. Sci. Am., 248.
- Freedman, D. A. (1981). Bootstrapping regression models. Ann. Statist., 9:1218–1228.
- Freedman, D. A. and Peters, S. C. (1984). Bootstrapping a regression equation: Some empirical results. J. Am. Statist. Assoc., 79:97–106.
- Henderson, H. V. and Velleman, P. F. (1981). Building multiple regression models interactively. Biometrics, 37:391–411.
- Lai, T., Robbins, H., and Wei, V. (1979). Strong consistency of least squares estimates in multiple regression. J. Mult. Anal., 9:343–361.
- Weisberg, S. (2005). Applied Linear Regression. Wiley, New Jersey.
