Least Squares Estimation Using Sketched Data with Heteroskedastic Errors
Sokbae Lee, Serena Ng

TL;DR
This paper shows that least squares estimates computed from data sketched by random projections behave as if the regression errors were homoskedastic, so heteroskedasticity-robust standard errors are not needed for inference. The result also covers instrumental variables settings with endogenous covariates.
Contribution
It shows that least squares estimates from data sketched by random projections behave as if the errors were homoskedastic, which permits simpler inference in large-scale regressions with heteroskedastic errors. Sketching by random sampling does not have this property.
Findings
Random projections lead to asymptotically normal estimates with homoskedastic variance.
The results hold for both exogenous and endogenous covariates in IV estimation.
Sketching simplifies inference procedures like F tests for instrument relevance.
Abstract
Researchers may perform regressions using a sketch of data of size m instead of the full sample of size n for a variety of reasons. This paper considers the case when the regression errors do not have constant variance and heteroskedasticity-robust standard errors would normally be needed for test statistics to provide accurate inference. We show that estimates using data sketched by random projections will behave "as if" the errors were homoskedastic. Estimation by random sampling would not have this property. The result arises because the sketched estimates in the case of random projections can be expressed as degenerate U-statistics, and under certain conditions, these statistics are asymptotically normal with homoskedastic variance. We verify that the conditions hold not only in the case of least squares regression when the covariates are exogenous, but also in instrumental…
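The setup described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' code: the Gaussian projection matrix, the data-generating process, the sample sizes, and the function name `sketched_ols` are all hypothetical choices made for illustration. It fits least squares on a random-projection sketch of size m drawn from a full sample of size n with heteroskedastic errors, then reports both classical and White (HC0) standard errors computed on the sketched data.

```python
import numpy as np

def sketched_ols(X, y, m, rng):
    """OLS on a Gaussian random-projection sketch of (X, y) with m rows.

    Hypothetical helper for illustration; the Gaussian sketch is an assumption.
    """
    n, k = X.shape
    # Entries are i.i.d. N(0, 1/m), so that E[Pi' Pi] equals the identity.
    Pi = rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
    Xs, ys = Pi @ X, Pi @ y
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    resid = ys - Xs @ beta
    XtX_inv = np.linalg.inv(Xs.T @ Xs)
    # Classical (homoskedastic) standard errors on the sketched data.
    se_homo = np.sqrt(np.diag(XtX_inv) * (resid @ resid) / (m - k))
    # White (HC0) heteroskedasticity-robust standard errors, for comparison.
    meat = (Xs * resid[:, None] ** 2).T @ Xs
    se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_homo, se_hc0

rng = np.random.default_rng(0)
n, m = 2000, 200                      # full sample size and sketch size (assumed)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, 2.0])
# Heteroskedastic errors: the error variance depends on x.
e = np.sqrt(0.5 + x**2) * rng.normal(size=n)
y = X @ beta_true + e

beta_hat, se_homo, se_hc0 = sketched_ols(X, y, m, rng)
print("estimate:      ", beta_hat)
print("classical s.e.:", se_homo)
print("HC0 s.e.:      ", se_hc0)
```

If the result summarized above holds in this design, the classical and robust standard errors computed on the sketch should be of similar size, even though the errors in the full sample are heteroskedastic. Replacing the projection with a random subsample of m rows would be the natural comparison case.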
Taxonomy
Topics: Advanced Statistical Methods and Models · Neural Networks and Applications · Statistical Methods and Inference
