Orthogonal Subsampling for Big Data Linear Regression
Lin Wang, Jake Elmstedt, Weng Kee Wong, Hongquan Xu

TL;DR
This paper introduces an orthogonal subsampling (OSS) method for big data linear regression that improves estimation accuracy, is computationally efficient, suits distributed systems, and outperforms existing subsampling techniques.
Contribution
The paper proposes a novel orthogonal subsampling approach for big data linear regression, enhancing efficiency, robustness, and suitability for parallel computing.
Findings
OSS outperforms existing methods in minimizing the mean squared errors of the estimated parameters.
OSS provides more precise estimates of interaction effects.
The approach is robust to covariate interactions.
Abstract
The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, that extracts useful information from datasets is a crucial step in big data analysis. We propose an orthogonal subsampling (OSS) approach for big data with a focus on linear regression models. The approach is inspired by the fact that an orthogonal array of two levels provides the best experimental design for linear regression models in the sense that it minimizes the average variance of the estimated parameters and provides the best predictions. The merits of OSS are three-fold: (i) it is easy to implement and fast; (ii) it is suitable for distributed parallel computing and ensures the subsamples selected in different batches have no common data points; and (iii) it outperforms existing methods in minimizing the mean squared errors of the estimated parameters and…
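The design fact that motivates OSS can be checked numerically. The sketch below is an illustration only, not the paper's OSS algorithm: it compares the average variance of the least-squares estimates, trace((X^T X)^{-1}) / p in units of the error variance, for a two-level full factorial design (a simple orthogonal array) and for the same number of points drawn uniformly from [-1, 1]^3. The helper names (`gram`, `trace_of_inverse`, `average_variance`), the choice of three covariates and eight runs, and the random seed are all assumptions made for this example.

```python
import itertools
import random


def gram(X):
    """Return X^T X for a matrix X stored as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)]
            for i in range(p)]


def trace_of_inverse(A):
    """Trace of A^{-1}, via Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augment A with the identity; the right half becomes A^{-1}.
    M = [[float(v) for v in A[i]] + [1.0 if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return sum(M[i][n + i] for i in range(n))


def average_variance(points):
    """Average variance of the OLS estimates (intercept plus main effects),
    in units of the error variance sigma^2."""
    X = [[1.0] + list(pt) for pt in points]  # prepend the intercept column
    return trace_of_inverse(gram(X)) / len(X[0])


# Two-level full factorial in 3 covariates: 8 runs with orthogonal columns.
orthogonal_points = list(itertools.product([-1.0, 1.0], repeat=3))
avg_var_orthogonal = average_variance(orthogonal_points)

# Same number of runs, drawn uniformly from [-1, 1]^3 (seed is arbitrary).
rng = random.Random(0)
random_points = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(8)]
avg_var_random = average_variance(random_points)

print("orthogonal design:", avg_var_orthogonal)
print("random design:    ", avg_var_random)
```

For the orthogonal design, X^T X equals 8 times the identity, so the average variance is 1/8 of the error variance. When covariates are scaled to [-1, 1], no 8-run design can do better, which is why the random design comes out larger.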
Taxonomy
Topics: Optimal Experimental Design Methods · Advanced Statistical Process Monitoring · Spectroscopy and Chemometric Analyses
