Data enriched linear regression
Aiyou Chen, Art B. Owen, Minghui Shi

TL;DR
This paper introduces a linear regression approach that leverages a large, potentially biased dataset alongside a small, accurate dataset, employing penalization to improve predictions, especially in high-dimensional settings.
Contribution
It develops a shrinkage-based linear regression method that combines a small data set with a larger, possibly biased one, and provides a Stein-type inadmissibility result together with plug-in and AICc-based methods for tuning the penalty parameter.
Findings
The shrinkage method can improve on a regression fit to the small data set alone.
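For Gaussian responses, using only the small data set is inadmissible when the model has 5 or more coefficients and 10 or more error degrees of freedom, no matter how large the bias is.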
The method achieves lower squared error in high-dimensional, low-bias scenarios.
Abstract
We present a linear regression method for predictions on a small data set making use of a second possibly biased data set that may be much larger. Our method fits linear regressions to the two data sets while penalizing the difference between predictions made by those two models. The resulting algorithm is a shrinkage method similar to those used in small area estimation. We find a Stein-type finding for Gaussian responses: when the model has 5 or more coefficients and 10 or more error degrees of freedom, it becomes inadmissible to use only the small data set, no matter how large the bias is. We also present both plug-in and AICc-based methods to tune our penalty parameter. Most of our results use an L2 penalty, but we obtain formulas for L1 penalized estimates when the model is specialized to the location setting. Ordinary Stein shrinkage provides an inadmissibility result for…
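The penalized fit described in the abstract can be illustrated with a minimal sketch. The code below assumes one plausible form of the objective: squared error on the small data set with coefficients beta, squared error on the large data set with coefficients beta + gamma, plus lam times the squared difference between the two models' predictions at the small data set's design points. The function name `data_enriched_fit`, the argument names, and the choice to evaluate the penalty on the small data set's rows are assumptions made for illustration, not details taken from the abstract.

```python
import numpy as np

def data_enriched_fit(X_s, y_s, X_b, y_b, lam):
    """Hypothetical sketch of a two-data-set penalized regression.

    Minimizes, over beta and gamma,
        ||y_s - X_s beta||^2
      + ||y_b - X_b (beta + gamma)||^2
      + lam * ||X_s gamma||^2
    where gamma is the coefficient shift (bias) of the large data set.
    The penalty form is an assumption for illustration.
    """
    n_s, d = X_s.shape
    # Stack the three squared-error terms into one least-squares problem
    # in the combined unknown vector (beta, gamma).
    A = np.vstack([
        np.hstack([X_s, np.zeros_like(X_s)]),               # small-data residuals
        np.hstack([X_b, X_b]),                              # large-data residuals
        np.hstack([np.zeros_like(X_s), np.sqrt(lam) * X_s]),  # prediction-difference penalty
    ])
    rhs = np.concatenate([y_s, y_b, np.zeros(n_s)])
    theta, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return theta[:d], theta[d:]  # (beta, gamma)
```

Under this assumed objective, `lam = 0` reproduces ordinary least squares on the small data set alone, and a very large `lam` forces gamma toward zero, which reproduces least squares on the pooled data. Intermediate values shrink between those two fits.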
Taxonomy
Topics: Statistical Methods and Inference · Soil Geostatistics and Mapping · Statistical Methods and Bayesian Inference
