Semi-supervised linear regression with missing covariates
Benedict M. Risebrow, Thomas B. Berrett

TL;DR
This paper develops new methods for linear regression with missing covariates, leveraging both labeled and unlabeled data, and provides theoretical guarantees showing their optimality under various missingness patterns.
Contribution
The paper introduces rate-optimal estimators for linear regression with missing covariates, covering both sparse and non-sparse regression parameters. This addresses a gap in existing theory, which has largely treated missing covariates and missing responses as separate problems.
Findings
The proposed estimators achieve minimax-optimal risk bounds.
The paper establishes what it describes as the first matching upper and lower bounds for missing data in supervised settings.
Simulations and a real-data application demonstrate the estimators' effectiveness.
Abstract
Missing values in datasets are common in applied statistics. For regression problems, theoretical work thus far has largely considered the issue of missing covariates as distinct from missing responses. However, in practice, many datasets have both forms of missingness. Motivated by this gap, we study linear regression with a labelled dataset containing missing covariates, potentially alongside an unlabelled dataset. We consider both structured (blockwise-missing) and unstructured missingness patterns, along with sparse and non-sparse regression parameters. For the non-sparse case, we provide an estimator based on imputing the missing data combined with a reweighting step. For the high-dimensional sparse case, we use a modified version of the Dantzig selector. We provide non-asymptotic upper bounds on the risk of both procedures. These are matched by several new minimax lower bounds,…
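To make the "impute, then reweight" idea for the non-sparse case concrete, here is a minimal sketch in Python with NumPy. It is not the authors' estimator: the abstract does not specify the imputation rule or the weights, so the sketch assumes a linear (Gaussian-style) conditional-mean imputation with the covariate mean and covariance estimated from a fully observed unlabelled sample, followed by weighted least squares. The function names, the simulated data, and the fixed weight of 0.5 on incomplete rows are all hypothetical choices made for illustration.

```python
import numpy as np


def conditional_mean_impute(X, mu, Sigma):
    """Fill NaNs in each row with the linear conditional mean given the observed entries.

    Assumption (not from the paper): mu and Sigma are estimates of the covariate
    mean and covariance, e.g. computed from an unlabelled sample.
    """
    X_imp = X.copy()
    for i in range(X.shape[0]):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        obs = ~miss
        if not obs.any():
            X_imp[i] = mu  # nothing observed: fall back to the mean
            continue
        S_oo = Sigma[np.ix_(obs, obs)]
        S_mo = Sigma[np.ix_(miss, obs)]
        X_imp[i, miss] = mu[miss] + S_mo @ np.linalg.solve(S_oo, X[i, obs] - mu[obs])
    return X_imp


def weighted_least_squares(X, y, w):
    """Solve (X^T W X) beta = X^T W y with W = diag(w)."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)


# Simulated example with a blockwise-missing pattern (all values hypothetical).
rng = np.random.default_rng(0)
d, n, m = 3, 2000, 5000
Sigma_true = np.array([[1.0, 0.5, 0.3],
                       [0.5, 1.0, 0.4],
                       [0.3, 0.4, 1.0]])
beta_true = np.array([1.0, -2.0, 0.5])

X_full = rng.multivariate_normal(np.zeros(d), Sigma_true, size=n)
y = X_full @ beta_true + 0.1 * rng.standard_normal(n)

X = X_full.copy()
X[n // 2:, 2] = np.nan  # last covariate missing for the second half of the labelled data

# Unlabelled covariates, assumed fully observed here, estimate the covariate distribution.
X_unlab = rng.multivariate_normal(np.zeros(d), Sigma_true, size=m)
mu_hat = X_unlab.mean(axis=0)
Sigma_hat = np.cov(X_unlab, rowvar=False)

X_imp = conditional_mean_impute(X, mu_hat, Sigma_hat)

# Hypothetical weights: down-weight rows whose covariates were partly imputed,
# since imputation adds noise to those rows.
incomplete = np.isnan(X).any(axis=1)
w = np.where(incomplete, 0.5, 1.0)

beta_hat = weighted_least_squares(X_imp, y, w)
print("estimate:", beta_hat)
print("truth:   ", beta_true)
```

The weight of 0.5 is only a placeholder. In a sketch like this, a more principled choice would make each row's weight inversely proportional to its effective noise variance, which is larger for rows with imputed covariates.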
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference · Bayesian Methods and Mixture Models
