Semi-supervised learning in unmatched linear regression using an empirical likelihood approach
Fadoua Balabdaoui, Jinyu Chen

TL;DR
This paper develops a semi-supervised empirical likelihood method for linear regression when the link between predictors and outcomes is only partially known, combining a small matched sample with a larger unmatched sample to improve estimation accuracy.
Contribution
It introduces a semi-supervised learning empirical maximum likelihood estimator (SSLEMLE) for unmatched linear regression, establishes its asymptotic normality, and gives its asymptotic covariance matrix explicitly, which quantifies the benefit of additional unmatched data.
Findings
The estimator is asymptotically normal under mild conditions.
Unmatched data significantly improve estimation accuracy.
Simulations and a real-data application demonstrate the method's effectiveness.
Abstract
Knowing the link between observed predictive variables and outcomes is crucial for making inference in any regression model. When this link is missing, partially or completely, classical estimation methods fail to recover the true regression function. Deconvolution approaches have been proposed and studied in detail in the unmatched setting where the predictive variables and responses are allowed to be independent. In this work, we consider linear regression in a semi-supervised learning setting where, besides a small sample of matched data, we have access to a relatively large unmatched sample. Using maximum likelihood estimation, we show that under some mild assumptions the semi-supervised learning empirical maximum likelihood estimator (SSLEMLE) is asymptotically normal and give its asymptotic covariance matrix explicitly as a function of the ratio of the matched/unmatched sample…
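To make the setting in the abstract concrete, the following is a minimal simulation sketch, not the authors' SSLEMLE. It assumes a Gaussian design, Gaussian noise, and hypothetical values for the coefficients and sample sizes (`beta_true`, `n_matched`, `n_unmatched`), none of which come from the paper. It only illustrates why a classical estimator such as ordinary least squares is reasonable on a small matched sample but fails when applied naively to a large unmatched sample whose rows carry no pairing information.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values chosen for illustration only (not from the paper).
beta_true = np.array([1.5, -2.0])
n_matched, n_unmatched = 50, 5000


def simulate(n, rng):
    """Draw n pairs (X, Y) from the linear model Y = X @ beta_true + noise."""
    X = rng.normal(size=(n, beta_true.size))
    Y = X @ beta_true + rng.normal(scale=0.5, size=n)
    return X, Y


def ols(X, Y):
    """Ordinary least squares without an intercept."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef


# Matched sample: the pairing between each predictor row and its response is known.
X_m, Y_m = simulate(n_matched, rng)

# Unmatched sample: predictors and responses come from independent draws of the
# same model, so row i of X_u says nothing about entry i of Y_u.
X_u, _ = simulate(n_unmatched, rng)
_, Y_u = simulate(n_unmatched, rng)

beta_matched = ols(X_m, Y_m)   # small sample, but correctly paired
beta_naive = ols(X_u, Y_u)     # large sample, pairing treated as if it were known

err_matched = np.linalg.norm(beta_matched - beta_true)
err_naive = np.linalg.norm(beta_naive - beta_true)

print("OLS on matched sample:  ", beta_matched, "error:", err_matched)
print("Naive OLS on unmatched: ", beta_naive, "error:", err_naive)
```

In this sketch the naive fit on the unmatched sample shrinks toward zero because the arbitrarily paired rows are uncorrelated, even though the unmatched sample is a hundred times larger. Using that sample requires working with the marginal distributions of the predictors and responses rather than with row-wise pairs, which is the role the abstract assigns to deconvolution and likelihood-based approaches.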
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Stochastic Gradient Optimization Techniques
