A Pseudo-Likelihood Approach to Linear Regression with Partially Shuffled Data
Martin Slawski, Guoqing Diao, Emanuel Ben-David

TL;DR
This paper introduces a pseudo-likelihood method for linear regression with partially mismatched data. The method handles data shuffling, yields robust estimates of the regression parameters, and supports inference on the noise level and the fraction of mismatches.
Contribution
The paper proposes a novel pseudo-likelihood approach, optimized with Expectation-Maximization (EM), for linear regression under partial shuffling. It improves tolerance to mismatched pairs and enables inference on the regression parameters, the noise level, and the mismatch fraction.
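A sketch of the kind of objective described, under assumptions not stated on this page: a correctly matched pair follows y_i = x_i^T β* + σ* ε_i with standard normal ε_i, a mismatched response has a density f (for example, an estimate of the marginal density of the responses), and α denotes the fraction of mismatched pairs. The EM updates shown are the standard ones for a two-component mixture in which one component is held fixed; they may differ in detail from the schemes in the paper.

```latex
% Pseudo-likelihood: each term is a two-component mixture density
\ell(\beta,\sigma,\alpha) = \sum_{i=1}^{n} \log\Big\{ (1-\alpha)\,\tfrac{1}{\sigma}\,
  \phi\Big(\tfrac{y_i - x_i^\top\beta}{\sigma}\Big) + \alpha\, f(y_i) \Big\}

% E-step: posterior probability that pair i is correctly matched
w_i = \frac{(1-\alpha)\,\sigma^{-1}\phi(r_i/\sigma)}
           {(1-\alpha)\,\sigma^{-1}\phi(r_i/\sigma) + \alpha\, f(y_i)},
\qquad r_i = y_i - x_i^\top\beta

% M-step: weighted least squares, then noise level and mismatch fraction
\beta \leftarrow \arg\min_{b} \sum_{i=1}^{n} w_i\,(y_i - x_i^\top b)^2, \qquad
\sigma^2 \leftarrow \frac{\sum_{i} w_i\,(y_i - x_i^\top\beta)^2}{\sum_{i} w_i}, \qquad
\alpha \leftarrow 1 - \frac{1}{n}\sum_{i=1}^{n} w_i
```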
Findings
Method scales well with sample size
Achieves near-oracle statistical performance
Can estimate noise and mismatch fraction
Abstract
Recently, there has been significant interest in linear regression in the situation where predictors and responses are not observed in matching pairs corresponding to the same statistical unit as a consequence of separate data collection and uncertainty in data integration. Mismatched pairs can considerably impact the model fit and disrupt the estimation of regression parameters. In this paper, we present a method to adjust for such mismatches under "partial shuffling", in which a sufficiently large fraction of (predictors, response)-pairs are observed in their correct correspondence. The proposed approach is based on a pseudo-likelihood in which each term takes the form of a two-component mixture density. Expectation-Maximization schemes are proposed for optimization, which (i) scale favorably in the number of samples, and (ii) achieve excellent statistical performance relative to an…
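The following Python sketch illustrates the kind of EM iteration the abstract describes, applied to simulated data with 20% of the responses shuffled. It is not the authors' implementation: the Gaussian kernel density estimate of the responses used as the mismatch component f, the naive least-squares starting point, the function names (`normal_pdf`, `kde_at_samples`, `em_partial_shuffle`) and the simulation settings are all assumptions made for illustration.

```python
import numpy as np


def normal_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)


def kde_at_samples(y):
    """Gaussian kernel density estimate of the responses, evaluated at each y_i.

    Assumption (illustrative): a mismatched response behaves like a draw from
    the marginal distribution of y, so its density is estimated from all y.
    """
    n = y.size
    h = 1.06 * np.std(y) * n ** (-0.2)  # Silverman's rule-of-thumb bandwidth
    z = (y[:, None] - y[None, :]) / h   # O(n^2) memory: fine for a small demo
    return normal_pdf(z).mean(axis=1) / h


def em_partial_shuffle(X, y, alpha0=0.5, n_iter=500, tol=1e-8):
    """EM for the mixture (1 - alpha) N(x_i'beta, sigma^2) + alpha f(y_i)."""
    f = kde_at_samples(y)                         # fixed mismatch component
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # naive least squares as start
    sigma = np.std(y - X @ beta)
    alpha = alpha0
    for _ in range(n_iter):
        # E-step: posterior probability that pair i is correctly matched.
        r = y - X @ beta
        matched = (1.0 - alpha) * normal_pdf(r / sigma) / sigma
        w = matched / (matched + alpha * f)
        # M-step: weighted least squares for beta, then sigma and alpha.
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        r = y - X @ beta_new
        sigma_new = np.sqrt(np.sum(w * r**2) / np.sum(w))
        alpha_new = 1.0 - w.mean()
        change = max(np.max(np.abs(beta_new - beta)),
                     abs(sigma_new - sigma), abs(alpha_new - alpha))
        beta, sigma, alpha = beta_new, sigma_new, alpha_new
        if change < tol:
            break
    return beta, sigma, alpha


# Usage on simulated data: shuffle the responses of 20% of the rows.
rng = np.random.default_rng(0)
n, beta_true, sigma_true = 1000, np.array([2.0, -1.0, 0.5]), 0.5
X = rng.standard_normal((n, beta_true.size))
y = X @ beta_true + sigma_true * rng.standard_normal(n)
idx = rng.choice(n, size=n // 5, replace=False)
y[idx] = y[rng.permutation(idx)]                  # mismatch these pairs

beta_hat, sigma_hat, alpha_hat = em_partial_shuffle(X, y)
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
print("least squares:", beta_ls.round(3))
print("EM estimate:  ", beta_hat.round(3), round(sigma_hat, 3), round(alpha_hat, 3))
```

On data like this, ordinary least squares is pulled toward zero by the shuffled pairs, while the mixture fit should return coefficients close to `beta_true` together with estimates of the noise level and the mismatch fraction. The O(n^2) density estimate only keeps the demo compact and would need replacing for large samples.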
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring
Methods: Test · Linear Regression
