Linear Regression with Shuffled Labels
Abubakar Abid, Ada Poon, James Zou

TL;DR
This paper studies linear regression when the labels are shuffled relative to the inputs by an unknown permutation. It proposes estimators that recover the model weights despite the shuffling and evaluates them on synthetic and standard datasets.
Contribution
The paper introduces an estimator based on the self-moments of the input features and labels, and generalizes the estimators to settings where partial ordering information is available from independently replicated experiments.
Findings
The analog of the classical least-squares estimator is inconsistent when the labels are shuffled.
The proposed estimators recover approximate weights using only shuffled labels.
The framework is applicable to practical experiments such as flow cytometry.
Abstract
Is it possible to perform linear regression on datasets whose labels are shuffled with respect to the inputs? We explore this question by proposing several estimators that recover the weights of a noisy linear model from labels that are shuffled by an unknown permutation. We show that the analog of the classical least-squares estimator produces inconsistent estimates in this setting, and introduce an estimator based on the self-moments of the input features and labels. We study the regimes in which each estimator excels, and generalize the estimators to the setting where partial ordering information is available in the form of experiments replicated independently. The result is a framework that enables robust inference, as we demonstrate by experiments on both synthetic and standard datasets, where we are able to recover approximate weights using only shuffled labels. Our work…
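The contrast between the two estimator families in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' code: the function names `shuffled_ls_1d` and `first_moment_1d`, the data-generating parameters, and the restriction to a single feature are all assumptions made for illustration. With one feature, the least-squares analog (minimizing jointly over the permutation and the weight) reduces to pairing sorted inputs with sorted labels in either the same or the reverse order. The moment-based estimator here matches only the first moment, which is a simplified stand-in for the self-moments idea and requires the input mean to be nonzero.

```python
import numpy as np


def shuffled_ls_1d(x, y_shuffled):
    """Least-squares analog for one feature (illustrative assumption).

    For a positive weight the best permutation pairs sorted x with sorted y;
    for a negative weight it pairs sorted x with reverse-sorted y. Both
    candidates are tried and the one with the smaller residual is kept.
    """
    xs = np.sort(x)
    ys = np.sort(y_shuffled)
    best_w, best_res = 0.0, np.inf
    for y_cand in (ys, ys[::-1]):
        w = (xs @ y_cand) / (xs @ xs)          # ordinary least squares, no intercept
        res = np.sum((y_cand - w * xs) ** 2)   # residual for this pairing
        if res < best_res:
            best_w, best_res = w, res
    return best_w


def first_moment_1d(x, y_shuffled):
    """First-moment matching: mean(y) = w * mean(x). Needs mean(x) != 0."""
    return np.mean(y_shuffled) / np.mean(x)


# Synthetic data; all values below are hypothetical choices.
rng = np.random.default_rng(0)
n, w_true = 20000, 2.0
x = rng.normal(loc=1.0, scale=1.0, size=n)
y = w_true * x + rng.normal(scale=2.0, size=n)   # noisy linear model
y_shuffled = rng.permutation(y)                  # unknown permutation of the labels

w_ls = shuffled_ls_1d(x, y_shuffled)
w_mom = first_moment_1d(x, y_shuffled)
print(f"true weight:            {w_true}")
print(f"least-squares analog:   {w_ls:.3f}")
print(f"first-moment estimator: {w_mom:.3f}")
```

Under these assumed settings the sorted-matching estimate typically stays noticeably away from the true weight even with many samples, because sorting aligns the noise with the inputs, while the moment estimate is unaffected by the shuffle since the label mean does not depend on the ordering.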
Taxonomy
Topics: Bayesian Methods and Mixture Models · Control Systems and Identification · Neural Networks and Applications
Methods: Linear Regression
