Linear Regression using Heterogeneous Data Batches
Ayush Jain, Rajat Sen, Weihao Kong, Abhimanyu Das, Alon Orlitsky

TL;DR
This paper introduces a gradient-based method for linear regression with heterogeneous, unknown, and heavy-tailed data sources, improving applicability and efficiency over previous approaches that required Gaussian inputs and separation conditions.
Contribution
It presents a novel algorithm that handles diverse, unknown input distributions, recovers multiple subgroups without separation, and reduces data requirements.
Findings
Handles unknown, heavy-tailed, and diverse input distributions.
Recovers multiple subgroups with fewer and smaller batches.
Removes the need for separation between regression vectors.
Abstract
In many learning applications, data are collected from multiple sources, each providing a batch of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations, where the output is a noisy linear combination of the inputs and there are multiple subgroups, each with its own regression vector. Prior work \cite{kong2020meta} showed that with abundant small batches, the regression vectors can be learned with only a few medium-size batches. However, the paper requires that the input distribution for all subgroups be isotropic Gaussian, and states that removing this…
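The data model described in the abstract can be illustrated with a small synthetic generator. This is only a sketch of the setup, not the paper's algorithm: the function name `make_batches`, the parameters `input_scales` and `noise_std`, and the choice of scaled Gaussian inputs are all assumptions made for illustration (the paper's point is precisely that inputs need not be Gaussian).

```python
import random

def make_batches(vectors, num_batches, batch_size,
                 input_scales=None, noise_std=0.1, seed=0):
    """Generate batches where each batch comes from one hidden subgroup.

    vectors      : one regression vector per subgroup (hypothetical values)
    input_scales : per-subgroup input scale, so input distributions can
                   differ across subgroups (an illustrative assumption)
    """
    rng = random.Random(seed)
    d = len(vectors[0])
    if input_scales is None:
        input_scales = [1.0] * len(vectors)
    batches = []
    for _ in range(num_batches):
        g = rng.randrange(len(vectors))  # subgroup index, unknown to a learner
        w, scale = vectors[g], input_scales[g]
        samples = []
        for _ in range(batch_size):
            x = [rng.gauss(0.0, scale) for _ in range(d)]
            # Output is a noisy linear combination of the inputs.
            y = sum(wi * xi for wi, xi in zip(w, x)) + rng.gauss(0.0, noise_std)
            samples.append((x, y))
        batches.append((g, samples))
    return batches

# Two subgroups in three dimensions; batches of 2 samples are too small
# to fit a 3-dimensional regression vector from any single batch.
vectors = [[1.0, -2.0, 0.5], [0.0, 1.0, 1.0]]
batches = make_batches(vectors, num_batches=6, batch_size=2,
                       input_scales=[1.0, 3.0])
```

The subgroup index `g` is returned only so the generated data can be inspected; in the setting the abstract describes, a learner would see the samples of each batch but not which subgroup produced them.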
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms · Machine Learning and ELM
