Linear Regression using Heterogeneous Data Batches

Ayush Jain; Rajat Sen; Weihao Kong; Abhimanyu Das; Alon Orlitsky

arXiv:2309.01973 · cs.LG · September 6, 2023

TL;DR

This paper introduces a gradient-based method for linear regression on batches drawn from heterogeneous sources whose input distributions are unknown and possibly heavy-tailed. It applies more broadly and more efficiently than previous approaches, which required Gaussian inputs and separation conditions.

Contribution

The paper presents an algorithm that handles diverse, unknown input distributions, recovers multiple subgroups without requiring separation between their regression vectors, and needs fewer and smaller batches.

Findings

01. Handles unknown, heavy-tailed, and diverse input distributions.
02. Recovers multiple subgroups with fewer and smaller batches.
03. Removes the need for separation between regression vectors.

Abstract

In many learning applications, data are collected from multiple sources, each providing a \emph{batch} of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations, where the output is a noisy linear combination of the inputs and there are $k$ subgroups, each with its own regression vector. Prior work~\cite{kong2020meta} showed that with abundant small batches, the regression vectors can be learned with only a few, $\tilde{\Omega}(k^{3/2})$, medium-size batches with $\tilde{\Omega}(k)$ samples each. However, the paper requires that the input distribution for all $k$ subgroups be isotropic Gaussian, and states that removing this…
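
The data model described in the abstract can be simulated in a few lines. The sketch below is only an illustration of the setup, not the paper's gradient-based algorithm: the function names, the Student-t input distribution, the noise level, and all sizes are assumptions chosen for the example. Each batch comes from one hidden subgroup and is too small (4 samples in 20 dimensions) to fit a regression vector on its own, while an oracle that knows the subgroup labels can pool batches and recover every vector.

```python
import numpy as np


def make_batches(num_batches, batch_size, weights, noise_std=0.1, seed=0):
    """Draw batches; every batch comes from a single hidden subgroup."""
    rng = np.random.default_rng(seed)
    k, d = weights.shape
    batches = []
    for _ in range(num_batches):
        g = int(rng.integers(k))  # hidden subgroup of this source
        # Assumption: Student-t inputs (5 degrees of freedom) stand in
        # for a heavy-tailed, non-Gaussian input distribution.
        X = rng.standard_t(df=5, size=(batch_size, d))
        # Output is a noisy linear combination of the inputs.
        y = X @ weights[g] + noise_std * rng.standard_normal(batch_size)
        batches.append((X, y, g))
    return batches


def oracle_pooled_least_squares(batches, k, d):
    """Oracle baseline: pool batches by their TRUE subgroup label.

    A real algorithm does not see the labels; this only shows that the
    pooled data determine each regression vector.
    """
    estimates = np.zeros((k, d))
    for g in range(k):
        X = np.vstack([Xb for Xb, _, lab in batches if lab == g])
        y = np.concatenate([yb for _, yb, lab in batches if lab == g])
        estimates[g] = np.linalg.lstsq(X, y, rcond=None)[0]
    return estimates


k, d = 3, 20  # hypothetical number of subgroups and input dimension
weights = np.random.default_rng(1).standard_normal((k, d))
batches = make_batches(num_batches=600, batch_size=4, weights=weights)
estimates = oracle_pooled_least_squares(batches, k, d)
max_err = np.max(np.linalg.norm(estimates - weights, axis=1))
print(f"largest recovery error across subgroups: {max_err:.4f}")
```

The oracle uses label information that the learner in the paper's setting does not have, so it serves only as a reference point for what the pooled data make possible.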

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms · Machine Learning and ELM