On-Demand Sampling: Learning Optimally from Multiple Distributions

Nika Haghtalab; Michael I. Jordan; Eric Zhao

arXiv:2210.12529 · cs.LG · April 4, 2024

TL;DR

This paper establishes the optimal sample complexity of multi-distribution learning paradigms and introduces algorithms that sample on demand to meet it, improving on the best known sample complexity bounds for fair federated learning and collaborative learning.

Contribution

The paper provides the first optimal sample complexity bounds for multi-distribution learning, including group distributionally robust optimization (group DRO), and introduces on-demand sampling algorithms built on novel extensions of online learning techniques.
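
To make the idea of sampling on demand concrete, the following is a minimal sketch of a two-player dynamic in which a learner and an adversary both run multiplicative-weights updates and the adversary's weights decide which distribution is sampled next. It is an illustration under stated assumptions, not the paper's algorithm: the function `on_demand_sketch`, the callbacks `draw` and `loss`, the step size, and the toy loss table `EXPECTED_LOSS` are all hypothetical names and choices introduced here.

```python
import math
import random


def on_demand_sketch(draw, loss, n_dists, n_hyps, rounds, seed=0):
    """Toy min-max dynamics (illustrative only; assumes max(n_dists, n_hyps) >= 2).

    draw(i, rng) -> one fresh data point from distribution i   (hypothetical callback)
    loss(h, z)   -> loss in [0, 1] of hypothesis h on point z  (hypothetical callback)
    """
    rng = random.Random(seed)
    eta = math.sqrt(math.log(max(n_dists, n_hyps)) / rounds)  # assumed step size
    q = [1.0 / n_dists] * n_dists  # adversary's weights over distributions
    p = [1.0 / n_hyps] * n_hyps    # learner's weights over hypotheses
    avg_p = [0.0] * n_hyps         # time-averaged learner strategy
    draws = [0] * n_dists          # samples requested per distribution

    for _ in range(rounds):
        for h in range(n_hyps):
            avg_p[h] += p[h] / rounds

        # Learner: request one sample from a distribution chosen according to q,
        # then down-weight every hypothesis by its loss on that sample.
        i = rng.choices(range(n_dists), weights=q)[0]
        z = draw(i, rng)
        draws[i] += 1
        new_p = [p[h] * math.exp(-eta * loss(h, z)) for h in range(n_hyps)]

        # Adversary: probe one distribution uniformly at random and up-weight it
        # by an importance-weighted loss estimate of a hypothesis drawn from p.
        j = rng.randrange(n_dists)
        h_t = rng.choices(range(n_hyps), weights=p)[0]
        z_adv = draw(j, rng)
        draws[j] += 1
        q[j] *= math.exp(eta * n_dists * loss(h_t, z_adv))

        # Renormalise both weight vectors.
        total_p = sum(new_p)
        p = [w / total_p for w in new_p]
        total_q = sum(q)
        q = [w / total_q for w in q]

    return avg_p, q, draws


# Toy problem (made up for illustration): EXPECTED_LOSS[i][h] is the expected
# loss of hypothesis h on distribution i. Hypothesis 2 has the smallest
# worst-case loss across the two distributions.
EXPECTED_LOSS = [[0.0, 1.0, 0.2],
                 [1.0, 0.0, 0.2]]


def draw(i, rng):
    return (i, rng.random())


def loss(h, z):
    i, u = z
    return 1.0 if u < EXPECTED_LOSS[i][h] else 0.0


avg_p, q, draws = on_demand_sketch(draw, loss, n_dists=2, n_hyps=3, rounds=4000)
print("average learner weights:", avg_p)
print("samples per distribution:", draws)
```

The point of the sketch is that the sampling budget is not fixed per distribution in advance: each round's learner sample is requested from whichever distribution the adversary's weights currently favour.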

Findings

01. Sample complexity exceeds that of learning a single distribution by only an additive n log(n)/ε^2 term

02. Algorithms achieve the optimal sample complexity bounds

03. First sample complexity bounds for the group DRO objective

Abstract

Social and real-world considerations such as robustness, fairness, social welfare and multi-agent tradeoffs have given rise to multi-distribution learning paradigms, such as collaborative learning, group distributionally robust optimization, and fair federated learning. In each of these settings, a learner seeks to uniformly minimize its expected loss over $n$ predefined data distributions, while using as few samples as possible. In this paper, we establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity. Importantly, our sample complexity bounds for multi-distribution learning exceed that of learning a single distribution by only an additive factor of $n \log(n) / \epsilon^{2}$. This improves upon the best known sample complexity bounds for fair federated learning by Mohri et al. and collaborative learning by Nguyen and…
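
The additive overhead stated in the abstract can be made tangible with a little arithmetic. The sketch below compares three sample counts with all constants and confidence terms dropped. It assumes a finite hypothesis class whose single-distribution cost scales as log|H|/ε², which is a standard form but is not stated in the abstract; the function names and the example values are hypothetical.

```python
import math


def single_dist_samples(log_h, eps):
    # Assumed single-distribution scaling for a finite class: log|H| / eps^2.
    return log_h / eps**2


def multi_dist_samples(log_h, n, eps):
    # Single-distribution cost plus the additive n*log(n)/eps^2 overhead
    # quoted in the abstract (constants omitted).
    return (log_h + n * math.log(n)) / eps**2


def naive_samples(log_h, n, eps):
    # Baseline for comparison: learn each of the n distributions separately.
    return n * log_h / eps**2


log_h, n, eps = 100.0, 10, 0.1  # illustrative values only
single = single_dist_samples(log_h, eps)
multi = multi_dist_samples(log_h, n, eps)
naive = naive_samples(log_h, n, eps)
print(f"single: {single:.0f}  multi: {multi:.0f}  naive: {naive:.0f}")
```

Under these assumed values the additive overhead is small relative to the single-distribution term, whereas the naive per-distribution baseline multiplies that term by n.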

Code & Models

Repositories

ericzhao28/multidistributionlearning (PyTorch, official)

Videos

On-Demand Sampling: Learning Optimally from Multiple Distributions · SlidesLive

Taxonomy

Topics: Data Stream Mining Techniques · Advanced Statistical Process Monitoring