On-Demand Sampling: Learning Optimally from Multiple Distributions
Nika Haghtalab, Michael I. Jordan, Eric Zhao

TL;DR
This paper establishes the optimal sample complexity of multi-distribution learning and gives algorithms that sample from distributions on demand, improving on the best known sample complexity bounds for fair federated learning and collaborative learning.
Contribution
The paper gives the first optimal sample complexity bounds for multi-distribution learning, including the first bounds for group distributionally robust optimization (group DRO), and introduces on-demand sampling algorithms built on novel extensions of online learning techniques.
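The idea of sampling on demand can be illustrated with a small two-player sketch. This is not the paper's algorithm; it is a generic game-dynamics illustration under stated assumptions: a finite hypothesis class, losses in [0, 1], Hedge-style multiplicative-weights updates for both players, and step sizes chosen for convenience. All names (`on_demand_learn`, `samplers`, `loss`) are hypothetical.

```python
import math
import random


def _softmax(log_w):
    """Turn log-weights into a probability vector (numerically stable)."""
    top = max(log_w)
    exps = [math.exp(v - top) for v in log_w]
    total = sum(exps)
    return [e / total for e in exps]


def on_demand_learn(samplers, hypotheses, loss, rounds, seed=0):
    """Illustrative sketch, not the authors' method.

    samplers   -- list of zero-argument callables, each drawing one example
                  from one of the n distributions
    hypotheses -- finite list of candidate predictors
    loss       -- loss(h, example), assumed to return a value in [0, 1]
    Returns the learner's mixture over hypotheses, averaged over rounds.
    """
    rng = random.Random(seed)
    n, m = len(samplers), len(hypotheses)
    # Step sizes are an assumption made for this sketch.
    eta_h = math.sqrt(math.log(max(m, 2)) / rounds)
    eta_q = math.sqrt(math.log(max(n, 2)) / (n * rounds))
    log_p = [0.0] * m  # learner's log-weights over hypotheses
    log_q = [0.0] * n  # adversary's log-weights over distributions
    avg_p = [0.0] * m

    for _ in range(rounds):
        p = _softmax(log_p)
        q = _softmax(log_q)
        for j in range(m):
            avg_p[j] += p[j] / rounds

        # Learner step: request one sample from a distribution drawn from q,
        # so harder distributions are sampled more often.
        i = rng.choices(range(n), weights=q)[0]
        z = samplers[i]()
        for j, h in enumerate(hypotheses):
            log_p[j] -= eta_h * loss(h, z)

        # Adversary step: sample from a uniformly random distribution and
        # use an importance-weighted (factor n) estimate of its loss.
        k = rng.randrange(n)
        z = samplers[k]()
        j = rng.choices(range(m), weights=p)[0]
        log_q[k] += eta_q * n * loss(hypotheses[j], z)

    return avg_p
```

Each round draws two samples in total, and the learner's sample comes from whichever distribution the adversary currently weights most heavily. The returned mixture places most of its mass on hypotheses that perform well on every distribution.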
Findings
Sample complexity exceeds that of learning a single distribution by only an additive n log(n)/ε^2 term, where n is the number of distributions
Algorithms achieve the optimal sample complexity bounds
First sample complexity bounds for the group DRO objective
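To get a feel for the size of the additive term, the following sketch evaluates n log(n)/ε^2 for given values. It assumes the natural logarithm and omits all constant factors, neither of which is specified in this summary. The names `additive_overhead`, `total_samples`, and `single_dist_samples` are hypothetical.

```python
import math


def additive_overhead(n: int, eps: float) -> float:
    """Evaluate n * log(n) / eps^2 (natural log assumed, constants omitted)."""
    if n < 1 or eps <= 0:
        raise ValueError("need n >= 1 and eps > 0")
    return n * math.log(n) / eps ** 2


def total_samples(single_dist_samples: float, n: int, eps: float) -> float:
    """Single-distribution sample count plus the additive multi-distribution term.

    single_dist_samples is a caller-supplied placeholder for whatever the
    single-distribution problem requires; it is not taken from the paper.
    """
    return single_dist_samples + additive_overhead(n, eps)
```

With n = 1 the additive term vanishes, which matches the single-distribution case. For fixed ε the term grows only slightly faster than linearly in n.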
Abstract
Social and real-world considerations such as robustness, fairness, social welfare and multi-agent tradeoffs have given rise to multi-distribution learning paradigms, such as collaborative learning, group distributionally robust optimization, and fair federated learning. In each of these settings, a learner seeks to uniformly minimize its expected loss over n predefined data distributions, while using as few samples as possible. In this paper, we establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity. Importantly, our sample complexity bounds for multi-distribution learning exceed that of learning a single distribution by only an additive factor of n log(n)/ε^2. This improves upon the best known sample complexity bounds for fair federated learning by Mohri et al. and collaborative learning by Nguyen and…
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Statistical Process Monitoring
