Optimal Multi-Distribution Learning

Zihan Zhang; Wenhao Zhan; Yuxin Chen; Simon S. Du; Jason D. Lee

arXiv:2312.05134 · cs.LG · August 12, 2025 · 1 citation

TL;DR

This paper introduces a new algorithm for multi-distribution learning that achieves near-optimal sample complexity, addressing key open problems and extending to Rademacher classes, with implications for robustness and fairness.

Contribution

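The paper presents a novel, oracle-efficient algorithm for multi-distribution learning (MDL) whose sample complexity matches the best-known lower bound up to logarithmic factors. The approach also extends to Rademacher classes, resolving open problems in the field.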

Findings

1. Achieves a sample complexity on the order of $(d+k)/\varepsilon^2$ (up to logarithmic factors), matching the best-known lower bound.

2. Establishes the necessity of randomization in MDL.

3. Extends the results to Rademacher classes.
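
The first finding can be written out as a guarantee. What follows is a sketch in notation that is assumed here rather than taken from this page: $\mathcal{H}$ is the hypothesis class, $\mathcal{D}_1, \dots, \mathcal{D}_k$ are the data distributions, $\ell$ is a bounded loss, and $\hat{h}$ is the (possibly randomized) output hypothesis. One common way to state $\varepsilon$-optimality in this setting is

$$
\max_{1 \le i \le k} \mathbb{E}_{(x,y) \sim \mathcal{D}_i}\big[\ell(\hat{h}, (x,y))\big]
\;\le\;
\min_{h \in \mathcal{H}} \max_{1 \le i \le k} \mathbb{E}_{(x,y) \sim \mathcal{D}_i}\big[\ell(h, (x,y))\big] + \varepsilon,
$$

where, for a randomized $\hat{h}$, the expectation on the left is also taken over the draw of the hypothesis. The sample complexity reported above is then $\widetilde{O}\big((d+k)/\varepsilon^2\big)$, with $d$ the VC dimension of $\mathcal{H}$ and $\widetilde{O}$ hiding logarithmic factors. As a rough sense of scale, ignoring constants and logarithmic factors, $d = 10$, $k = 5$, and $\varepsilon = 0.1$ give $(10 + 5)/0.1^2 = 1500$.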

Abstract

Multi-distribution learning (MDL), which seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions, has emerged as a unified framework in response to the evolving demand for robustness, fairness, multi-group collaboration, etc. Achieving data-efficient MDL necessitates adaptive sampling, also called on-demand sampling, throughout the learning process. However, there exist substantial gaps between the state-of-the-art upper and lower bounds on the optimal sample complexity. Focusing on a hypothesis class of Vapnik-Chervonenkis (VC) dimension $d$, we propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$ (modulo some logarithmic factor), matching the best-known lower bound. Our algorithmic ideas and theory are further extended to accommodate Rademacher…
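
To make the worst-case objective concrete, here is a minimal Python sketch of how the worst-case risk of a randomized hypothesis could be evaluated from a table of per-distribution risks. It is not the paper's algorithm: the function name `worst_case_risk` and the risk table are hypothetical, and the numbers are made up purely for illustration.

```python
def worst_case_risk(weights, risks):
    """Worst-case expected risk of a randomized hypothesis.

    weights: probabilities of drawing each of the m base hypotheses.
    risks:   m rows of k values; risks[j][i] is the risk of
             hypothesis j on distribution i.
    """
    k = len(risks[0])
    # Expected risk on each distribution, averaging over the random draw.
    per_distribution = [
        sum(w * row[i] for w, row in zip(weights, risks)) for i in range(k)
    ]
    # MDL scores a hypothesis by its worst distribution.
    return max(per_distribution)


# Hypothetical risk table: 2 hypotheses, k = 2 distributions.
risks = [
    [0.1, 0.5],  # hypothesis 0 is good on distribution 0, poor on 1
    [0.5, 0.1],  # hypothesis 1 is the mirror image
]

# A deterministic hypothesis is a mixture with all weight on one row.
deterministic = [worst_case_risk([1.0, 0.0], risks),
                 worst_case_risk([0.0, 1.0], risks)]
mixture = worst_case_risk([0.5, 0.5], risks)
```

In this toy table each deterministic choice has worst-case risk 0.5, while the uniform mixture averages to about 0.3 on both distributions. This only shows how the objective is scored for a randomized output; it says nothing about how many samples are needed to find such a hypothesis.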

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Imbalanced Data Classification Techniques

Methods: Multi-Distribution Learning