The Sample Complexity of Multi-Distribution Learning for VC Classes

Pranjal Awasthi; Nika Haghtalab; Eric Zhao

arXiv:2307.12135 · cs.LG · July 25, 2023

TL;DR

This paper studies the sample complexity of learning a VC class across multiple distributions. It highlights the gap between the best known upper and lower bounds and discusses recent progress, along with hurdles that are fundamental to the use of game dynamics in statistical learning.

Contribution

The paper states the best known upper and lower bounds for multi-distribution learning of a class of VC dimension $d$ on $k$ distributions, makes explicit the gap that remains between them, and discusses recent progress on closing it.

Findings

01

Upper bound on sample complexity: $O(\epsilon^{-2} \ln(k) (d + k) + \min\{\epsilon^{-1} d k, \epsilon^{-4} \ln(k) d\})$

02

Lower bound: $\Omega(\epsilon^{-2} (d + k \ln(k)))$

03

Discussion of fundamental hurdles in game dynamics for statistical learning.

Abstract

Multi-distribution learning is a natural generalization of PAC learning to settings with multiple data distributions. There remains a significant gap between the known upper and lower bounds for PAC-learnable classes. In particular, though we understand the sample complexity of learning a VC dimension $d$ class on $k$ distributions to be $O(\epsilon^{-2} \ln(k) (d + k) + \min\{\epsilon^{-1} d k, \epsilon^{-4} \ln(k) d\})$, the best lower bound is $\Omega(\epsilon^{-2} (d + k \ln(k)))$. We discuss recent progress on this problem and some hurdles that are fundamental to the use of game dynamics in statistical learning.
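
To get a feel for how the two bounds in the abstract scale, the following is a minimal Python sketch that evaluates the expressions inside the $O(\cdot)$ and $\Omega(\cdot)$ with all hidden constants dropped. The function names `upper_bound_expr` and `lower_bound_expr`, the input checks, and the example parameter values are illustrative assumptions and do not come from the paper. Because the constants are omitted, the numbers are only useful for comparing growth rates, not as actual sample counts.

```python
import math


def _check_inputs(d: int, k: int, eps: float) -> None:
    # Assumed validity ranges for this sketch: k >= 2 keeps ln(k) positive,
    # and eps is taken to be an accuracy parameter in (0, 1).
    if d < 1 or k < 2 or not (0.0 < eps < 1.0):
        raise ValueError("expected d >= 1, k >= 2 and 0 < eps < 1")


def upper_bound_expr(d: int, k: int, eps: float) -> float:
    """Expression inside the O(.) upper bound, hidden constants dropped."""
    _check_inputs(d, k, eps)
    main_term = eps ** -2 * math.log(k) * (d + k)
    # The min picks whichever of the two additive terms is smaller.
    extra_term = min(eps ** -1 * d * k, eps ** -4 * math.log(k) * d)
    return main_term + extra_term


def lower_bound_expr(d: int, k: int, eps: float) -> float:
    """Expression inside the Omega(.) lower bound, hidden constants dropped."""
    _check_inputs(d, k, eps)
    return eps ** -2 * (d + k * math.log(k))


if __name__ == "__main__":
    # Hypothetical parameter settings chosen only for illustration.
    for d, k, eps in [(10, 100, 0.1), (1000, 100, 0.01), (10, 10**6, 0.5)]:
        up = upper_bound_expr(d, k, eps)
        low = lower_bound_expr(d, k, eps)
        print(f"d={d}, k={k}, eps={eps}: upper~{up:.3g}, lower~{low:.3g}")
```

Dividing both arguments of the min by $\epsilon^{-1} d$ shows that the $\epsilon^{-1} d k$ term is the smaller one exactly when $k \le \epsilon^{-3} \ln(k)$, so which branch applies depends on how the number of distributions compares with the target accuracy.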

Taxonomy

Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Machine Learning and Data Classification