Multiclass learnability and the ERM principle
Amit Daniely, Sivan Sabato, Shai Ben-David, Shai Shalev-Shwartz

TL;DR
This paper investigates the sample complexity of multiclass learning. It shows that different ERM learners for the same hypothesis class can have different sample complexities, and it proposes a principle for designing good ERM learners, which yields tight bounds for symmetric classes.
Contribution
It introduces a principle for designing effective ERM learners and provides tight bounds on sample complexity for symmetric multiclass hypothesis classes.
Findings
Some ERM learners have lower sample complexity than others.
Certain classes are learnable by some ERM learners but not others.
Mistake and regret bounds in the online and bandit settings are characterized using generalizations of Littlestone's dimension.
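The first two findings can be illustrated with a small toy construction. The sketch below is not taken from the text above. It assumes a hypothetical class of hypotheses h_A, one per subset A of a finite domain, where h_A labels each point of A with the set A itself and every other point with a special label "*". All names (`h`, `good_erm`, `bad_erm`, `STAR`) are illustrative. Both learners return a hypothesis from the class with zero empirical error, so both are ERM learners, yet they generalize very differently when the sample contains only "*" labels.

```python
STAR = "*"  # assumed special label for points outside the set A


def h(A):
    """Hypothesis h_A: label x with the set A if x is in A, else with STAR."""
    A = frozenset(A)
    return lambda x: A if x in A else STAR


def empirical_error(hyp, sample):
    """Number of sample points that hyp labels incorrectly."""
    return sum(hyp(x) != y for x, y in sample)


def true_error(hyp, target, domain):
    """Error rate of hyp against target under the uniform distribution."""
    return sum(hyp(x) != target(x) for x in domain) / len(domain)


def good_erm(sample, domain):
    # A non-STAR label reveals the target set; otherwise guess the empty set.
    for _, y in sample:
        if y != STAR:
            return h(y)
    return h(frozenset())


def bad_erm(sample, domain):
    # Same rule when the target set is revealed; otherwise pick the set of
    # all unseen points, which is consistent with the sample but errs on
    # every point it has not seen whenever the target is h_empty.
    for _, y in sample:
        if y != STAR:
            return h(y)
    seen = {x for x, _ in sample}
    return h(frozenset(domain) - seen)


domain = range(10)
target = h(frozenset())  # the target labels every point with STAR
sample = [(x, target(x)) for x in (0, 1, 2)]

g = good_erm(sample, domain)
b = bad_erm(sample, domain)
print(empirical_error(g, sample), true_error(g, target, domain))
print(empirical_error(b, sample), true_error(b, target, domain))
```

In this run both learners have zero empirical error, but the second one mislabels the seven unseen points of the ten-point domain. Its error shrinks only as the sample covers the domain, whereas the first learner is already exact.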
Abstract
We study the sample complexity of multiclass prediction in several learning settings. For the PAC setting our analysis reveals a surprising phenomenon: In sharp contrast to binary classification, we show that there exist multiclass hypothesis classes for which some Empirical Risk Minimizers (ERM learners) have lower sample complexity than others. Furthermore, there are classes that are learnable by some ERM learners, while other ERM learners will fail to learn them. We propose a principle for designing good ERM learners, and use this principle to prove tight bounds on the sample complexity of learning symmetric multiclass hypothesis classes, that is, classes that are invariant under permutations of label names. We further provide a characterization of mistake and regret bounds for multiclass learning in the online setting and the bandit setting, using new generalizations of Littlestone's…
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Domain Adaptation and Few-Shot Learning
