Optimal Learners for Multiclass Problems
Amit Daniely, Shai Shalev-Shwartz

TL;DR
This paper shows that an optimal learner for multiclass problems must be improper, so no ERM learner is optimal, and it introduces computationally efficient learners with improved sample complexity for generalized linear classifiers.
Contribution
It proves that optimal multiclass learners must be improper, analyzes the one-inclusion learner's optimality, and develops efficient learners with improved sample complexity.
Findings
ERM is not optimal for multiclass learning.
The one-inclusion learner is essentially optimal.
Efficient learners for generalized linear classifiers outperform ERM in sample complexity.
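For readers unfamiliar with the terms in the findings above, here is a minimal sketch of an ERM rule over a small finite multiclass hypothesis class. It is only an illustration: the names `empirical_error`, `erm`, `make_constant`, and the toy class `H` are hypothetical and do not come from the paper. It shows what "proper" means: ERM always returns a member of the class, whereas an improper learner may output a predictor outside it.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

# A hypothesis maps an instance to a label (illustrative type alias).
Hypothesis = Callable[[Hashable], Hashable]
Sample = Sequence[Tuple[Hashable, Hashable]]


def empirical_error(h: Hypothesis, sample: Sample) -> float:
    """Fraction of labeled examples (x, y) in the sample that h misclassifies."""
    if not sample:
        raise ValueError("sample must be non-empty")
    return sum(1 for x, y in sample if h(x) != y) / len(sample)


def erm(hypothesis_class: Sequence[Hypothesis], sample: Sample) -> Hypothesis:
    """Return a hypothesis from the class with minimal empirical error.

    The output is always an element of hypothesis_class, which is what
    makes ERM a proper learner.
    """
    return min(hypothesis_class, key=lambda h: empirical_error(h, sample))


def make_constant(label: Hashable) -> Hypothesis:
    """Toy hypothesis that predicts the same label for every instance."""
    return lambda x: label


# Toy class: the three constant predictors over labels "a", "b", "c".
H: List[Hypothesis] = [make_constant(label) for label in ("a", "b", "c")]

# Realizable sample: every label is generated by the constant-"b" hypothesis.
sample = [(0, "b"), (1, "b"), (2, "b")]

chosen = erm(H, sample)
```

In this toy case `chosen` is the constant-"b" predictor and has zero empirical error. The sketch does not implement the one-inclusion learner or any construction from the paper.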
Abstract
The fundamental theorem of statistical learning states that for binary classification problems, any Empirical Risk Minimization (ERM) learning rule has close to optimal sample complexity. In this paper we seek a generic optimal learner for multiclass prediction. We start by proving a surprising result: a generic optimal multiclass learner must be improper, namely, it must have the ability to output hypotheses which do not belong to the hypothesis class, even though it knows that all the labels are generated by some hypothesis from the class. In particular, no ERM learner is optimal. This brings back the fundamental question of "how to learn?" We give a complete answer to this question by giving a new analysis of the one-inclusion multiclass learner of Rubinstein et al. (2006), showing that its sample complexity is essentially optimal. Then, we turn to study the popular hypothesis…
Taxonomy
Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Domain Adaptation and Few-Shot Learning
