Optimal Learners for Multiclass Problems

Amit Daniely; Shai Shalev-Shwartz

arXiv:1405.2420·cs.LG·May 13, 2014·34 cites

TL;DR

This paper shows that optimal multiclass learners must be improper, so that ERM is suboptimal. It proves that the one-inclusion learner is essentially optimal and introduces computationally efficient learners for generalized linear classifiers whose sample complexity is better than that of ERM.

Contribution

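It proves that optimal multiclass learners must be improper, gives a new analysis showing that the sample complexity of the one-inclusion learner is essentially optimal, and develops efficient learners for generalized linear classifiers with better sample complexity than ERM.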

Findings

01

ERM is not optimal for multiclass learning.

02

The one-inclusion learner of Rubinstein et al. (2006) has essentially optimal sample complexity.

03

Efficient learners for generalized linear classifiers outperform ERM in sample complexity.
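For readers unfamiliar with the term, the following is a minimal sketch of what a single generalized linear classifier computes. It assumes the common form h_w(x) = argmax_y ⟨w, Ψ(x, y)⟩ for a class-sensitive feature map Ψ. That exact definition, the "multivector" choice of Ψ, and the names `psi_multivector`, `dot`, and `predict` are assumptions for illustration; the paper's efficient learners are not reproduced here.

```python
# Hypothetical sketch: prediction with a generalized linear multiclass classifier,
# assuming h_w(x) = argmax_y <w, psi(x, y)>. Pure Python, no dependencies.

def psi_multivector(x, y, num_classes):
    """Assumed feature map: copy x into the block of a length k*d vector
    that corresponds to label y, leaving all other blocks zero."""
    d = len(x)
    out = [0.0] * (num_classes * d)
    out[y * d:(y + 1) * d] = x
    return out

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def predict(w, x, num_classes, psi=psi_multivector):
    """Return the label y maximizing <w, psi(x, y)>.
    Ties are broken toward the smallest label, since max keeps the first maximum."""
    scores = [dot(w, psi(x, y, num_classes)) for y in range(num_classes)]
    return max(range(num_classes), key=lambda y: scores[y])

# k = 3 labels, d = 2 features: w stacks one weight block per label.
w = [1.0, 0.0, 0.0, 1.0, -1.0, -1.0]
x = [0.2, 0.9]
print(predict(w, x, 3))  # 1 (scores are 0.2, 0.9, -1.1)
```

With the multivector map, the rule reduces to picking the label whose weight block has the largest inner product with x; other choices of Ψ give other classes in the same family.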

Abstract

The fundamental theorem of statistical learning states that for binary classification problems, any Empirical Risk Minimization (ERM) learning rule has close to optimal sample complexity. In this paper we seek a generic optimal learner for multiclass prediction. We start by proving a surprising result: a generic optimal multiclass learner must be improper, namely, it must have the ability to output hypotheses which do not belong to the hypothesis class, even though it knows that all the labels are generated by some hypothesis from the class. In particular, no ERM learner is optimal. This brings back the fundamental question of "how to learn?" We give a complete answer to this question by giving a new analysis of the one-inclusion multiclass learner of Rubinstein et al. (2006), showing that its sample complexity is essentially optimal. Then, we turn to study the popular hypothesis…
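The distinction between proper and improper learners can be illustrated on a toy finite class. In this sketch the domain, labels, class `H`, and function names are all hypothetical. `erm` is a proper learner that always returns a member of `H`, while `majority_vote_learner` is a simple improper rule that labels each point by majority vote over the hypotheses consistent with the sample. The majority rule is only a stand-in to show what "improper" means; it is not the one-inclusion learner of Rubinstein et al. (2006), and this example does not by itself show that ERM is suboptimal.

```python
from collections import Counter

# Hypothetical toy setting: a finite domain and a finite multiclass hypothesis class,
# each hypothesis stored as a dict mapping domain point -> label.
DOMAIN = [0, 1, 2, 3]
H = [
    {0: "a", 1: "a", 2: "b", 3: "c"},
    {0: "a", 1: "b", 2: "a", 3: "c"},
    {0: "b", 1: "a", 2: "a", 3: "c"},
]

def empirical_error(h, sample):
    """Fraction of labeled pairs (x, y) in the sample that h mislabels."""
    return sum(h[x] != y for x, y in sample) / len(sample)

def erm(hypotheses, sample):
    """Proper learner: return a member of the class with minimal empirical error
    (ties broken by list order)."""
    return min(hypotheses, key=lambda h: empirical_error(h, sample))

def majority_vote_learner(hypotheses, sample):
    """Improper learner (illustration only): label each point by majority vote over
    the hypotheses with zero empirical error. Assumes a realizable sample, i.e.
    at least one hypothesis is consistent with it."""
    consistent = [h for h in hypotheses if empirical_error(h, sample) == 0]
    return {x: Counter(h[x] for h in consistent).most_common(1)[0][0] for x in DOMAIN}

sample = [(3, "c")]
h_erm = erm(H, sample)
h_maj = majority_vote_learner(H, sample)
print(h_erm in H)  # True: ERM always outputs a hypothesis from the class
print(h_maj)       # {0: 'a', 1: 'a', 2: 'a', 3: 'c'}
print(h_maj in H)  # False: the majority-vote output lies outside the class
```

Here every hypothesis fits the single example, so ERM returns the first one, while the majority vote produces a labeling that no hypothesis in `H` realizes.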

Taxonomy

Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Domain Adaptation and Few-Shot Learning