A note on the price of bandit feedback for mistake-bounded online learning

Jesse Geneson

arXiv:2101.06891·cs.DM·February 2, 2021

TL;DR

This paper examines the price of bandit feedback in mistake-bounded online multiclass learning. It identifies a false lemma in the proof of a lower bound claimed by Long (2020) and repairs the proof that depended on it.

Contribution

The author identifies a false lemma underlying a previously claimed lower bound and provides a corrected proof of the mistake bound in the bandit feedback model.

Findings

01

The lemma fails precisely when the two vectors are multiples of each other mod p.

02

A new lemma repairs the error in the prior proof.

03

The repaired proof restores the claimed lower bound on the price of bandit feedback.

Abstract

The standard model and the bandit model are two generalizations of the mistake-bound model to online multiclass classification. In both models the learner guesses a classification in each round, but in the standard model the learner receives the correct classification after each guess, while in the bandit model the learner is only told whether or not their guess is correct in each round. For any set $F$ of multiclass classifiers, define $\mathrm{opt}_{\mathrm{std}}(F)$ and $\mathrm{opt}_{\mathrm{bandit}}(F)$ to be the optimal worst-case number of prediction mistakes in the standard and bandit models respectively. Long (Theoretical Computer Science, 2020) claimed that for all $M > 2$ and infinitely many $k$, there exists a set $F$ of functions from a set $X$ to a set $Y$ of size $k$ such that $\mathrm{opt}_{\mathrm{std}}(F) = M$ and $\mathrm{opt}_{\mathrm{bandit}}(F) \geq (1 - o(1)) (|Y| \ln |Y|) \, \mathrm{opt}_{\mathrm{std}}(F)$. The proof of this result depended on the…
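
To make the two feedback models concrete, the following is a minimal brute-force sketch that computes both quantities for a tiny function class. It is not taken from the paper. It assumes a deterministic learner facing an adaptive adversary, represents each function as a tuple of labels indexed by the points of $X$, and uses the hypothetical names `opt_std` and `opt_bandit` for the two game values. It is only feasible for very small classes.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def opt_std(F):
    """Worst-case mistakes when the true label is revealed each round (assumed model)."""
    if len(F) <= 1:
        return 0
    n = len(next(iter(F)))
    best = 0
    for x in range(n):
        labels = {f[x] for f in F}
        if len(labels) < 2:
            continue  # every consistent function agrees here: no mistake is forced
        # The learner picks the guess minimizing the adversary's best reply.
        value = min(
            max(
                (guess != truth)
                + opt_std(frozenset(f for f in F if f[x] == truth))
                for truth in labels
            )
            for guess in labels
        )
        best = max(best, value)
    return best


@lru_cache(maxsize=None)
def opt_bandit(F):
    """Worst-case mistakes when only 'correct' / 'incorrect' is revealed (assumed model)."""
    if len(F) <= 1:
        return 0
    n = len(next(iter(F)))
    best = 0
    for x in range(n):
        labels = {f[x] for f in F}
        if len(labels) < 2:
            continue
        value = min(
            max(
                # Guess was right: keep the functions that output `guess` at x.
                opt_bandit(frozenset(f for f in F if f[x] == guess)),
                # Guess was wrong: one mistake, and only `guess` is ruled out at x.
                1 + opt_bandit(frozenset(f for f in F if f[x] != guess)),
            )
            for guess in labels
        )
        best = max(best, value)
    return best


def all_functions(num_points, k):
    """All functions from a set of `num_points` points to k labels, as tuples."""
    return frozenset(product(range(k), repeat=num_points))


if __name__ == "__main__":
    F = all_functions(1, 4)
    print(opt_std(F), opt_bandit(F))
```

For all functions from a single point to $k$ labels, one mistake suffices in the standard model because the label is then revealed, whereas in the bandit model the learner may have to rule out labels one at a time. This toy class only illustrates the gap between the two models and says nothing about the asymptotic bound discussed in the abstract.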
