Multiclass Classification using dilute bandit feedback

Gaurav Batra; Naresh Manwani

arXiv:2105.08093·cs.LG·May 19, 2021

TL;DR

This paper proposes a new online multiclass classification framework with diluted bandit feedback, in which the algorithm predicts a candidate label set at each step and learns only whether the true label lies in that set.

Contribution

It introduces MC-DBF, an algorithm that uses an exploration-exploitation strategy to predict the candidate label set in each trial, and proves a mistake bound for it under diluted bandit feedback.

Findings

01

Achieves a mistake bound of O(T^{1-1/(m+2)}), where m is the candidate label set size in each step

02

Demonstrates effectiveness through extensive simulations

03

Learns from feedback that only indicates whether the true label is in the predicted candidate set, which carries more uncertainty than standard bandit feedback
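
To make the stated bound concrete, the exponent 1 - 1/(m+2) can be evaluated for a few candidate set sizes. This is plain arithmetic on the formula quoted in the findings, not an additional result from the paper:

```latex
\[
  m = 1:\; O\!\left(T^{1-\frac{1}{3}}\right) = O\!\left(T^{2/3}\right), \qquad
  m = 2:\; O\!\left(T^{1-\frac{1}{4}}\right) = O\!\left(T^{3/4}\right), \qquad
  m = 3:\; O\!\left(T^{1-\frac{1}{5}}\right) = O\!\left(T^{4/5}\right).
\]
```

The exponent increases toward 1 as m grows, so the bound stays sublinear in T for every fixed m but becomes weaker for larger candidate sets.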

Abstract

This paper introduces a new online learning framework for multiclass classification called learning with diluted bandit feedback. At every time step, the algorithm predicts a candidate label set instead of a single label for the observed example. It then receives feedback from the environment indicating whether the actual label lies in this candidate label set. This feedback is called "diluted bandit feedback". Learning in this setting is even more challenging than in the bandit feedback setting, as there is more uncertainty in the supervision. We propose an algorithm for multiclass classification using dilute bandit feedback (MC-DBF), which uses the exploration-exploitation strategy to predict the candidate set in each trial. We show that the proposed algorithm achieves an O(T^{1-\frac{1}{m+2}}) mistake bound if the candidate label set size (in each step) is m. We demonstrate the effectiveness of…
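
The interaction protocol described in the abstract can be sketched in a few lines of Python. This is a minimal illustration of the feedback signal and of one simple exploration-exploitation rule, not the authors' MC-DBF algorithm: the function names, the per-label `scores` list, the exploration rate `gamma`, and the rule "explore uniformly with probability gamma, otherwise take the top-m scores" are all assumptions made for this example, and no learning update is shown.

```python
import random


def diluted_feedback(candidate_set, true_label):
    """Environment side: report only whether the true label is in the set."""
    return true_label in candidate_set


def choose_candidate_set(scores, m, gamma, rng):
    """Learner side (hypothetical rule): pick m labels.

    With probability gamma, explore by sampling m distinct labels uniformly;
    otherwise exploit by taking the m labels with the highest scores.
    """
    num_labels = len(scores)
    if rng.random() < gamma:
        return set(rng.sample(range(num_labels), m))
    ranked = sorted(range(num_labels), key=lambda k: scores[k], reverse=True)
    return set(ranked[:m])


rng = random.Random(0)
scores = [0.1, 0.7, 0.2, 0.9]  # assumed per-label scores for one example

# Pure exploitation (gamma = 0): the two highest-scoring labels are chosen.
exploit_set = choose_candidate_set(scores, m=2, gamma=0.0, rng=rng)

# Pure exploration (gamma = 1): any two distinct labels may be chosen.
explore_set = choose_candidate_set(scores, m=2, gamma=1.0, rng=rng)

# The learner never sees the true label, only this single bit.
hit = diluted_feedback(exploit_set, true_label=3)
miss = diluted_feedback(exploit_set, true_label=0)
```

Even when the feedback is positive, the learner does not know which of the m labels in the set is the correct one, which is the extra uncertainty the abstract refers to.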
