Efficient Online Set-valued Classification with Bandit Feedback
Zhou Wang, Xingye Qiao

TL;DR
This paper introduces BCCP, a novel method for set-valued classification in online bandit feedback settings, providing coverage guarantees despite limited label information and sparse data.
Contribution
The paper proposes BCCP, a new conformal prediction approach that works with bandit feedback, addressing label sparsity and providing class-specific coverage guarantees.
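Class-specific coverage means that, for each class c, the prediction set contains the true label with probability at least 1 − α among instances whose true label is c. A common full-feedback baseline for this guarantee is class-conditional (Mondrian) split conformal prediction, which calibrates one threshold per class. The sketch below is that standard baseline, not BCCP. It assumes fully labeled calibration data and precomputed nonconformity scores (for example 1 − p̂(c | x), also an assumption). All function names are hypothetical.

```python
import math
from collections import defaultdict


def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this level: never exclude the label.
        return math.inf
    return sorted(scores)[k - 1]


def classwise_thresholds(cal_scores, cal_labels, classes, alpha):
    """One threshold per class, using only calibration points whose label is that class."""
    by_class = defaultdict(list)
    for score, label in zip(cal_scores, cal_labels):
        by_class[label].append(score)
    return {c: conformal_quantile(by_class[c], alpha) for c in classes}


def predict_set(scores_per_class, thresholds):
    """Include class c when its nonconformity score does not exceed c's threshold."""
    return {c for c, s in scores_per_class.items() if s <= thresholds[c]}
```

In this sketch, a class with very few calibration points receives an infinite threshold and is therefore always included in the set. This shows how label sparsity inflates set sizes when coverage must hold class by class.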
Findings
BCCP achieves reliable coverage guarantees in online bandit settings.
The method effectively handles sparse and partial label information.
Experimental results demonstrate improved set accuracy over existing methods.
Abstract
Conformal prediction is a distribution-free method that wraps a given machine learning model and returns a set of plausible labels that contains the true label with a prescribed coverage rate. In practice, the achieved empirical coverage relies heavily on fully observed label information, both in the training phase for model fitting and in the calibration phase for quantile estimation. This dependency poses a challenge in online learning with bandit feedback, where a learner only observes whether its action (i.e., the pulled arm) is correct, not the true label itself. In particular, when the pulled arm is incorrect, the learner only knows that the pulled label is not the true class label, but not which label is. Bandit feedback also results in a smaller labeled dataset for calibration, limited to instances with…
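To see why bandit feedback shrinks the calibration set, consider a naive adaptation of split conformal prediction. It keeps a score only on rounds where the pulled arm turned out to be correct, because only then is the true label revealed. This is an illustrative sketch, not the BCCP procedure. The greedy arm choice, the 1 − p̂ score, and discarding incorrect rounds are all assumptions, and the function names are hypothetical. The retained rounds are a selected subset (those the model got right), so the resulting quantile need not deliver the nominal coverage.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile; infinite if there are too few scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(scores)[k - 1]


def naive_bandit_calibration(stream):
    """Collect nonconformity scores only on rounds with correct bandit feedback.

    `stream` yields (probs, true_label). The learner never reads true_label
    directly; it only sees the binary feedback `pulled == true_label`.
    """
    scores, n_rounds = [], 0
    for probs, true_label in stream:
        n_rounds += 1
        pulled = probs.index(max(probs))  # greedy arm choice (assumption)
        correct = pulled == true_label    # the only feedback observed
        if correct:
            # The pulled arm is the true label, so its score can be computed.
            scores.append(1.0 - probs[pulled])
        # Otherwise the learner only knows true_label != pulled; this sketch drops the round.
    return scores, n_rounds


def prediction_set(probs, q):
    """All labels whose score 1 - p does not exceed the calibrated threshold q."""
    return {k for k, p in enumerate(probs) if 1.0 - p <= q}
```

Usage: on a stream of five rounds where the greedy arm is correct three times, only three scores survive for calibration. With so few points, a small α quickly forces an infinite threshold, that is, the trivial full label set.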
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Smart Grid Energy Management
Methods: Sparse Evolutionary Training
