Leveraging User-Triggered Supervision in Contextual Bandits
Alekh Agarwal, Claudio Gentile, Teodor V. Marinov

TL;DR
This paper introduces a framework for exploiting user-triggered feedback in contextual bandit problems, where the user sometimes reveals the best action, and shows how to use it to obtain better regret guarantees while handling the biased and partial nature of these responses.
Contribution
It proposes a way to incorporate user-triggered supervision into standard contextual bandit algorithms, yielding improved regret guarantees while remaining robust to the bias in this feedback.
Findings
Improved regret bounds for algorithms using user-triggered feedback
Robustness to biased and partial user responses
Enhanced performance in practical text prediction scenarios
Abstract
We study contextual bandit (CB) problems, where the user can sometimes respond with the best action in a given context. Such an interaction arises, for example, in text prediction or autocompletion settings, where a poor suggestion is simply ignored and the user enters the desired text instead. Crucially, this extra feedback is user-triggered on only a subset of the contexts. We develop a new framework to leverage such signals, while being robust to their biased nature. We also augment standard CB algorithms to leverage the signal, and show improved regret guarantees for the resulting algorithms under a variety of conditions on the helpfulness of and bias inherent in this feedback.
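To make the interaction protocol in the abstract concrete, here is a toy simulation. It is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes a small set of discrete contexts, exactly one correct suggestion per context (reward 1, all others 0), and an epsilon-greedy tabular learner. The function name `run_user_triggered_bandit` and the parameters `epsilon` and `trigger_prob` are hypothetical. When a suggestion is rejected, the user types the desired text with a context-dependent probability, which reveals the best action and is used as a full-information update.

```python
import random

def run_user_triggered_bandit(num_contexts=3, num_actions=4, rounds=2000,
                              epsilon=0.1, trigger_prob=(0.3, 0.5, 0.9), seed=0):
    """Toy CB with user-triggered labels (hypothetical setup, not the paper's method)."""
    rng = random.Random(seed)
    # Hypothetical ground truth: one correct completion per context.
    best = [rng.randrange(num_actions) for _ in range(num_contexts)]
    # Running reward sums and counts for each (context, action) pair.
    sums = [[0.0] * num_actions for _ in range(num_contexts)]
    counts = [[0] * num_actions for _ in range(num_contexts)]

    def estimate(x, a):
        # Empirical mean reward; unobserved pairs default to 0.
        return sums[x][a] / counts[x][a] if counts[x][a] else 0.0

    def greedy(x):
        # Ties go to the lowest action index (Python's max keeps the first).
        return max(range(num_actions), key=lambda a: estimate(x, a))

    labels_seen = 0
    for _ in range(rounds):
        x = rng.randrange(num_contexts)
        # Epsilon-greedy exploration over suggestions.
        a = rng.randrange(num_actions) if rng.random() < epsilon else greedy(x)
        reward = 1.0 if a == best[x] else 0.0
        # Standard bandit feedback: only the chosen action's reward is seen.
        sums[x][a] += reward
        counts[x][a] += 1
        # User-triggered supervision: a rejected suggestion is sometimes
        # followed by the user entering the desired text, revealing best[x].
        # The trigger rate depends on the context, so labels are biased
        # toward contexts with a high trigger_prob[x].
        if reward == 0.0 and rng.random() < trigger_prob[x]:
            labels_seen += 1
            y = best[x]
            # Under the one-correct-answer assumption, the label gives
            # reward 1 for y and 0 for every other action.
            for b in range(num_actions):
                sums[x][b] += 1.0 if b == y else 0.0
                counts[x][b] += 1
    return best, [greedy(x) for x in range(num_contexts)], labels_seen
```

Calling `run_user_triggered_bandit()` returns the hidden best actions, the learned greedy policy, and the number of user-supplied labels. Because the estimates here are kept separately per context, the context-dependent trigger rate does not distort them; once a model shares parameters across contexts, labels that arrive mostly from some contexts can bias learning, which is the issue the paper's framework is designed to be robust to. The full-information update also relies on the 0/1 reward assumption and would need a different mapping for graded rewards.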
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Cognitive Radio Networks and Spectrum Sensing
