Leveraging User-Triggered Supervision in Contextual Bandits

Alekh Agarwal; Claudio Gentile; Teodor V. Marinov

arXiv:2302.03784 · cs.LG · February 9, 2023

TL;DR

This paper introduces a framework for exploiting user-triggered feedback, in which the user sometimes reveals the best action, in contextual bandit problems. It yields improved regret guarantees while remaining robust to the biased and partial nature of those user responses.

Contribution

It proposes a framework for incorporating user-triggered supervision into contextual bandits and augments standard CB algorithms to use this signal, yielding improved regret guarantees while remaining robust to bias in the feedback.

Findings

01. Improved regret bounds for algorithms using user-triggered feedback
02. Robustness to biased and partial user responses
03. Applicability to practical settings such as text prediction and autocompletion

Abstract

We study contextual bandit (CB) problems, where the user can sometimes respond with the best action in a given context. Such an interaction arises, for example, in text prediction or autocompletion settings, where a poor suggestion is simply ignored and the user enters the desired text instead. Crucially, this extra feedback is user-triggered on only a subset of the contexts. We develop a new framework to leverage such signals, while being robust to their biased nature. We also augment standard CB algorithms to leverage the signal, and show improved regret guarantees for the resulting algorithms under a variety of conditions on the helpfulness of and bias inherent in this feedback.
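The interaction protocol in the abstract can be made concrete with a small simulation. What follows is a minimal toy sketch, not the authors' algorithm. It assumes a tabular setting with integer contexts, a single correct action per context, and deterministic 0/1 rewards, and it uses a hypothetical helper `simulate`. An epsilon-greedy learner suggests an action. When the suggestion is wrong and the context lies in a fixed subset (`trigger_contexts`, also hypothetical), the "user" supplies the correct action, and the learner simply reuses that label. This toy does not model the bias-robustness mechanisms the paper develops.

```python
import random
from collections import defaultdict


def simulate(num_rounds=5000, num_contexts=20, num_actions=5,
             trigger_contexts=None, epsilon=0.1, use_labels=True, seed=0):
    """Toy contextual bandit with user-triggered supervision (hypothetical setup).

    Each integer context has one correct action; reward is 1 for it, else 0.
    If the learner's suggestion is wrong and the context is in
    `trigger_contexts`, the user reveals the correct action.
    Returns the average reward over all rounds.
    """
    rng = random.Random(seed)
    best = [rng.randrange(num_actions) for _ in range(num_contexts)]
    if trigger_contexts is None:
        # Assumption: feedback is triggered only on half of the contexts,
        # so labels arrive on a biased subset of the context space.
        trigger_contexts = set(range(num_contexts // 2))

    reward_sum = defaultdict(float)  # (context, action) -> total reward
    pulls = defaultdict(int)         # (context, action) -> number of pulls
    revealed = {}                    # context -> action revealed by the user
    total_reward = 0.0

    for _ in range(num_rounds):
        x = rng.randrange(num_contexts)
        if use_labels and x in revealed:
            a = revealed[x]  # reuse the user's own answer for this context
        elif rng.random() < epsilon:
            a = rng.randrange(num_actions)  # explore uniformly
        else:
            # Exploit the empirical mean reward; unpulled actions score 0,
            # and ties go to the lowest-indexed action.
            a = max(range(num_actions),
                    key=lambda b: reward_sum[(x, b)] / pulls[(x, b)]
                    if pulls[(x, b)] else 0.0)
        r = 1.0 if a == best[x] else 0.0
        reward_sum[(x, a)] += r
        pulls[(x, a)] += 1
        total_reward += r
        # User-triggered supervision: only after a poor suggestion,
        # and only on contexts where the user chooses to respond.
        if r == 0.0 and x in trigger_contexts:
            revealed[x] = best[x]
    return total_reward / num_rounds
```

Calling `simulate(use_labels=True)` and `simulate(use_labels=False)` with the same seed compares the augmented learner with plain epsilon-greedy. When every context triggers feedback, each context incurs at most one zero-reward round. When no context triggers feedback, the two variants behave identically.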
