Contextual Bandits and Optimistically Universal Learning

Moise Blanchard; Steve Hanneke; Patrick Jaillet

arXiv:2301.00241 · stat.ML · January 3, 2023

TL;DR

This paper investigates the conditions under which a learner can achieve universal consistency in contextual bandit problems with general action and context spaces, introducing algorithms that adapt to various data-generating processes.

Contribution

It establishes necessary and sufficient conditions for universal consistency, introduces optimistically universal algorithms, and connects partial feedback learning to full-feedback supervised learning.

Findings

1. Universal consistency is achievable for large classes of non-i.i.d. context-generating processes.

2. An optimistically universal algorithm exists: it achieves universal consistency whenever universal consistency is achievable at all.

3. For finite action spaces, learning with partial (bandit) feedback matches full-feedback supervised learning.

Abstract

We consider the contextual bandit problem on general action and context spaces, where the learner's rewards depend on their selected actions and an observable context. This generalizes the standard multi-armed bandit to the case where side information is available, e.g., patients' records or customers' history, which allows for personalized treatment. We focus on consistency -- vanishing regret compared to the optimal policy -- and show that for large classes of non-i.i.d. contexts, consistency can be achieved regardless of the time-invariant reward mechanism, a property known as universal consistency. Precisely, we first give necessary and sufficient conditions on the context-generating process for universal consistency to be possible. Second, we show that there always exists an algorithm that guarantees universal consistency whenever this is achievable, called an optimistically…
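To make "consistency" concrete: a learner is consistent when its average regret against the optimal policy (the policy that picks the best action for each context) tends to zero as the horizon grows. The sketch below illustrates only that definition, in a deliberately simple toy setting that is an assumption and far narrower than the paper's: finitely many contexts drawn i.i.d., finitely many actions, and fixed Bernoulli reward means. The learner is a standard per-context epsilon-greedy heuristic swapped in for illustration; it is not the authors' optimistically universal algorithm. The function name `average_regret` and the context/action labels are hypothetical.

```python
import random


def average_regret(T, contexts, actions, mean_reward, seed=0):
    """Average pseudo-regret of a per-context epsilon-greedy learner over T rounds.

    Hypothetical toy setting (not the paper's): finitely many contexts drawn
    i.i.d. uniformly, finitely many actions, and Bernoulli rewards whose means
    mean_reward[(x, a)] do not change over time.
    """
    rng = random.Random(seed)
    counts = {(x, a): 0 for x in contexts for a in actions}
    sums = {(x, a): 0.0 for x in contexts for a in actions}
    visits = {x: 0 for x in contexts}
    total_regret = 0.0

    def empirical_mean(x, a):
        # Untried actions get +inf so each is tried at least once when exploiting.
        n = counts[(x, a)]
        return float("inf") if n == 0 else sums[(x, a)] / n

    for _ in range(T):
        x = rng.choice(contexts)                 # observe the context
        visits[x] += 1
        epsilon = visits[x] ** -0.5              # exploration rate decays per context
        if rng.random() < epsilon:
            a = rng.choice(actions)              # explore uniformly
        else:
            a = max(actions, key=lambda b: empirical_mean(x, b))  # exploit
        reward = 1.0 if rng.random() < mean_reward[(x, a)] else 0.0
        counts[(x, a)] += 1                      # only the chosen action's reward is observed
        sums[(x, a)] += reward
        best = max(mean_reward[(x, b)] for b in actions)
        total_regret += best - mean_reward[(x, a)]  # gap to the optimal policy's choice

    return total_regret / T


# Usage: two hypothetical patient profiles, two hypothetical treatments.
contexts = ["patient_A", "patient_B"]
actions = ["treatment_0", "treatment_1"]
mean_reward = {
    ("patient_A", "treatment_0"): 0.7, ("patient_A", "treatment_1"): 0.3,
    ("patient_B", "treatment_0"): 0.2, ("patient_B", "treatment_1"): 0.8,
}
for T in (100, 10_000, 100_000):
    print(T, average_regret(T, contexts, actions, mean_reward))
```

In this toy case the average regret shrinks as T grows because exploration in each context decays while every action keeps being sampled infinitely often. The paper's question is precisely when such vanishing regret can be guaranteed for every time-invariant reward mechanism once contexts live in general spaces and are generated by non-i.i.d. processes, where a simple scheme like this one carries no such guarantee.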

Taxonomy

Topics: Advanced Bandit Algorithms Research