Adversarial Rewards in Universal Learning for Contextual Bandits

Moise Blanchard; Steve Hanneke; Patrick Jaillet

arXiv:2302.07186 · stat.ML · June 13, 2023

TL;DR

This paper investigates the limits of universal learning in contextual bandits with evolving, potentially adversarial rewards, revealing fundamental impossibilities and conditions for learnability beyond traditional models.

Contribution

It demonstrates the impossibility of optimistic universal learning in adversarial reward settings and provides necessary and sufficient conditions for universal learning under various adversarial models.

Findings

01

Optimistic universal learning is impossible in general when rewards are adversarial.

02

The necessary and sufficient conditions for universal learning depend on the adversarial reward model.

03

The class of learnable context processes is larger than the i.i.d. processes but smaller than the class learnable in supervised learning.

Abstract

We study the fundamental limits of learning in contextual bandits, where a learner's rewards depend on their actions and a known context, which extends the canonical multi-armed bandit to the case where side-information is available. We are interested in universally consistent algorithms, which achieve sublinear regret compared to any measurable fixed policy, without any function class restriction. For stationary contextual bandits, when the underlying reward mechanism is time-invariant, Blanchard et al. (2022) characterized learnable context processes for which universal consistency is achievable, and further gave algorithms ensuring universal consistency whenever this is achievable, a property known as optimistic universal consistency. It is well understood, however, that reward mechanisms can evolve over time, possibly adversarially, and depending on the learner's actions. We show…

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning