How Does Variance Shape the Regret in Contextual Bandits?

Zeyu Jia; Jian Qian; Alexander Rakhlin; Chen-Yu Wei

arXiv:2410.12713 · cs.LG · November 28, 2024

TL;DR

This paper investigates how reward variance influences regret bounds in realizable contextual bandits with general function approximation, highlighting the role of the eluder dimension and analyzing different adversarial settings.

Contribution

It establishes variance-dependent regret bounds for contextual bandits, showing that the eluder dimension plays a central role in them and providing nearly tight bounds under different adversary models.

Findings

01

Small reward variance can yield better-than-minimax regret bounds in contextual bandits.

02

Unlike in minimax bounds, the eluder dimension is crucial in variance-dependent regret analysis.

03

The new bounds are nearly tight under the adversary models studied.

Abstract

We consider realizable contextual bandits with general function approximation, investigating how small reward variance can lead to better-than-minimax regret bounds. Unlike in minimax bounds, we show that the eluder dimension $d_{\text{elu}}$, a complexity measure of the function class, plays a crucial role in variance-dependent bounds. We consider two types of adversary: (1) Weak adversary: The adversary sets the reward variance before observing the learner's action. In this setting, we prove that a regret of $\Omega(\sqrt{\min\{A, d_{\text{elu}}\}\Lambda} + d_{\text{elu}})$ is unavoidable when $d_{\text{elu}} \leq \sqrt{AT}$, where $A$ is the number of actions, $T$ is the total number of rounds, and $\Lambda$ is the total variance over $T$ rounds. For the $A \leq d_{\text{elu}}$ regime, we derive a nearly matching upper bound $\tilde{O}(\sqrt{A\Lambda} + d_{\text{elu}})$ for the special case where…
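To make the scaling of the weak-adversary bounds concrete, here is a minimal sketch that evaluates the quoted expressions with all constants and logarithmic factors dropped, so the numbers show order-of-magnitude behaviour only. The function names, the sample values ($A = 10$, $d_{\text{elu}} = 50$, $T = 10{,}000$, per-round variance $0.25$), and the $\sqrt{AT}$ reference point used for comparison are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical helpers: order-level evaluation of the weak-adversary bounds.
# Constants and log factors hidden by Omega / O-tilde are dropped, so these
# values indicate scaling only, not actual regret.

def total_variance(per_round_variances):
    """Lambda: the sum of the reward variances over the T rounds."""
    return math.fsum(per_round_variances)

def lower_bound_applies(A, d_elu, T):
    """The lower bound is stated for the regime d_elu <= sqrt(A * T)."""
    return d_elu <= math.sqrt(A * T)

def weak_lower_bound(A, d_elu, Lam):
    """Scaling of the Omega(sqrt(min{A, d_elu} * Lambda) + d_elu) lower bound."""
    return math.sqrt(min(A, d_elu) * Lam) + d_elu

def weak_upper_bound(A, d_elu, Lam):
    """Scaling of the O-tilde(sqrt(A * Lambda) + d_elu) upper bound."""
    if A > d_elu:
        raise ValueError("upper bound is quoted only for the A <= d_elu regime")
    return math.sqrt(A * Lam) + d_elu

def minimax_reference(A, T):
    """Assumed variance-free benchmark sqrt(A * T), ignoring function-class size."""
    return math.sqrt(A * T)

if __name__ == "__main__":
    A, d_elu, T = 10, 50, 10_000
    Lam = total_variance([0.25] * T)  # hypothetical constant variance per round
    print(Lam)                                        # 2500.0
    print(lower_bound_applies(A, d_elu, T))           # True
    print(round(weak_lower_bound(A, d_elu, Lam), 2))  # 208.11
    print(round(weak_upper_bound(A, d_elu, Lam), 2))  # 208.11
    print(round(minimax_reference(A, T), 2))          # 316.23
```

In the $A \leq d_{\text{elu}}$ regime, $\min\{A, d_{\text{elu}}\} = A$, so the two expressions coincide up to the dropped factors, which is what "nearly matching" refers to. With $\Lambda = 0$ both reduce to $d_{\text{elu}}$, independent of $T$.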

