How Does Variance Shape the Regret in Contextual Bandits?
Zeyu Jia, Jian Qian, Alexander Rakhlin, Chen-Yu Wei

TL;DR
This paper investigates how reward variance influences regret bounds in realizable contextual bandits with general function approximation, highlighting the role of the eluder dimension and analyzing two types of adversary.
Contribution
It introduces variance-dependent regret bounds in contextual bandits, emphasizing the importance of eluder dimension and providing nearly tight bounds for different adversarial scenarios.
Findings
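Small reward variance can lead to better-than-minimax regret bounds in realizable contextual bandits.
Unlike in minimax bounds, the eluder dimension, a complexity measure of the function class, plays a crucial role in variance-dependent bounds.
The new bounds are nearly tight under the adversary models considered.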
Abstract
We consider realizable contextual bandits with general function approximation, investigating how small reward variance can lead to better-than-minimax regret bounds. Unlike in minimax bounds, we show that the eluder dimension, a complexity measure of the function class, plays a crucial role in variance-dependent bounds. We consider two types of adversary: (1) Weak adversary: The adversary sets the reward variance before observing the learner's action. In this setting, we prove a regret lower bound that is unavoidable in a certain regime; the bound is stated in terms of the number of actions, the total number of rounds, and the total variance over those rounds. For one regime, we derive a nearly matching upper bound for the special case where…
Taxonomy
Topics: Forecasting Techniques and Applications · Decision-Making and Behavioral Economics
