A Framework for Fair Evaluation of Variance-Aware Bandit Algorithms
Elise Wolf

TL;DR
This paper introduces a standardized evaluation framework for multi-armed bandit algorithms, especially variance-aware ones, revealing their advantages in high-uncertainty environments and providing insights into their comparative performance.
Contribution
It presents a reproducible evaluation framework for MAB algorithms and analyzes conditions favoring variance-aware methods over classical algorithms.
Findings
Variance-aware algorithms excel in high-uncertainty settings.
Classical algorithms perform well in separable scenarios.
Extensive tuning benefits classical algorithms more than variance-aware ones.
Abstract
Multi-armed bandit (MAB) problems serve as a fundamental building block for more complex reinforcement learning algorithms. However, evaluating and comparing MAB algorithms remains challenging due to the lack of standardized conditions and replicability. This is particularly problematic for variance-aware extensions of classical methods like UCB, whose performance can heavily depend on the underlying environment. In this study, we address how performance differences between bandit algorithms can be reliably observed, and under what conditions variance-aware algorithms outperform classical ones. We present a reproducible evaluation designed to systematically compare eight classical and variance-aware MAB algorithms. The evaluation framework, implemented in our Bandit Playground codebase, features clearly defined experimental setups, multiple performance metrics (reward, regret, reward…
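To make the comparison in the abstract concrete, here is a minimal sketch of a seeded, repeatable regret comparison between classical UCB1 and one variance-aware variant. It is not taken from the Bandit Playground codebase. The choice of UCB-V as the variance-aware algorithm, the Bernoulli arms, the constants in the index formulas, and all function names (`ucb1_index`, `ucbv_index`, `run_bandit`, `mean_regret`) are assumptions made for illustration; the summary does not name the eight algorithms studied.

```python
import math
import random


def ucb1_index(mean, var, n, t):
    """Classical UCB1 index; ignores the variance estimate."""
    return mean + math.sqrt(2.0 * math.log(t) / n)


def ucbv_index(mean, var, n, t, b=1.0):
    """UCB-V style index (assumed constants); b is the reward range."""
    log_t = math.log(t)
    return mean + math.sqrt(2.0 * var * log_t / n) + 3.0 * b * log_t / n


def run_bandit(index_fn, arm_means, horizon, seed):
    """Play one run on Bernoulli arms and return cumulative pseudo-regret."""
    rng = random.Random(seed)  # per-run seed keeps the run reproducible
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k
    best = max(arm_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once before using the index
        else:
            scores = []
            for a in range(k):
                mean = sums[a] / counts[a]
                # Empirical (biased) variance, clipped at zero for safety.
                var = max(sq_sums[a] / counts[a] - mean * mean, 0.0)
                scores.append(index_fn(mean, var, counts[a], t))
            arm = max(range(k), key=scores.__getitem__)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward * reward
        regret += best - arm_means[arm]
    return regret


def mean_regret(index_fn, arm_means, horizon, seeds):
    """Average cumulative regret over a fixed list of seeds."""
    runs = [run_bandit(index_fn, arm_means, horizon, s) for s in seeds]
    return sum(runs) / len(runs)


if __name__ == "__main__":
    seeds = list(range(20))       # same seeds for both algorithms
    arms = [0.05, 0.10, 0.15]     # hypothetical low-variance arm means
    print("UCB1 :", mean_regret(ucb1_index, arms, 2000, seeds))
    print("UCB-V:", mean_regret(ucbv_index, arms, 2000, seeds))
```

Running both algorithms on the same seed list and the same arm configuration is one simple way to keep conditions standardized across methods. Which algorithm comes out ahead depends on the arm means, the horizon, and the constants chosen, so this sketch should not be read as reproducing the paper's results.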
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Artificial Intelligence in Games
