From Theory to Practice with RAVEN-UCB: Addressing Non-Stationarity in Multi-Armed Bandits through Variance Adaptation
Junyi Fang, Yuxun Chen, Yuxin Chen, Chen Zhang

TL;DR
RAVEN-UCB is a new algorithm for non-stationary multi-armed bandits that adaptively uses variance information to improve exploration and achieve tighter regret bounds, demonstrating superior performance in dynamic environments.
Contribution
The paper introduces RAVEN-UCB, a variance-aware, adaptive algorithm with recursive updates, providing both theoretical guarantees and practical improvements over existing methods.
Findings
Achieves tighter regret bounds than UCB1 and UCB-V.
Performs better in non-stationary environments with distributional, periodic, and fluctuating changes.
Demonstrates robustness and efficiency in synthetic and logistics scenarios.
Abstract
The Multi-Armed Bandit (MAB) problem is challenging in non-stationary environments where reward distributions evolve dynamically. We introduce RAVEN-UCB, a novel algorithm that combines theoretical rigor with practical efficiency via variance-aware adaptation. It achieves tighter gap-dependent and gap-independent regret bounds than UCB1 and UCB-V. RAVEN-UCB incorporates three innovations: (1) variance-driven exploration in the confidence bounds, (2) adaptive control, and (3) constant-time recursive updates for efficiency. Experiments across non-stationary patterns - distributional changes, periodic shifts, and temporary fluctuations - in synthetic and logistics scenarios demonstrate its superiority over…
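To make the ideas of variance-driven exploration and constant-time recursive updates more concrete, here is a minimal Python sketch of a variance-aware UCB learner. It is not the paper's algorithm: the index formula, the class name `VarianceAwareUCB`, the helper `simulate`, and the parameters `alpha0` and `eps` are all assumptions made for illustration. The per-arm mean and variance are maintained with Welford's recursive update, which costs O(1) time and memory per observation. The exploration bonus is an assumed combination of a UCB1-style term and a variance-scaled term whose weight decays over time.

```python
import math
import random


class VarianceAwareUCB:
    """Illustrative variance-aware UCB learner (assumed form, not the paper's index)."""

    def __init__(self, n_arms, alpha0=1.0, eps=math.e):
        self.n_arms = n_arms
        self.alpha0 = alpha0          # hypothetical base weight of the variance term
        self.eps = eps                # hypothetical offset so log(t + eps) stays positive
        self.counts = [0] * n_arms    # number of pulls per arm
        self.means = [0.0] * n_arms   # running mean reward per arm
        self.m2 = [0.0] * n_arms      # running sum of squared deviations (Welford)
        self.t = 0                    # total number of rounds played

    def variance(self, arm):
        """Population variance of the rewards observed for this arm."""
        n = self.counts[arm]
        return self.m2[arm] / n if n > 0 else 0.0

    def index(self, arm):
        """Assumed index: mean + UCB1-style bonus + decaying variance bonus."""
        n = self.counts[arm]
        if n == 0:
            return math.inf  # force one initial pull of every arm
        alpha_t = self.alpha0 / math.log(self.t + self.eps)
        ucb1_bonus = math.sqrt(2.0 * math.log(self.t + 1) / n)
        variance_bonus = alpha_t * math.sqrt(self.variance(arm) / (n + 1))
        return self.means[arm] + ucb1_bonus + variance_bonus

    def select_arm(self):
        """Advance the round counter and return the arm with the largest index."""
        self.t += 1
        return max(range(self.n_arms), key=self.index)

    def update(self, arm, reward):
        """Constant-time recursive update of the arm's mean and variance."""
        self.counts[arm] += 1
        delta = reward - self.means[arm]
        self.means[arm] += delta / self.counts[arm]
        self.m2[arm] += delta * (reward - self.means[arm])


def simulate(arm_means, horizon, noise_sd=0.1, seed=0):
    """Run the learner on stationary Gaussian arms and return pull counts."""
    rng = random.Random(seed)
    bandit = VarianceAwareUCB(len(arm_means))
    for _ in range(horizon):
        arm = bandit.select_arm()
        reward = rng.gauss(arm_means[arm], noise_sd)
        bandit.update(arm, reward)
    return bandit.counts
```

Calling `simulate([0.2, 0.8], 2000)` returns the number of pulls per arm. The simulation uses stationary rewards for simplicity; a non-stationary test would change `arm_means` during the run, for example by shifting them periodically.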
Taxonomy
Topics: Advanced Bandit Algorithms Research · Forecasting Techniques and Applications · Smart Grid Energy Management
