A One-Size-Fits-All Solution to Conservative Bandit Problems
Yihan Du, Siwei Wang, Longbo Huang

TL;DR
This paper introduces a universal approach to conservative bandit problems with sample-path reward constraints. It achieves $T$-independent additive regret guarantees, extends the framework to a mean-variance setting, and validates the results empirically.
Contribution
It presents a unified solution for conservative bandit problems under sample-path constraints. The solution is instantiated for conservative multi-armed, linear, and contextual combinatorial bandits, with new algorithms and theoretical guarantees.
Findings
Achieves $T$-independent additive regrets in conservative bandits.
Extends to the conservative mean-variance bandit problem (MV-CBP) with $O(1/T)$ normalized additive regret.
Demonstrates better empirical performance than previous methods that impose high-probability constraints on the expected reward.
Abstract
In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as good as a given baseline at any time. We propose a One-Size-Fits-All solution to CBPs and present its applications to three encompassed problems, i.e., conservative multi-armed bandits (CMAB), conservative linear bandits (CLB) and conservative contextual combinatorial bandits (CCCB). Different from previous works, which consider high-probability constraints on the expected reward, we focus on a sample-path constraint on the actually received reward, and achieve better theoretical guarantees ($T$-independent additive regrets instead of $T$-dependent ones) and empirical performance. Furthermore, we extend the results and consider a novel conservative mean-variance bandit problem (MV-CBP), which measures the learning performance…
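To make "at any time" concrete, the sketch below checks whether a realized reward sequence satisfies a sample-path constraint. It rests on an assumed formalization: at every step $t$, $\sum_{s \le t} X_s \ge (1-\alpha)\sum_{s \le t} B_s$. Here $X_s$ is the reward actually received and $B_s$ is the baseline's reward. The tolerance `alpha` is common in the conservative-bandit literature, but its use here, the exact baseline definition, and the function name `satisfies_sample_path_constraint` are all hypothetical and may not match the paper's notation. Setting `alpha = 0` gives the literal "at least as good as the baseline" reading.

```python
from itertools import accumulate
from typing import Sequence


def satisfies_sample_path_constraint(
    received: Sequence[float],
    baseline: Sequence[float],
    alpha: float,
) -> bool:
    """Hypothetical check of a sample-path conservative constraint.

    Assumed form (illustrative, not necessarily the paper's definition):
        sum_{s<=t} received[s] >= (1 - alpha) * sum_{s<=t} baseline[s]
    must hold for EVERY prefix length t, not just at the final step.
    """
    if len(received) != len(baseline):
        raise ValueError("received and baseline must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Running totals of the actually received rewards and of the baseline.
    learner_cum = accumulate(received)
    baseline_cum = accumulate(baseline)
    # The constraint is violated if any single prefix falls below the threshold.
    return all(l >= (1.0 - alpha) * b for l, b in zip(learner_cum, baseline_cum))


if __name__ == "__main__":
    baseline = [0.5, 0.5, 0.5, 0.5]
    # Front-loaded reward keeps every prefix above 0.9 * baseline.
    print(satisfies_sample_path_constraint([1.0, 0.0, 0.5, 0.5], baseline, 0.1))
    # Same final total (2.0), but the first step earns 0.0 < 0.45, so it fails.
    print(satisfies_sample_path_constraint([0.0, 1.0, 0.5, 0.5], baseline, 0.1))
```

The two example paths have the same total reward, yet only the first passes. This shows why a constraint on every prefix of the received rewards is stricter than one on the final total or on expected rewards.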
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
