A One-Size-Fits-All Solution to Conservative Bandit Problems

Yihan Du; Siwei Wang; Longbo Huang

arXiv:2012.07341 · cs.LG · February 8, 2023

TL;DR

This paper introduces a universal approach to conservative bandit problems with sample-path reward constraints. It achieves $T$-independent additive regrets, extends the approach to a conservative mean-variance bandit problem, and validates both with empirical results.

Contribution

It presents a unified solution for conservative bandit problems under sample-path reward constraints, instantiated for conservative multi-armed, linear and contextual combinatorial bandits, with new algorithms and theoretical guarantees.
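A sample-path constraint is checked on the rewards the learner actually receives, round by round. The following is a minimal sketch that assumes the common conservative-bandit form $\sum_{s \le t} X_s \ge (1-\alpha)\, t\, \mu_0$ for every round $t$. Here $X_s$ is the received reward, $\mu_0$ is the baseline's per-round reward, and $\alpha$ is a tolerance. The function name and parameters are hypothetical, and the paper's exact constraint may be parameterised differently.

```python
from itertools import accumulate
from typing import Sequence


def satisfies_sample_path_constraint(
    rewards: Sequence[float], baseline_reward: float, alpha: float
) -> bool:
    """Return True if every prefix of `rewards` meets the baseline threshold.

    Assumed constraint (hypothetical parameterisation):
        sum_{s <= t} rewards[s] >= (1 - alpha) * t * baseline_reward  for all t.
    """
    for t, cumulative in enumerate(accumulate(rewards), start=1):
        if cumulative < (1 - alpha) * t * baseline_reward:
            return False  # violated at round t on this sample path
    return True


# An early dip breaks a sample-path constraint even if later rewards recover.
print(satisfies_sample_path_constraint([0.9, 0.8, 1.0], baseline_reward=0.5, alpha=0.1))  # True
print(satisfies_sample_path_constraint([0.1, 1.0, 1.0], baseline_reward=0.5, alpha=0.1))  # False
```

The second sequence fails at $t = 1$ because its first received reward, 0.1, falls below the threshold 0.45. A high-probability constraint on the expected reward would not necessarily rule out such a path.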

Findings

01 Achieves $T$-independent additive regrets in conservative bandits.
02 Extends to a conservative mean-variance bandit problem with $O(1/T)$ regret.
03 Demonstrates better empirical performance than previous methods.
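The mean-variance extension scores a learner by both its average reward and its variability. This minimal sketch assumes the convention $\rho\hat{\mu} - \hat{\sigma}^2$ (higher is better) with a hypothetical trade-off weight $\rho$. The paper's MV-CBP objective may use a different sign convention or normalisation.

```python
from statistics import fmean, pvariance
from typing import Sequence


def empirical_mean_variance(rewards: Sequence[float], rho: float) -> float:
    """Hypothetical risk-adjusted score: rho * mean - variance (higher is better).

    The weight `rho` and this sign convention are assumptions, not the
    paper's definition.
    """
    # pvariance: population (divide-by-n) variance of the observed rewards
    return rho * fmean(rewards) - pvariance(rewards)


steady = [0.5, 0.5, 0.5, 0.5]
volatile = [0.0, 1.0, 0.0, 1.0]
# Both sequences have mean 0.5, but the volatile one is penalised for its variance.
print(empirical_mean_variance(steady, rho=1.0))    # 0.5
print(empirical_mean_variance(volatile, rho=1.0))  # 0.25
```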

Abstract

In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as good as a given baseline at any time. We propose a One-Size-Fits-All solution to CBPs and present its applications to three encompassed problems, i.e., conservative multi-armed bandits (CMAB), conservative linear bandits (CLB) and conservative contextual combinatorial bandits (CCCB). Unlike previous works, which consider high-probability constraints on the expected reward, we focus on a sample-path constraint on the actually received reward, and achieve better theoretical guarantees ($T$-independent additive regrets instead of $T$-dependent ones) and empirical performance. Furthermore, we extend the results and consider a novel conservative mean-variance bandit problem (MV-CBP), which measures the learning performance…
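One generic way to keep a sample-path constraint is to fall back on the baseline whenever the learner's next reward could break it. The following is a hypothetical illustration, not the paper's One-Size-Fits-All algorithm. It assumes rewards in $[0, 1]$, a baseline that pays a known, deterministic $\mu_0$ each round, and the constraint $\sum_{s \le t} X_s \ge (1-\alpha)\, t\, \mu_0$. `learner_pull` is a hypothetical stand-in for any bandit learner, and learner updates are omitted.

```python
import random
from itertools import accumulate
from typing import Callable, List, Tuple


def run_conservative(
    horizon: int,
    learner_pull: Callable[[], float],
    baseline_reward: float,
    alpha: float,
) -> Tuple[List[float], int]:
    """Hypothetical conservative wrapper, not the paper's algorithm.

    Assumes rewards in [0, 1] and a baseline that pays `baseline_reward`
    deterministically. The learner is used only when even a worst-case
    reward of 0 keeps sum_{s<=t} X_s >= (1 - alpha) * t * baseline_reward.
    Learner updates are omitted; `learner_pull` just returns a reward.
    """
    received: List[float] = []
    total = 0.0  # cumulative reward actually received so far
    fallbacks = 0
    for t in range(1, horizon + 1):
        if total >= (1 - alpha) * t * baseline_reward:
            reward = learner_pull()  # safe: constraint holds even if reward == 0
        else:
            reward = baseline_reward  # play the baseline to rebuild the margin
            fallbacks += 1
        total += reward
        received.append(reward)
    return received, fallbacks


rng = random.Random(0)  # stand-in learner: uniform rewards in [0, 1)
rewards, fallbacks = run_conservative(1000, rng.random, baseline_reward=0.5, alpha=0.1)
# The constraint holds on this sample path, not merely in expectation.
assert all(
    s >= (1 - 0.1) * t * 0.5 for t, s in enumerate(accumulate(rewards), start=1)
)
print(f"baseline fallbacks: {fallbacks} of {len(rewards)} rounds")
```

The fallback is always safe under these assumptions by induction. If $S_{t-1} \ge (1-\alpha)(t-1)\mu_0$, then playing the baseline gives $S_t = S_{t-1} + \mu_0 \ge (1-\alpha)\, t\, \mu_0$. The check therefore only needs the current cumulative reward $S_{t-1}$.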

Videos

A One-Size-Fits-All Solution to Conservative Bandit Problems (Underline)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems