Stage-wise Conservative Linear Bandits

Ahmadreza Moradipari; Christos Thrampoulidis; Mahnoosh Alizadeh

arXiv:2010.00081 · cs.LG · October 2, 2020 · 6 citations

TL;DR

This paper introduces two algorithms for stage-wise conservative linear bandits that satisfy safety constraints at every stage while maximizing reward. Both come with regret bounds and can be adapted to several constraint settings.

Contribution

The paper proposes novel algorithms, SCLTS and SCLUCB, for safe linear bandit optimization with regret guarantees and flexibility for different safety constraint scenarios.

Findings

01

SCLTS and SCLUCB achieve regret bounds of order O(√T log^{3/2} T) and O(√T log T), respectively.

02

The algorithms play baseline actions at most O(log T) times.

03

The methods adapt to constraints with bandit feedback and to unknown baseline actions.

Abstract

We study stage-wise conservative linear stochastic bandits: an instance of bandit optimization, which accounts for (unknown) safety constraints that appear in applications such as online advertising and medical trials. At each stage, the learner must choose actions that not only maximize cumulative reward across the entire time horizon but further satisfy a linear baseline constraint that takes the form of a lower bound on the instantaneous reward. For this problem, we present two novel algorithms, stage-wise conservative linear Thompson Sampling (SCLTS) and stage-wise conservative linear UCB (SCLUCB), that respect the baseline constraints and enjoy probabilistic regret bounds of order O(\sqrt{T} \log^{3/2}T) and O(\sqrt{T} \log T), respectively. Notably, the proposed algorithms can be adjusted with only minor modifications to tackle different problem variations, such as constraints…
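The stage-wise constraint described in the abstract can be illustrated with a minimal sketch. This is not the authors' SCLTS or SCLUCB; it only shows one common way to enforce a lower bound on the instantaneous reward: play a candidate action only if a pessimistic estimate of its reward clears a fraction of the baseline reward, and otherwise fall back to the baseline action. The threshold form `(1 - alpha) * baseline_reward`, the confidence radius `beta`, and all function and parameter names are assumptions made for illustration.

```python
import math

def pessimistic_reward(x, theta_hat, V_inv, beta):
    """Lower confidence bound on the linear reward x . theta (illustrative)."""
    mean = sum(xi * ti for xi, ti in zip(x, theta_hat))
    # x^T V^{-1} x is large when the direction x has been explored little.
    Vx = [sum(row[j] * x[j] for j in range(len(x))) for row in V_inv]
    width = beta * math.sqrt(sum(xi * vi for xi, vi in zip(x, Vx)))
    return mean - width

def choose_action(candidate, baseline, baseline_reward,
                  theta_hat, V_inv, beta, alpha):
    """Play the candidate only if it is safe under the pessimistic estimate.

    Assumed constraint: reward(action) >= (1 - alpha) * baseline_reward.
    """
    threshold = (1.0 - alpha) * baseline_reward
    if pessimistic_reward(candidate, theta_hat, V_inv, beta) >= threshold:
        return candidate
    return baseline  # conservative fallback

# Hypothetical numbers: a well-explored direction passes the safety check.
theta_hat = [1.0, 0.0]
well_explored = [[0.01, 0.0], [0.0, 0.01]]
picked = choose_action([1.0, 0.0], [0.5, 0.5], 0.8,
                       theta_hat, well_explored, beta=1.0, alpha=0.1)
```

With these hypothetical inputs the candidate is chosen because its pessimistic reward exceeds the threshold. If the same direction were poorly explored (a larger `V_inv`), the check would fail and the baseline action would be played instead.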

Videos

Stage-wise Conservative Linear Bandits · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms