Stochastic Conservative Contextual Linear Bandits

Jiabin Lin, Xian Yeow Lee, Talukder Jubery, Shana Moothedath, Soumik Sarkar, and Baskar Ganapathysubramanian

arXiv:2203.15629 · cs.LG · March 30, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a conservative stochastic contextual bandit algorithm that satisfies safety constraints during real-time decision making under uncertainty, with proven regret bounds and validation on synthetic and real-world data.

Contribution

The paper develops a conservative linear UCB algorithm for contextual bandits in which the learner observes only a distribution over contexts, not the exact context, and must satisfy safety constraints. It provides theoretical regret bounds and empirical validation.

Findings

01. The regret bound decomposes into the standard UCB regret and two constant terms.

02. The algorithm maintains the safety constraints at all times during learning.

03. The algorithm is validated on synthetic data and on maize field data.
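
The first finding can be written schematically as below. The symbols are placeholders introduced here for illustration, not notation taken from the paper: R(T) is the cumulative regret over a horizon of T rounds, R_UCB(T) is the regret bound of the standard UCB algorithm, and C_1, C_2 stand for the two constant terms. Reading "constant" as "independent of T" is an assumption.

```latex
R(T) \;\le\; R_{\mathrm{UCB}}(T) \;+\; C_1 \;+\; C_2
```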

Abstract

Many physical systems have underlying safety considerations that require the deployed strategy to satisfy a set of constraints. Further, we often have only partial information on the state of the system. We study the problem of safe real-time decision making under uncertainty. In this paper, we propose a conservative stochastic contextual bandit formulation for real-time decision making when an adversary chooses a distribution on the set of possible contexts and the learner is subject to certain safety/performance constraints. The learner observes only the context distribution, not the exact context, and the goal is to develop an algorithm that selects a sequence of optimal actions to maximize the cumulative reward without violating the safety constraints at any time step. By leveraging the UCB algorithm for this setting, we propose a conservative…
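
The setting described in the abstract can be illustrated with a rough sketch. The abstract is truncated here, so this is not the authors' algorithm. It is a generic conservative linear UCB loop under stated assumptions: the learner replaces the unknown context with expected feature vectors under the observed distribution, a baseline arm with known expected reward is always available, and the safety constraint requires cumulative reward to stay above a (1 - alpha) fraction of the baseline's. The function name, the simulated environment, and the parameters `alpha`, `beta`, `lam`, and `baseline_arm` are all hypothetical.

```python
import numpy as np

def conservative_linucb_sketch(horizon=300, n_contexts=4, n_arms=5, dim=3,
                               alpha=0.2, beta=1.0, lam=1.0, noise=0.05,
                               baseline_arm=0, seed=0):
    """Illustrative only; returns the arm played in each round."""
    rng = np.random.default_rng(seed)
    # Hypothetical environment: positive features and parameter, so rewards > 0.
    phi = rng.uniform(0.1, 1.0, size=(n_contexts, n_arms, dim))
    theta_star = rng.uniform(0.1, 1.0, size=dim)

    V = lam * np.eye(dim)   # regularised Gram matrix
    b = np.zeros(dim)       # running sum of feature * reward
    z = np.zeros(dim)       # summed features of rounds where the UCB arm was played
    base_played = 0.0       # baseline reward collected on baseline rounds
    base_total = 0.0        # reward the baseline alone would have collected
    actions = []

    for _ in range(horizon):
        mu = rng.dirichlet(np.ones(n_contexts))   # observed context distribution
        psi = np.einsum("c,ckd->kd", mu, phi)     # expected features per arm
        r_base = psi[baseline_arm] @ theta_star   # assumed known to the learner

        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b
        widths = np.sqrt(np.einsum("kd,de,ke->k", psi, V_inv, psi))
        candidate = int(np.argmax(psi @ theta_hat + beta * widths))

        # Pessimistic estimate of reward from all UCB rounds, including this one.
        z_try = z + psi[candidate]
        lcb = z_try @ theta_hat - beta * np.sqrt(z_try @ V_inv @ z_try)
        safe = lcb + base_played >= (1 - alpha) * (base_total + r_base)

        if safe:
            arm, z = candidate, z_try
        else:
            arm = baseline_arm
            base_played += r_base
        base_total += r_base

        c = rng.choice(n_contexts, p=mu)          # true context, never shown to learner
        reward = phi[c, arm] @ theta_star + noise * rng.normal()
        V += np.outer(psi[arm], psi[arm])
        b += psi[arm] * reward
        actions.append(arm)

    return np.array(actions)
```

Calling `conservative_linucb_sketch()` and comparing `np.mean(actions == 0)` across values of `alpha` gives a feel for how a tighter constraint forces more baseline plays early on. The confidence width `beta` is a fixed heuristic here, so the sketch carries no formal safety guarantee.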

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Machine Learning and Algorithms