Greedy Bandits with Sampled Context

Dom Huh

arXiv:2007.16001·cs.LG·August 3, 2020

TL;DR

This paper introduces GB-SC, a contextual bandit method that uses Thompson Sampling to develop a prior from context information and an epsilon-greedy policy to select arms. The design allows evaluation of context-reward dependency and provides robustness to partially observable context vectors.

Contribution

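The paper presents GB-SC, which develops a Bayesian prior from context information via Thompson Sampling and selects arms with an epsilon-greedy policy. The reported results are competitive regret on the Mushroom environment and insight into how each context subset affects decision-making.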

Findings

01 Competitive performance on Mushroom environment

02 Effective evaluation of context-reward dependency

03 Robustness to partially observable contexts

Abstract

Bayesian strategies for contextual bandits have proved promising in single-state reinforcement learning tasks by modeling uncertainty using context information from the environment. In this paper, we propose Greedy Bandits with Sampled Context (GB-SC), a method for contextual multi-armed bandits that develops the prior from the context information using Thompson Sampling and selects arms using an epsilon-greedy policy. The GB-SC framework allows for evaluation of context-reward dependency and provides robustness for partially observable context vectors by leveraging the developed prior. Our experimental results show competitive performance on the Mushroom environment in terms of expected regret and expected cumulative regret, as well as insights into how each context subset affects decision-making.
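The abstract does not spell out the update rules, so the following is only a rough sketch of the general idea it describes, not the paper's algorithm. It assumes Bernoulli rewards, categorical context features, and an independent Beta posterior per (feature, value, arm) triple; unobserved features are marked `None` and simply skipped. The class name `SampledContextGreedyBandit`, the helper `run_toy`, the score-averaging rule, and the toy environment are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


class SampledContextGreedyBandit:
    """Toy agent: Thompson-sampled context scores + epsilon-greedy arm choice.

    Hypothetical illustration only; not the GB-SC implementation.
    """

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Beta(1, 1) prior for every (feature index, feature value, arm) triple.
        self.posterior = defaultdict(lambda: [1.0, 1.0])

    def sampled_scores(self, context):
        """For each arm, average one Beta draw per observed feature."""
        observed = [(i, v) for i, v in enumerate(context) if v is not None]
        scores = []
        for arm in range(self.n_arms):
            if not observed:
                # Nothing observed: fall back to a draw from the flat prior.
                scores.append(self.rng.betavariate(1.0, 1.0))
                continue
            draws = [self.rng.betavariate(*self.posterior[(i, v, arm)])
                     for i, v in observed]
            scores.append(sum(draws) / len(draws))
        return scores

    def select_arm(self, context):
        # Epsilon-greedy: explore uniformly with probability epsilon,
        # otherwise act greedily on the sampled scores.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        scores = self.sampled_scores(context)
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, context, arm, reward):
        """Beta-Bernoulli update for each observed feature (reward is 0 or 1)."""
        for i, v in enumerate(context):
            if v is None:
                continue
            params = self.posterior[(i, v, arm)]
            params[0] += reward
            params[1] += 1 - reward


def run_toy(steps=3000, mask_prob=0.3, seed=0):
    """Made-up 2-arm task: feature 0 names the good arm, feature 1 is noise."""
    env = random.Random(seed)
    agent = SampledContextGreedyBandit(n_arms=2, epsilon=0.05, seed=seed + 1)
    total = 0
    for _ in range(steps):
        full = (env.randrange(2), env.randrange(3))
        # Hide each feature independently to mimic partial observability.
        seen = tuple(None if env.random() < mask_prob else v for v in full)
        arm = agent.select_arm(seen)
        p = 0.9 if arm == full[0] else 0.1
        reward = 1 if env.random() < p else 0
        agent.update(seen, arm, reward)
        total += reward
    return total / steps, agent


if __name__ == "__main__":
    avg_reward, _ = run_toy()
    print(f"average reward: {avg_reward:.3f}")
```

Because the posteriors in this sketch are keyed by feature, comparing their means across arms gives a crude view of which features carry reward information, which is one way to read the abstract's "context-reward dependency"; the paper's own analysis may differ.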

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management