
TL;DR
This paper introduces GB-SC, a novel approach combining Thompson Sampling and epsilon-greedy policies for contextual bandits, enabling evaluation of context-reward relationships and robustness to incomplete context data.
Contribution
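The paper presents a new method, GB-SC, that builds a Bayesian prior from context information using Thompson Sampling and selects arms with an epsilon-greedy policy. This makes it possible to evaluate how context affects reward and to handle partially observable context vectors.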
Findings
Competitive performance on Mushroom environment
Effective evaluation of context-reward dependency
Robustness to partially observable contexts
Abstract
Bayesian strategies for contextual bandits have proved promising in single-state reinforcement learning tasks by modeling uncertainty using context information from the environment. In this paper, we propose Greedy Bandits with Sampled Context (GB-SC), a method for contextual multi-armed bandits that develops the prior from the context information using Thompson Sampling and selects arms using an epsilon-greedy policy. The GB-SC framework allows for evaluation of context-reward dependency and provides robustness to partially observable context vectors by leveraging the developed prior. Our experimental results show competitive performance on the Mushroom environment in terms of expected regret and expected cumulative regret, as well as insights on how each context subset affects decision-making.
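To make the combination described in the abstract concrete, here is a minimal sketch of one way to pair a Thompson-Sampling-style posterior with epsilon-greedy arm selection. It is not the authors' implementation: the class name `GreedySampledContextBandit`, the Beta-Bernoulli posterior per (context, arm) pair, the two-arm toy environment, and all parameter values are assumptions made for illustration only.

```python
import random
from collections import defaultdict


class GreedySampledContextBandit:
    """Hypothetical sketch: Beta posteriors per (context, arm),
    with epsilon-greedy selection over values sampled from them."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Assumption: a Beta(1, 1) prior for every (context, arm) pair.
        self.alpha = defaultdict(lambda: [1.0] * n_arms)
        self.beta = defaultdict(lambda: [1.0] * n_arms)

    def select_arm(self, context):
        key = tuple(context)
        # Explore: with probability epsilon pick a uniformly random arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        # Exploit: draw one sample per arm from its posterior
        # (the Thompson Sampling step) and take the largest.
        samples = [
            self.rng.betavariate(self.alpha[key][a], self.beta[key][a])
            for a in range(self.n_arms)
        ]
        return max(range(self.n_arms), key=samples.__getitem__)

    def update(self, context, arm, reward):
        key = tuple(context)
        # Bernoulli reward: success updates alpha, failure updates beta.
        if reward > 0:
            self.alpha[key][arm] += 1.0
        else:
            self.beta[key][arm] += 1.0


def run_toy(steps=2000, seed=0):
    """Toy environment (an assumption, not the Mushroom data):
    one binary context feature decides which of two arms pays off."""
    env_rng = random.Random(seed + 1)
    agent = GreedySampledContextBandit(n_arms=2, epsilon=0.1, seed=seed)
    cumulative_regret = 0.0
    for _ in range(steps):
        x = env_rng.randrange(2)
        probs = [0.9, 0.1] if x == 0 else [0.1, 0.9]
        arm = agent.select_arm((x,))
        reward = 1 if env_rng.random() < probs[arm] else 0
        agent.update((x,), arm, reward)
        # Expected regret of this step: best arm's mean minus chosen arm's mean.
        cumulative_regret += max(probs) - probs[arm]
    return cumulative_regret


if __name__ == "__main__":
    print("cumulative expected regret:", run_toy())
```

In this toy setting a uniformly random policy would accumulate an expected regret of about 0.4 per step, so a much lower total suggests that the agent has learned the context-reward dependency. A missing context feature could be encoded as `None` inside the context tuple, although how GB-SC itself treats partially observed contexts is not specified in the abstract.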
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
