Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits

Hongju Park; Mohamad Kazem Shirani Faradonbeh

arXiv:2110.12175·stat.ML·November 30, 2021

Open Access

TL;DR

This paper introduces a Thompson Sampling algorithm tailored for partially observable contextual multi-armed bandits, providing theoretical guarantees on regret and learning rates, with empirical validation.

Contribution

It develops a novel Thompson Sampling approach for partially observable contexts and proves regret bounds and learning rates, extending existing methods to more realistic scenarios.

Findings

1. Regret scales logarithmically with time and the number of arms.
2. Regret scales linearly with the dimension of the context.
3. Numerical analyses support the theoretical results.

Abstract

Contextual multi-armed bandits are classical models in reinforcement learning for sequential decision-making associated with individual information. A widely used policy for bandits is Thompson Sampling, where samples from a data-driven probabilistic belief about unknown parameters are used to select the control actions. For this computationally fast algorithm, performance analyses are available under full context observations. However, little is known about problems in which contexts are not fully observed. We propose a Thompson Sampling algorithm for partially observable contextual multi-armed bandits, and establish theoretical performance guarantees. Technically, we show that the regret of the presented policy scales logarithmically with time and the number of arms, and linearly with the dimension. Further, we establish rates of learning unknown parameters, and provide illustrative…
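
To make the mechanism in the abstract concrete, here is a minimal Python sketch of Thompson Sampling when only noisy versions of the contexts are seen. It is not the authors' algorithm. The linear reward model, the additive Gaussian observation noise, the Gaussian belief, and every name in it (thompson_sampling_partial, obs_noise, reward_noise, and so on) are assumptions made for illustration only.

```python
import numpy as np

def thompson_sampling_partial(T=2000, n_arms=5, dim=3,
                              obs_noise=0.5, reward_noise=0.5, seed=0):
    """Toy Thompson Sampling with noisy context observations (illustrative only)."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=dim)      # unknown reward parameter (assumed linear model)
    B = np.eye(dim)                # precision matrix of the Gaussian belief
    s = np.zeros(dim)              # running sum of observation * reward
    chosen_mean_rewards = np.empty(T)

    for t in range(T):
        # True contexts are hidden; the policy sees only noisy observations y.
        x = rng.normal(size=(n_arms, dim))
        y = x + obs_noise * rng.normal(size=(n_arms, dim))

        # Sample a parameter from the current belief.
        mean = np.linalg.solve(B, s)
        cov = np.linalg.inv(B)
        cov = (cov + cov.T) / 2    # guard against tiny numerical asymmetry
        sample = rng.multivariate_normal(mean, cov)

        # Act greedily with respect to the sampled parameter.
        arm = int(np.argmax(y @ sample))
        reward = x[arm] @ mu + reward_noise * rng.normal()

        # Update the belief using the observation of the chosen arm only.
        B += np.outer(y[arm], y[arm])
        s += reward * y[arm]
        chosen_mean_rewards[t] = x[arm] @ mu

    return np.linalg.solve(B, s), mu, chosen_mean_rewards
```

In this toy setup the policy regresses rewards on the observations rather than on the hidden contexts, so the learned vector approximates a rescaled version of mu rather than mu itself. That is enough to rank arms, which is all the arm-selection step needs.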

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Machine Learning and Algorithms