Efficient Algorithms for Learning to Control Bandits with Unobserved Contexts
Hongju Park, Mohamad Kazem Shirani Faradonbeh

TL;DR
This paper introduces an efficient posterior sampling algorithm for learning control policies in contextual bandits with imperfect, noisy context observations, addressing a gap in existing methods for such settings.
Contribution
The paper proposes a novel, implementable algorithm for bandits with unobserved contexts and analyzes its performance, extending learning-based control to more realistic settings in which contexts are observed only through noise.
Findings
The algorithm learns efficiently from noisy, imperfect context observations.
Performance is related to the number of arms, dimensions, observation matrices, posterior rescaling factors, and signal-to-noise ratios.
Numerical results validate the effectiveness of the proposed method.
Abstract
Contextual bandits are widely used in the study of learning-based control policies for finite action spaces. While the problem is well studied for bandits with perfectly observed context vectors, little is known about the case of imperfectly observed contexts. For this setting, existing approaches are inapplicable and new conceptual and technical frameworks are required. We present an implementable posterior sampling algorithm for bandits with imperfect context observations and study its performance for learning optimal decisions. The provided numerical results relate the performance of the algorithm to different quantities of interest including the number of arms, dimensions, observation matrices, posterior rescaling factors, and signal-to-noise ratios. In general, the proposed algorithm exhibits efficiency in learning from noisy, imperfect observations and taking actions…
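The abstract does not spell out the model or the update rules, so the following is only a minimal sketch of what posterior sampling with imperfectly observed contexts could look like, not the authors' algorithm. Everything here is an assumption made for illustration: hidden contexts x drawn from N(0, I), observations y = A x + noise, linear rewards with one parameter vector per arm, a Gaussian posterior over regression weights on y, and the names simulate, rescale, obs_noise, and reward_noise. Regret is measured against an oracle that knows the arm parameters but also sees only y.

```python
import numpy as np

def simulate(T=500, n_arms=3, d_x=4, d_y=4, obs_noise=0.5,
             reward_noise=0.5, rescale=1.0, seed=0):
    """Toy posterior sampling with noisy context observations (illustrative only)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(d_y, d_x))       # hypothetical observation matrix
    mu = rng.normal(size=(n_arms, d_x))   # true arm parameters, hidden from the learner
    # Oracle map E[x | y] = D y, valid under the assumed x ~ N(0, I) model.
    D = A.T @ np.linalg.inv(A @ A.T + obs_noise**2 * np.eye(d_y))

    B = np.stack([np.eye(d_y) for _ in range(n_arms)])  # posterior precision per arm
    b = np.zeros((n_arms, d_y))                         # precision-weighted means
    regret = np.zeros(T)

    for t in range(T):
        x = rng.normal(size=(n_arms, d_x))                        # unobserved contexts
        y = x @ A.T + obs_noise * rng.normal(size=(n_arms, d_y))  # what the learner sees

        scores = np.empty(n_arms)
        for i in range(n_arms):
            mean = np.linalg.solve(B[i], b[i])
            L = np.linalg.cholesky(B[i])                 # B = L L^T
            z = rng.normal(size=d_y)
            # Draw from N(mean, rescale^2 * B^{-1}) without forming the inverse.
            sample = mean + rescale * np.linalg.solve(L.T, z)
            scores[i] = y[i] @ sample
        a = int(np.argmax(scores))                       # act greedily on the sample

        reward = x[a] @ mu[a] + reward_noise * rng.normal()
        B[a] += np.outer(y[a], y[a])                     # update only the pulled arm
        b[a] += reward * y[a]

        # Expected reward of each arm given y, as seen by the oracle.
        oracle = np.sum((y @ D.T) * mu, axis=1)
        regret[t] = oracle.max() - oracle[a]

    return np.cumsum(regret)
```

Under these assumptions, calling simulate with different values of n_arms, d_x, d_y, obs_noise, or rescale gives cumulative regret curves that can be compared, loosely mirroring the quantities the abstract says the numerical results examine. Regressing directly on y is a design choice of this sketch: under the assumed Gaussian model, the expected reward given y is linear in y, so the learner never needs to estimate A or the hidden context explicitly.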
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Advanced Control Systems Optimization
