Online Stochastic Optimization under Correlated Bandit Feedback
Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

TL;DR
This paper introduces the HCT algorithm for online stochastic optimization in bandit settings with correlated rewards. It achieves regret bounds that match the state of the art while using less memory and requiring weaker smoothness assumptions, with applications to reinforcement learning.
Contribution
The paper presents the high-confidence tree (HCT) algorithm, a novel approach that handles correlated rewards, matches existing methods in regret, and improves on them in memory use and in the smoothness assumption it requires.
Findings
HCT achieves regret bounds that match the state of the art in their dependence on the number of steps and the smoothness factor.
HCT effectively handles correlated reward processes.
Preliminary empirical results support HCT's applicability.
Abstract
In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel anytime 𝒳-armed bandit algorithm, and derive regret bounds that match the existing state of the art in their dependence on the number of steps and the smoothness factor. The main advantage of HCT is that it handles the challenging case of correlated rewards, whereas existing methods require the reward-generating process of each arm to be an independent and identically distributed (iid) random process. HCT also improves on the state of the art in its memory requirement, and it requires a weaker smoothness assumption on the mean-reward function than previous anytime algorithms. Finally, we discuss how HCT can be applied to the problem of policy search in…
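For readers unfamiliar with the setting, the regret referred to in the abstract can be written in the standard form used for 𝒳-armed bandits. This is a sketch under assumptions: the symbols f, f^*, x_t, r_t, and d are introduced here for illustration and are not taken from this page, and the paper's own definition for the correlated-reward case may differ in detail.

```latex
% Assumed notation (not from this page):
%   \mathcal{X} : arm space,  f : mean-reward function on \mathcal{X},
%   x_t : arm pulled at step t,  r_t : reward observed at step t.
\[
  f^* = \sup_{x \in \mathcal{X}} f(x),
  \qquad
  R_n = n f^* - \sum_{t=1}^{n} r_t .
\]
% Rates in this literature are typically stated as
\[
  R_n = \tilde{O}\!\left( n^{\frac{d+1}{d+2}} \right),
\]
% where d is a near-optimality dimension that captures how smooth
% f is around its maximum, and \tilde{O} hides logarithmic factors.
```

"Dependence on the number of steps and the smoothness factor" then refers to how a bound on R_n scales with n and with d; the exact statement for HCT is given in the paper itself.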
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications
