Continuous-in-time Limit for Bayesian Bandits
Yuhua Zhu, Zachary Izzo, Lexing Ying

TL;DR
This paper connects Bayesian bandit problems to continuous Hamilton-Jacobi-Bellman (HJB) equations, yielding a new way to approximate Bayes-optimal policies that remains computationally efficient as the horizon grows.
Contribution
The paper introduces a continuous-time limit for Bayesian bandits, described by an HJB equation, which enables explicit solutions for several common bandit problems and scalable approximate policies.
Findings
The Bayesian bandit problem converges to a continuous HJB equation under rescaling.
Explicit solutions are derived for several common bandit problems.
The proposed approximate policy maintains computational efficiency as the horizon grows.
Abstract
This paper revisits the bandit problem in the Bayesian setting. The Bayesian approach formulates the bandit problem as an optimization problem, and the goal is to find the optimal policy which minimizes the Bayesian regret. One of the main challenges facing the Bayesian approach is that computation of the optimal policy is often intractable, especially when the length of the problem horizon or the number of arms is large. In this paper, we first show that under a suitable rescaling, the Bayesian bandit problem converges toward a continuous Hamilton-Jacobi-Bellman (HJB) equation. The optimal policy for the limiting HJB equation can be explicitly obtained for several common bandit problems, and we give numerical methods to solve the HJB equation when an explicit solution is not available. Based on these results, we propose an approximate Bayes-optimal policy for solving Bayesian bandit…
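To illustrate why exact computation becomes intractable as the horizon grows, here is a minimal sketch of the classical backward-induction (dynamic programming) solution for a two-armed Bernoulli bandit with independent Beta priors. This is not the method proposed in the paper; it is the exact baseline whose cost motivates an approximation. The function name `bayes_optimal_value`, the Beta(1, 1) default prior, and the choice of Bernoulli rewards are assumptions made for this example. It maximizes expected cumulative reward, which is equivalent to minimizing Bayesian regret.

```python
from functools import lru_cache


def bayes_optimal_value(horizon, prior=(1, 1)):
    """Exact Bayes-optimal expected reward for a two-armed Bernoulli bandit.

    Both arms share a Beta(a0, b0) prior (an assumption of this sketch).
    Returns (optimal expected total reward, number of belief states visited).
    """
    a0, b0 = prior

    @lru_cache(maxsize=None)
    def value(s1, f1, s2, f2):
        # The belief state is the success/failure count of each arm.
        remaining = horizon - (s1 + f1 + s2 + f2)
        if remaining == 0:
            return 0.0
        # Posterior mean of each arm under the Beta prior.
        p1 = (a0 + s1) / (a0 + b0 + s1 + f1)
        p2 = (a0 + s2) / (a0 + b0 + s2 + f2)
        # Bellman recursion: immediate expected reward plus future value.
        pull1 = p1 * (1.0 + value(s1 + 1, f1, s2, f2)) \
            + (1.0 - p1) * value(s1, f1 + 1, s2, f2)
        pull2 = p2 * (1.0 + value(s1, f1, s2 + 1, f2)) \
            + (1.0 - p2) * value(s1, f1, s2, f2 + 1)
        return max(pull1, pull2)

    total = value(0, 0, 0, 0)
    return total, value.cache_info().currsize


if __name__ == "__main__":
    for n in (5, 10, 20):
        reward, states = bayes_optimal_value(n)
        print(f"horizon={n:2d}  value={reward:.4f}  belief states={states}")
```

In this sketch the number of belief states grows on the order of the fourth power of the horizon for two arms, and the exponent grows with the number of arms, which is the scaling problem the abstract refers to.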
Taxonomy
Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Adaptive Dynamic Programming Control
