A Sampling-Based Method for Gittins Index Approximation
Stef Baas, Richard J. Boucherie, Aleida Braaksma

TL;DR
This paper introduces a sampling-based method for approximating the Gittins index of bandit processes. It provides finite-time error bounds, confidence intervals, and an epsilon-optimal policy, and in a numerical study it outperforms Thompson sampling and UCB algorithms.
Contribution
The paper presents a novel sampling-based method for Gittins index approximation, with finite-time error bounds and convergence proofs, and derives from it an epsilon-optimal policy for the Bayesian multi-armed bandit.
Findings
The method accurately approximates the Gittins index for Bernoulli and Gaussian bandits.
It outperforms Thompson sampling and UCB algorithms in a multi-armed bandit setting.
Confidence intervals for the Gittins index can be constructed from a finite number of Monte Carlo samples.
Abstract
A sampling-based method is introduced to approximate the Gittins index for a general family of alternative bandit processes. The approximation consists of a truncation of the optimization horizon and support for the immediate rewards, an optimal stopping value approximation, and a stochastic approximation procedure. Finite-time error bounds are given for the three approximations, leading to a procedure to construct a confidence interval for the Gittins index using a finite number of Monte Carlo samples, as well as an epsilon-optimal policy for the Bayesian multi-armed bandit. Proofs are given for almost sure convergence and convergence in distribution for the sampling-based Gittins index approximation. In a numerical study, the approximation quality of the proposed method is verified for the Bernoulli bandit and Gaussian bandit with known variance, and the method is shown to…
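To make the horizon-truncation idea concrete, here is a minimal Python sketch for a Bernoulli arm with a Beta(a, b) posterior. It is not the paper's sampling-based procedure: it uses the classical calibration approach, solving the truncated optimal stopping problem exactly by backward induction and finding the index by bisection on a per-pull charge. The function name truncated_gittins_bernoulli, the discount factor gamma, and the parameters horizon and tol are assumptions introduced for illustration.

```python
def truncated_gittins_bernoulli(a, b, gamma, horizon, tol=1e-6):
    """Approximate the Gittins index of a Bernoulli arm with Beta(a, b) posterior.

    Illustrative sketch only (classical calibration with a truncated horizon),
    not the sampling-based method of the paper. All names are hypothetical.
    """

    def value_of_pulling(charge):
        # Value of pulling at least once and then stopping optimally,
        # when every pull costs `charge` and at most `horizon` pulls are allowed.
        # After d pulls with s successes the posterior is Beta(a + s, b + d - s).
        nxt = [0.0] * (horizon + 1)  # truncation: nothing is earned after `horizon` pulls
        for d in range(horizon - 1, -1, -1):
            cur = []
            for s in range(d + 1):
                p = (a + s) / (a + b + d)  # posterior mean success probability
                cont = p - charge + gamma * (p * nxt[s + 1] + (1 - p) * nxt[s])
                # The first pull is forced; later pulls may be declined (value 0).
                cur.append(cont if d == 0 else max(0.0, cont))
            nxt = cur
        return nxt[0]

    # value_of_pulling is decreasing in the charge, positive at 0 and negative at 1,
    # so the index (the break-even charge) can be located by bisection.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if value_of_pulling(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n in (1, 5, 50):
        print(n, truncated_gittins_bernoulli(1, 1, 0.9, n))
```

With horizon=1 the sketch returns the posterior mean a / (a + b), and the result is non-decreasing in horizon. Because rewards lie in [0, 1], the discounted reward discarded by the truncation is at most gamma**horizon / (1 - gamma), which is why a moderate horizon already gives a stable value in this setting.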
Taxonomy
Topics: Advanced Bandit Algorithms Research · Forecasting Techniques and Applications
