Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits

Tor Lattimore

arXiv:1511.06014·cs.LG·May 31, 2016·57 cites

TL;DR

This paper provides a finite-time regret analysis of the finite-horizon Gittins index strategy for Gaussian multi-armed bandits, showing that its regret guarantees are comparable to those of UCB, with experiments suggesting a modest improvement over UCB and Thompson sampling.

Contribution

It offers the first finite-time regret bounds for the Gittins index strategy in Gaussian bandits and compares its performance to existing algorithms.

Findings

01

The Gittins index strategy achieves finite-time regret guarantees comparable to those of UCB.

02

The finite-time bounds derived for the Gittins index are asymptotically exact.

03

Experimental results suggest that a particular version of the Gittins index strategy is a modest improvement on UCB and Thompson sampling.

Abstract

I analyse the frequentist regret of the famous Gittins index strategy for multi-armed bandits with Gaussian noise and a finite horizon. Remarkably it turns out that this approach leads to finite-time regret guarantees comparable to those available for the popular UCB algorithm. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss some computational issues and present experimental results suggesting that a particular version of the Gittins index strategy is a modest improvement on existing algorithms with finite-time regret guarantees such as UCB and Thompson sampling.
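The frequentist regret mentioned in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's method: it assumes unit-variance Gaussian rewards and uses a standard UCB-style index as a stand-in, because computing the finite-horizon Gittins index requires a separate numerical procedure that the abstract does not spell out. The names `ucb_index`, `run_index_strategy`, and `average_regret` are hypothetical. Any other index, including an approximation of the Gittins index, could be passed in through `index_fn`.

```python
import math
import random


def ucb_index(mean, pulls, t):
    """UCB-style index for unit-variance Gaussian rewards.

    This is a stand-in for illustration only; it is NOT the Gittins index.
    """
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_index_strategy(true_means, horizon, index_fn, rng):
    """Play an index strategy for `horizon` rounds.

    Returns the cumulative pseudo-regret (sum of gaps of the arms chosen)
    and the number of times each arm was pulled.
    """
    k = len(true_means)
    pulls = [0] * k
    sums = [0.0] * k
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once so every empirical mean is defined
        else:
            # choose the arm whose index is largest
            arm = max(
                range(k),
                key=lambda i: index_fn(sums[i] / pulls[i], pulls[i], t),
            )
        reward = rng.gauss(true_means[arm], 1.0)  # assumed unit-variance noise
        pulls[arm] += 1
        sums[arm] += reward
        regret += best - true_means[arm]
    return regret, pulls


def average_regret(true_means, horizon, index_fn, runs, seed=0):
    """Average the pseudo-regret over independent simulated runs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        regret, _ = run_index_strategy(true_means, horizon, index_fn, rng)
        total += regret
    return total / runs


if __name__ == "__main__":
    # Two arms with a gap of 0.5; the values are arbitrary illustration choices.
    print(average_regret([0.0, 0.5], horizon=1000, index_fn=ucb_index, runs=20))
```

Averaging over runs estimates the expected regret for one fixed set of arm means, which is the frequentist quantity; a Bayesian analysis would additionally average over a prior on the means.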

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms