Computing a classic index for finite-horizon bandits
José Niño-Mora

TL;DR
This paper develops an efficient recursive algorithm for the exact computation of the finite-horizon counterpart of the Gittins index for bandit problems. The index was introduced in classic work from the 1950s, but its efficient exact computation had received little attention.
Contribution
It introduces a recursive adaptive-greedy algorithm for exact index computation in finite-horizon bandits, improving computational efficiency and practical applicability.
Findings
The algorithm computes the index in (pseudo-)polynomial time.
Performance benchmarked against conventional methods.
Complexity reduces for projects with limited transitions.
Abstract
This paper considers the efficient exact computation of the counterpart of the Gittins index for a finite-horizon discrete-state bandit, which measures for each initial state the average productivity, given by the maximum ratio of expected total discounted reward earned to expected total discounted time expended that can be achieved through a number of successive plays stopping by the given horizon. Besides characterizing optimal policies for the finite-horizon one-armed bandit problem, such an index provides a suboptimal heuristic index rule for the intractable finite-horizon multiarmed bandit problem, which represents the natural extension of the Gittins index rule (optimal in the infinite-horizon case). Although such a finite-horizon index was introduced in classic work in the 1950s, investigation of its efficient exact computation has received scant attention. This paper introduces…
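The index defined in the abstract can be illustrated with a small brute-force sketch. This is not the paper's recursive adaptive-greedy algorithm. It uses a standard calibration argument instead, assuming a finite-state Markov project with transition matrix P, per-play rewards r and discount factor beta (all names chosen here for illustration, as is the hypothetical function finite_horizon_index). Because the index is a maximum ratio with a positive denominator, it is at least lam exactly when some stopping rule making 1 <= tau <= T plays earns a nonnegative expected discounted net reward, the sum of beta^t (r - lam) over the plays made. The sketch solves that optimal-stopping problem by backward induction and bisects on lam.

```python
def finite_horizon_index(P, r, beta, horizon, state, iters=100):
    """Hypothetical brute-force sketch (not the paper's algorithm).

    Returns the max over stopping times 1 <= tau <= horizon of
        E[sum_{t<tau} beta^t r(X_t)] / E[sum_{t<tau} beta^t],
    starting from X_0 = state, found by bisection on a per-play charge lam.
    """
    n = len(r)

    def net_value(lam):
        # V[j]: best expected discounted net reward sum beta^t (r - lam)
        # from state j when further plays are optional (V_0 = 0).
        V = [0.0] * n
        for _ in range(horizon - 1):
            V = [max(0.0, r[j] - lam + beta * sum(P[j][k] * V[k] for k in range(n)))
                 for j in range(n)]
        # The first play is mandatory (tau >= 1); later plays are optional.
        return r[state] - lam + beta * sum(P[state][k] * V[k] for k in range(n))

    # net_value is strictly decreasing in lam; the index is its root and
    # lies between the smallest and largest one-play reward.
    lo, hi = min(r), max(r)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if net_value(mid) >= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a two-state chain with P = [[0.5, 0.5], [0.5, 0.5]], r = [1, 0] and beta = 0.9, the index of state 1 with two plays left is 9/29 (about 0.310): the best rule plays once and plays again only if the chain has moved to state 0. Each bisection step here costs O(T n^2) operations for n states and horizon T.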
