A Linear Programming Relaxation and a Heuristic for the Restless Bandit Problem with General Switching Costs
Jerome Le Ny, Munther Dahleh, Eric Feron

TL;DR
This paper extends a relaxation technique for the restless bandit problem to include general switching costs and develops a heuristic policy based on this relaxation, supported by computational experiments.
Contribution
It extends the Bertsimas and Nino-Mora relaxation for restless bandits to arbitrary switching costs and constructs a one-step lookahead heuristic policy from the solution of that relaxation.
Findings
Computational experiments provide some empirical support for the one-step lookahead heuristic.
A bound for approximate dynamic programming lends further support to the heuristic.
Abstract
We extend a relaxation technique due to Bertsimas and Nino-Mora for the restless bandit problem to the case where arbitrary costs penalize switching between the bandits. We also construct a one-step lookahead policy using the solution of the relaxation. Computational experiments and a bound for approximate dynamic programming provide some empirical support for the heuristic.
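As a rough illustration of what a one-step lookahead rule with switching costs can look like, the sketch below picks the project to activate by maximizing immediate reward, minus the switching cost, plus a discounted approximate value of the next state. It is not the authors' algorithm: the separable value estimates `v_hat`, the reward-only-when-active assumption, and all function and parameter names are hypothetical stand-ins. In the paper the value information comes from the solution of the relaxation, which is not reproduced here.

```python
def one_step_lookahead(states, prev, reward, switch_cost,
                       p_active, p_passive, v_hat, beta):
    """Return (action, score) for a greedy one-step lookahead.

    states[i]         current state index of project i
    prev              index of the project active in the previous period
    reward[i][s]      reward for activating project i in state s (assumed:
                      passive projects earn nothing)
    switch_cost[j][i] cost of switching from project j to project i
    p_active[i][s]    next-state distribution of project i if active
    p_passive[i][s]   next-state distribution of project i if passive
                      (restless: passive projects still evolve)
    v_hat[i][s]       hypothetical separable value estimate per project
    beta              discount factor in [0, 1)
    """
    n = len(states)
    best_action, best_score = None, float("-inf")
    for a in range(n):
        # Immediate net reward of activating project a.
        score = reward[a][states[a]] - switch_cost[prev][a]
        # Discounted expected approximate value of the next joint state.
        for i in range(n):
            trans = p_active[i] if i == a else p_passive[i]
            row = trans[states[i]]
            score += beta * sum(p * v for p, v in zip(row, v_hat[i]))
        if score > best_score:
            best_action, best_score = a, score
    return best_action, best_score
```

With all value estimates set to zero the rule reduces to a myopic policy that trades immediate reward against the switching cost; nonzero estimates let it accept a costly switch when the approximate future value justifies it.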
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems
