Lagrangian Index Policy for Restless Bandits with Average Reward

Konstantin Avrachenkov; Vivek S. Borkar; Pratik Shah

arXiv:2412.12641·cs.LG·January 1, 2026

Open Access

TL;DR

This paper introduces the Lagrangian Index Policy (LIP) for restless bandits with long-run average reward, compares it with the Whittle Index Policy (WIP), and develops tabular and NN-based reinforcement learning algorithms that learn LIP online in the model-free setting.

Contribution

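The paper proposes LIP as a robust alternative to the Whittle Index Policy (WIP), calculates the Lagrangian index analytically for the restart model, and introduces reinforcement learning algorithms for LIP in the model-free setting that need less memory than the analogous schemes for WIP.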

Findings

01

LIP performs well even when WIP performs poorly.

02

Reinforcement learning schemes for LIP require significantly less memory than the analogous schemes for WIP.

03

The Lagrangian index is calculated analytically for the restart model, which applies to optimal web crawling and to minimizing the weighted age of information.

Abstract

We study the Lagrangian Index Policy (LIP) for restless multi-armed bandits with long-run average reward. In particular, we compare the performance of LIP with the performance of the Whittle Index Policy (WIP), both heuristic policies known to be asymptotically optimal under certain natural conditions. Even though in most cases their performances are very similar, in the cases when WIP shows bad performance, LIP continues to perform very well. We then propose reinforcement learning algorithms, both tabular and NN-based, to obtain online learning schemes for LIP in the model-free setting. The proposed reinforcement learning schemes for LIP require significantly less memory than the analogous schemes for WIP. We calculate analytically the Lagrangian index for the restart model, which applies to the optimal web crawling and the minimization of the weighted age of information. We also give…
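
To make the idea of an index policy concrete, here is a minimal sketch in Python. It rests on assumptions that this page does not state: the index of a state is taken to be the gap between the active and passive Q-values of a single arm whose activation is charged a cost `lam` (the Lagrange multiplier), and `lam` is passed in as a given number rather than tuned to the activation budget. The names `lagrangian_index` and `select_arms`, and the toy transition matrices and rewards, are hypothetical and serve only as illustration.

```python
import numpy as np


def lagrangian_index(P, r, lam, iters=5000, tol=1e-10):
    """Index sketch for one arm (assumed definition: Q(s, 1) - Q(s, 0)).

    P   : array of shape (2, S, S), P[a, s, s2] = transition probability
    r   : array of shape (2, S), r[a, s] = one-step reward
    lam : activation cost (Lagrange multiplier), assumed given
    """
    n_states = r.shape[1]
    cost = lam * np.arange(2)[:, None]  # 0 for passive, lam for active
    h = np.zeros(n_states)              # relative value function
    for _ in range(iters):
        # Relative value iteration for the average-reward single-arm problem.
        Q = r - cost + P @ h
        h_new = Q.max(axis=0)
        h_new = h_new - h_new[0]        # pin state 0 as the reference state
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    Q = r - cost + P @ h
    return Q[1] - Q[0]


def select_arms(indices, states, budget):
    """Activate the `budget` arms whose current states have the largest index."""
    scores = np.array([idx[s] for idx, s in zip(indices, states)])
    return np.argsort(-scores, kind="stable")[:budget]


# Toy two-state arm (made-up numbers), replicated three times.
P_toy = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.6, 0.4]]])
r_toy = np.array([[0.0, 0.0],
                  [1.0, 2.0]])
idx_toy = lagrangian_index(P_toy, r_toy, lam=0.5)
chosen = select_arms([idx_toy] * 3, states=[0, 1, 0], budget=1)
```

In this sketch the per-arm index table has one entry per state, so the policy only needs to sort the arms by the index of their current state at each decision step.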

Taxonomy

Topics: Advanced Bandit Algorithms Research · Healthcare Operations and Scheduling Optimization · Smart Grid Energy Management