Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare
Arpita Biswas, Gaurav Aggarwal, Pradeep Varakantham, Milind Tambe

TL;DR
This paper introduces an adaptive learning policy for restless bandits in preventive healthcare, using a Whittle-index-based Q-learning approach to decide which beneficiaries receive scarce interventions so as to prevent disengagement.
Contribution
It proposes a novel Whittle-index-based Q-learning method for restless multi-armed bandits (RMABs) whose transition probabilities are unknown, shows that the method converges, and demonstrates improved performance over existing methods.
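To make the idea concrete, here is a minimal sketch of Whittle-index-style Q-learning, written for illustration and not taken from the authors' code. It assumes each arm (beneficiary) has a small finite state space, two actions (0 = no intervention, 1 = intervene), a budget of `k` interventions per step, and that an arm's index in state `s` is estimated as `Q[i, s, 1] - Q[i, s, 0]`. The class name `WhittleQLearner`, the epsilon-greedy exploration, and the 1/n learning rate are all hypothetical choices.

```python
import numpy as np


class WhittleQLearner:
    """Illustrative Whittle-index-based Q-learning over N independent arms.

    Assumptions (not from the paper's code): finite states per arm,
    binary actions (0 = no intervention, 1 = intervene), a budget of k
    interventions per step, and the index estimate Q[i, s, 1] - Q[i, s, 0].
    """

    def __init__(self, n_arms, n_states, k, gamma=0.95, epsilon=0.1, seed=0):
        self.k = k
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        # Q[i, s, a]: estimated discounted value of action a in state s for arm i
        self.Q = np.zeros((n_arms, n_states, 2))
        # Visit counts, used for a decaying 1/n learning rate
        self.counts = np.zeros((n_arms, n_states, 2))

    def index(self, states):
        # Estimated index of each arm: the Q-value gain from intervening now
        arms = np.arange(len(states))
        return self.Q[arms, states, 1] - self.Q[arms, states, 0]

    def select(self, states):
        # Epsilon-greedy: k random arms with prob. epsilon, else top-k by index
        n = len(states)
        if self.rng.random() < self.epsilon:
            chosen = self.rng.choice(n, size=self.k, replace=False)
        else:
            chosen = np.argsort(-self.index(states))[: self.k]
        actions = np.zeros(n, dtype=int)
        actions[chosen] = 1
        return actions

    def update(self, states, actions, rewards, next_states):
        # Standard per-arm Q-learning update for the action actually taken
        for i, (s, a, r, s_next) in enumerate(zip(states, actions, rewards, next_states)):
            self.counts[i, s, a] += 1
            alpha = 1.0 / self.counts[i, s, a]
            target = r + self.gamma * self.Q[i, s_next].max()
            self.Q[i, s, a] += alpha * (target - self.Q[i, s, a])
```

In a loop, `select` would pick which beneficiaries to contact at each step, an environment would return rewards and next states, and `update` would refine the Q-values; the sketch does not reproduce the paper's convergence argument.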
Findings
Method converges to the optimal solution.
Outperforms existing learning-based RMAB methods.
Effective on healthcare datasets and benchmarks.
Abstract
In many public health settings, it is important for patients to adhere to health programs, such as taking medications and periodic health checks. Unfortunately, beneficiaries may gradually disengage from such programs, which is detrimental to their health. A concrete example of gradual disengagement has been observed by an organization that carries out a free automated call-based program for spreading preventive care information among pregnant women. Many women stop picking up calls after being enrolled for a few months. To avoid such disengagements, it is important to provide timely interventions. Such interventions are often expensive and can be provided to only a small fraction of the beneficiaries. We model this scenario as a restless multi-armed bandit (RMAB) problem, where each beneficiary is assumed to transition from one state to another depending on the intervention. Moreover,…
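The RMAB model described in the abstract can be sketched as a simple simulator. This is an illustrative assumption, not the paper's exact model: each beneficiary has two states (1 = engaged, 0 = disengaged), `P[i, s, a]` is a hypothetical probability that beneficiary `i` is engaged at the next step given state `s` and action `a`, and at most `budget` interventions are allowed per step. The function name `rmab_step` and the reward definition are likewise hypothetical.

```python
import numpy as np


def rmab_step(states, actions, P, budget, rng):
    """Advance every arm of a restless bandit by one time step.

    Hypothetical two-state model: state 1 = engaged, state 0 = disengaged.
    P[i, s, a] is the assumed probability that beneficiary i is engaged at
    the next step, given current state s and action a (1 = intervene).
    """
    if actions.sum() > budget:
        raise ValueError("more interventions than the budget allows")
    arms = np.arange(len(states))
    # "Restless": every arm transitions, whether or not it was intervened on
    p_engaged = P[arms, states, actions]
    next_states = (rng.random(len(states)) < p_engaged).astype(int)
    # Illustrative reward: one unit per beneficiary engaged after the step
    rewards = next_states.astype(float)
    return next_states, rewards
```

Because arms change state even when they are not chosen, the planner must weigh who is most likely to drift away without an intervention, which is what an index policy ranks.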
Taxonomy
Methods: Q-Learning
