TL;DR
This paper introduces a novel inverse reinforcement learning approach for restless multi-armed bandits, enabling public health applications to optimize resource allocation without known reward functions, demonstrated in maternal and child health programs.
Contribution
It is the first to apply inverse reinforcement learning (IRL) to restless multi-armed bandits (RMABs) in public health, allowing goal specification at scale and improving reward learning efficiency and accuracy.
Findings
Outperforms existing baselines in run-time and accuracy
Successfully applied to thousands of beneficiaries in India
Enables scalable, goal-driven resource allocation in health settings
Abstract
Public health practitioners often aim to monitor patients and maximize the time patients spend in "favorable" or healthy states while working with limited resources. Restless multi-armed bandits (RMABs) are an effective model for this problem: they allocate limited resources among many agents whose behavior differs depending on whether or not they receive an intervention. However, RMABs assume the reward function is known. This is unrealistic in many public health settings because patients face unique challenges, and at such a large scale no human can know who is most deserving of an intervention. To address this shortcoming, this paper is the first to present the use of inverse reinforcement learning (IRL) to learn desired rewards for RMABs, and we demonstrate improved outcomes in a…
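To make the RMAB setting in the abstract concrete, here is a minimal Python sketch of a budget-constrained bandit over two-state patients. It is not the paper's method: the transition probabilities in `P`, the function names, and the myopic greedy policy are all hypothetical choices for illustration. It also assumes a known reward (one unit per patient per step in the favorable state), which is exactly the assumption the paper relaxes by learning rewards with IRL.

```python
import random

# Hypothetical two-state patient model (illustrative numbers, not from the paper).
# State 1 = "favorable", state 0 = "unfavorable".
# P[action][state] = probability of being in the favorable state at the next step.
P = {
    0: {0: 0.2, 1: 0.6},  # passive: no intervention
    1: {0: 0.6, 1: 0.9},  # active: intervention
}


def step(state, action, rng):
    """Sample the next state of one arm given its state and the action taken."""
    return 1 if rng.random() < P[action][state] else 0


def myopic_policy(states, budget):
    """Pick the `budget` arms with the largest one-step gain from intervening.

    This is a simple greedy stand-in, not an index policy and not the paper's
    approach. Ties are broken by arm index.
    """
    gains = [(P[1][s] - P[0][s], i) for i, s in enumerate(states)]
    gains.sort(key=lambda g: (-g[0], g[1]))
    return {i for _, i in gains[:budget]}


def simulate(n_arms=10, budget=3, horizon=20, seed=0):
    """Run the bandit and return total patient-steps spent in the favorable state."""
    rng = random.Random(seed)
    states = [rng.randint(0, 1) for _ in range(n_arms)]
    total = 0
    for _ in range(horizon):
        chosen = myopic_policy(states, budget)
        # Arms evolve whether or not they are chosen, which is what makes them "restless".
        states = [step(s, 1 if i in chosen else 0, rng) for i, s in enumerate(states)]
        total += sum(states)
    return total


if __name__ == "__main__":
    print(simulate())
```

With these illustrative numbers the one-step gain from intervening is larger for a patient in the unfavorable state, so the greedy policy spends its budget there first. Changing the reward definition would change who gets selected, which is why specifying or learning the reward matters in this setting.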
