An optimal learning method for developing personalized treatment regimes
Yingfei Wang, Warren Powell

TL;DR
This paper introduces a Bayesian, knowledge-gradient-based method for developing personalized treatment regimes. The method optimizes sequential medical decisions to improve outcomes and reduce costs, and is demonstrated on a knee replacement dataset.
Contribution
The paper proposes a Bayesian contextual bandit approach with a knowledge gradient policy for personalized treatment, combining prior information with sequential learning.
Findings
Improved success rates through careful physician selection.
Effective reduction of healthcare costs in real-world data.
Handling of sparse data with clustering and LASSO.
Abstract
A treatment regime is a function that maps individual patient information to a recommended treatment, hence explicitly incorporating the heterogeneity in need for treatment across individuals. Patient responses are dichotomous and can be predicted through an unknown relationship that depends on the patient information and the selected treatment. The goal is to find the treatments that lead to the best patient responses on average. Each experiment is expensive, forcing us to learn the most from each experiment. We adopt a Bayesian approach both to incorporate possible prior information and to update our treatment regime continuously as information accrues, with the potential to allow smaller yet more informative trials and for patients to receive better treatment. By formulating the problem as contextual bandits, we introduce a knowledge gradient policy to guide the treatment assignment…
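To make the knowledge gradient idea concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper. Patients fall into a few discrete groups, and each (group, treatment) pair has an independent Beta belief about its success probability, in place of a parametric model over patient information. The names `kg_values`, `choose_treatment`, and `simulate` are hypothetical. The knowledge gradient of a treatment is the expected one-step increase in the best posterior mean success probability if that treatment is tried next. The selection rule below, which weights that value by the number of remaining patients, is one common online variant and is also an assumption here.

```python
import numpy as np

def kg_values(alpha, beta):
    """One-step knowledge gradient of each treatment for one patient group.

    alpha[x], beta[x] are Beta posterior parameters for the success
    probability of treatment x (independent beliefs: an assumption).
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    mean = alpha / (alpha + beta)
    best_now = mean.max()
    kg = np.zeros_like(mean)
    for x in range(len(mean)):
        # Best mean among the treatments whose beliefs would not change.
        best_other = np.delete(mean, x).max() if len(mean) > 1 else -np.inf
        up = (alpha[x] + 1) / (alpha[x] + beta[x] + 1)   # mean after a success
        down = alpha[x] / (alpha[x] + beta[x] + 1)       # mean after a failure
        expected_best = (mean[x] * max(best_other, up)
                         + (1 - mean[x]) * max(best_other, down))
        kg[x] = expected_best - best_now
    return kg

def choose_treatment(alpha, beta, remaining):
    """Online rule (assumed): current mean plus KG weighted by remaining patients."""
    mean = alpha / (alpha + beta)
    return int(np.argmax(mean + remaining * kg_values(alpha, beta)))

def simulate(true_p, n_patients=200, seed=0):
    """Assign treatments sequentially to simulated patients with binary responses."""
    rng = np.random.default_rng(seed)
    n_groups, n_treatments = true_p.shape
    alpha = np.ones((n_groups, n_treatments))  # uniform Beta(1, 1) priors
    beta = np.ones((n_groups, n_treatments))
    for n in range(n_patients):
        g = rng.integers(n_groups)             # patient group (the "context")
        x = choose_treatment(alpha[g], beta[g], remaining=n_patients - n - 1)
        if rng.random() < true_p[g, x]:        # dichotomous response
            alpha[g, x] += 1
        else:
            beta[g, x] += 1
    return alpha, beta
```

For example, `simulate(np.array([[0.8, 0.4], [0.3, 0.7]]))` runs 200 simulated patients across two groups and two treatments and returns the updated Beta parameters. The success probabilities in that call are made up for illustration and are not drawn from the knee replacement data.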
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning in Healthcare · Machine Learning and Algorithms
