POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes
Ruijia Zhang, Xiangyu Zhang, Zhengling Qi, Yue Wu, Yanxun Xu

TL;DR
POLAR is a new model-based policy learning algorithm for offline dynamic treatment regimes that incorporates uncertainty quantification and pessimistic penalties, providing statistical guarantees and improved performance over existing methods.
Contribution
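POLAR introduces a pessimistic model-based approach to offline DTR optimization with finite-sample guarantees. It targets two gaps in prior work: the lack of robustness under partial data coverage in existing statistical methods, and the complex optimization problems that offline reinforcement learning approaches typically require.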
Findings
POLAR outperforms state-of-the-art methods on synthetic and real data.
It provides finite-sample bounds on policy suboptimality.
Empirical results show near-optimal, history-aware treatment strategies.
Abstract
Dynamic treatment regimes (DTRs) provide a principled framework for optimizing sequential decision-making in domains where decisions must adapt over time in response to individual trajectories, such as healthcare, education, and digital interventions. However, existing statistical methods often rely on strong positivity assumptions and lack robustness under partial data coverage, while offline reinforcement learning approaches typically focus on average training performance, lack statistical guarantees, and require solving complex optimization problems. To address these challenges, we propose POLAR, a novel pessimistic model-based policy learning algorithm for offline DTR optimization. POLAR estimates the transition dynamics from offline data and quantifies uncertainty for each history-action pair. A pessimistic penalty is then incorporated into the reward function to discourage actions…
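The pipeline described in the abstract (estimate the dynamics from offline data, quantify uncertainty per history-action pair, subtract a penalty from the reward) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a tabular setting in which a (stage, state) pair stands in for the full history, a count-based uncertainty 1/sqrt(n + 1) that is not taken from the paper, and a penalty weight c; the names fit_model and pessimistic_plan are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def fit_model(transitions):
    """Estimate rewards and transitions from offline tuples.

    Each tuple is (stage, state, action, reward, next_state).
    """
    counts = defaultdict(int)
    reward_sum = defaultdict(float)
    next_counts = defaultdict(lambda: defaultdict(int))
    for t, s, a, r, s_next in transitions:
        key = (t, s, a)
        counts[key] += 1
        reward_sum[key] += r
        next_counts[key][s_next] += 1
    return counts, reward_sum, next_counts


def pessimistic_plan(transitions, states, actions, horizon, c=1.0):
    """Backward induction on the estimated model with penalized rewards.

    Assumption (illustrative only): uncertainty = 1 / sqrt(n + 1), where n is
    the number of times (stage, state, action) appears in the data. Unseen
    pairs get zero estimated reward and zero continuation value.
    """
    counts, reward_sum, next_counts = fit_model(transitions)
    value = {s: 0.0 for s in states}  # value after the final stage
    policy = {}
    for t in reversed(range(horizon)):
        new_value = {}
        for s in states:
            best_action, best_q = None, float("-inf")
            for a in actions:
                key = (t, s, a)
                n = counts.get(key, 0)
                uncertainty = 1.0 / sqrt(n + 1)
                if n > 0:
                    r_hat = reward_sum[key] / n
                    future = sum(
                        m / n * value[s2] for s2, m in next_counts[key].items()
                    )
                else:
                    r_hat, future = 0.0, 0.0
                # Pessimistic reward: estimated reward minus scaled uncertainty.
                q = r_hat - c * uncertainty + future
                if q > best_q:
                    best_action, best_q = a, q
            policy[(t, s)] = best_action
            new_value[s] = best_q
        value = new_value
    return policy, value


# Toy data: action "A" is well covered, action "B" looks better but was seen once.
data = [(0, "s", "A", 1.0, "s")] * 100 + [(0, "s", "B", 1.2, "s")]
naive_policy, _ = pessimistic_plan(data, ["s"], ["A", "B"], horizon=1, c=0.0)
cautious_policy, _ = pessimistic_plan(data, ["s"], ["A", "B"], horizon=1, c=1.0)
print(naive_policy[(0, "s")], cautious_policy[(0, "s")])  # B A
```

With c = 0 the planner trusts the single observation of "B"; with c = 1 the penalty on the poorly covered action outweighs its apparent advantage, so the planner falls back to "A". The choice of c controls how strongly the learned policy avoids poorly supported actions.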
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning in Healthcare · Advanced Causal Inference Techniques
