POLAR: A Pessimistic Model-based Policy Learning Algorithm for Dynamic Treatment Regimes

Ruijia Zhang; Xiangyu Zhang; Zhengling Qi; Yue Wu; Yanxun Xu

arXiv:2506.20406 · stat.ML · January 30, 2026

TL;DR

POLAR is a new model-based policy learning algorithm for offline dynamic treatment regimes that incorporates uncertainty quantification and pessimistic penalties, providing statistical guarantees and improved performance over existing methods.

Contribution

POLAR introduces a pessimistic model-based approach to offline DTR optimization with finite-sample guarantees. It targets two weaknesses of existing methods: poor robustness under partial data coverage and reliance on complex optimization problems.

Findings

01. POLAR outperforms state-of-the-art methods on synthetic and real data.

02. It provides finite-sample bounds on policy suboptimality.

03. Empirical results show near-optimal, history-aware treatment strategies.

Abstract

Dynamic treatment regimes (DTRs) provide a principled framework for optimizing sequential decision-making in domains where decisions must adapt over time in response to individual trajectories, such as healthcare, education, and digital interventions. However, existing statistical methods often rely on strong positivity assumptions and lack robustness under partial data coverage, while offline reinforcement learning approaches typically focus on average training performance, lack statistical guarantees, and require solving complex optimization problems. To address these challenges, we propose POLAR, a novel pessimistic model-based policy learning algorithm for offline DTR optimization. POLAR estimates the transition dynamics from offline data and quantifies uncertainty for each history-action pair. A pessimistic penalty is then incorporated into the reward function to discourage actions…
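The three steps named in the abstract (estimate the transition dynamics, quantify uncertainty per history-action pair, subtract a pessimistic penalty from the reward) can be illustrated with a minimal tabular sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the current discrete state stands in for the full history, uncertainty is a simple count-based term `1 / sqrt(n)`, the penalty weight `c` is a free parameter, and the function names `fit_tabular_model`, `pessimistic_reward`, and `plan` are hypothetical.

```python
import numpy as np

def fit_tabular_model(transitions, n_states, n_actions):
    """Estimate P(s' | s, a) and visit counts from (s, a, s_next) tuples."""
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1
    n_sa = counts.sum(axis=2)
    # Assumption: unvisited pairs fall back to a uniform next-state distribution.
    p_hat = np.where(n_sa[..., None] > 0,
                     counts / np.maximum(n_sa, 1)[..., None],
                     1.0 / n_states)
    return p_hat, n_sa

def pessimistic_reward(reward, n_sa, c=1.0):
    """Subtract a count-based uncertainty penalty (an illustrative choice)."""
    uncertainty = 1.0 / np.sqrt(np.maximum(n_sa, 1))
    return reward - c * uncertainty

def plan(p_hat, r_tilde, horizon):
    """Finite-horizon backward induction in the penalized model."""
    n_states, _ = r_tilde.shape
    value = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        q = r_tilde + p_hat @ value  # shape (n_states, n_actions)
        policy[t] = q.argmax(axis=1)
        value = q.max(axis=1)
    return policy, value
```

In this sketch, an action with a high nominal reward but very few observations is penalized heavily, so the planner prefers well-covered actions unless the reward gap outweighs the penalty. Setting `c=0` recovers ordinary planning in the estimated model.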

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning in Healthcare · Advanced Causal Inference Techniques
