A Penalized Shared-parameter Algorithm for Estimating Optimal Dynamic Treatment Regimens
Palash Ghosh, Xinru Wang, Trikay Nalamada, Shruti Agarwal, Maria Jahja, and Bibhas Chakraborty

TL;DR
This paper introduces a penalized shared-parameter algorithm for estimating optimal dynamic treatment regimens, addressing convergence issues in existing methods and demonstrating improved performance through simulations and real data.
Contribution
It develops a penalized Q-shared algorithm that converges even in settings where the original Q-shared algorithm fails, and that can outperform the original when estimating decision rules shared across stages.
Findings
The penalized algorithm converges where the original fails.
It can outperform the original Q-shared algorithm in simulations, even when the convergence condition is satisfied.
It is also effective in an application to real-world medical data.
Abstract
A dynamic treatment regimen (DTR) is a set of decision rules to personalize treatments for an individual using their medical history. The Q-learning-based Q-shared algorithm has been used to develop DTRs that involve decision rules shared across multiple stages of intervention. We show that the existing Q-shared algorithm can suffer from non-convergence due to the use of linear models in the Q-learning setup, and identify the condition under which Q-shared fails. We develop a penalized Q-shared algorithm that not only converges in settings that violate the condition, but can outperform the original Q-shared algorithm even when the condition is satisfied. We give evidence for the proposed method in a real-world application and several synthetic simulations.
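To make the idea of a shared-parameter fit concrete, here is a minimal two-stage sketch, not the authors' implementation. The abstract does not state the form of the penalty, so the ridge (L2) penalty used here is an assumption. The function names `features` and `penalized_q_shared`, the feature map, the treatment coding in {-1, +1}, and the simulated data are also hypothetical choices made for illustration. The loop alternates between building stage-1 pseudo-outcomes from the current shared parameters and refitting one penalized linear model on the stacked data from both stages.

```python
import numpy as np

def features(h, a):
    # Hypothetical shared feature map: intercept, history, treatment, interaction.
    return np.column_stack([np.ones_like(h), h, a, h * a])

def penalized_q_shared(h1, a1, y1, h2, a2, y2, lam=1.0, tol=1e-8, max_iter=500):
    """Iterative shared-parameter Q-learning with an assumed ridge penalty.

    Returns (theta, iterations_used, converged_flag).
    """
    X = np.vstack([features(h1, a1), features(h2, a2)])  # stack both stages
    gram = X.T @ X + lam * np.eye(X.shape[1])            # ridge-penalized Gram matrix
    theta = np.zeros(X.shape[1])
    for it in range(max_iter):
        # Best stage-2 Q-value under the current theta, over actions {-1, +1}.
        q_plus = features(h2, np.ones_like(h2)) @ theta
        q_minus = features(h2, -np.ones_like(h2)) @ theta
        pseudo = y1 + np.maximum(q_plus, q_minus)        # stage-1 pseudo-outcome
        target = np.concatenate([pseudo, y2])
        new_theta = np.linalg.solve(gram, X.T @ target)  # penalized least squares
        if np.linalg.norm(new_theta - theta) < tol:
            return new_theta, it + 1, True
        theta = new_theta
    return theta, max_iter, False

# Simulated data for illustration only (not from the paper).
rng = np.random.default_rng(0)
n = 500
h1 = rng.normal(size=n)
a1 = rng.choice([-1.0, 1.0], size=n)
y1 = 0.5 * h1 * a1 + rng.normal(size=n)
h2 = 0.5 * h1 + rng.normal(size=n)
a2 = rng.choice([-1.0, 1.0], size=n)
y2 = 0.5 * h2 * a2 + rng.normal(size=n)

theta, n_iter, converged = penalized_q_shared(h1, a1, y1, h2, a2, y2, lam=1.0)
print(theta, n_iter, converged)
```

In this sketch, setting `lam=0` gives an unpenalized iteration on the stacked least-squares problem, while a larger `lam` shrinks each update and makes the iteration more likely to settle. That mirrors the abstract's motivation only loosely and is not the paper's stated convergence condition.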
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods in Clinical Trials · Statistical Methods and Inference
Methods: Q-Learning
