Penalized Q-Learning for Dynamic Treatment Regimes
Rui Song, Weiwei Wang, Donglin Zeng, Michael R. Kosorok

TL;DR
This paper introduces penalized Q-learning, a reinforcement learning framework designed to improve statistical inference and individual treatment selection in dynamic treatment regimes, especially under non-regular scenarios in clinical trial data.
Contribution
The paper develops a novel penalized Q-learning method with individual selection, addressing non-regularities and enhancing inference in dynamic treatment regimes.
Findings
Proposed method outperforms existing approaches in simulations.
Demonstrated effectiveness on depression clinical trial data.
Provides valid statistical inference under complex scenarios.
Abstract
A dynamic treatment regime effectively incorporates both accrued information and long-term effects of treatment from specially designed clinical trials. As these trials become increasingly popular in conjunction with longitudinal data from clinical studies, the development of statistical inference for optimal dynamic treatment regimes is a high priority. This is very challenging because of the difficulties arising from non-regularities in the treatment effect parameters. In this paper, we propose a new reinforcement learning framework called penalized Q-learning (PQ-learning), under which the non-regularities can be resolved and valid statistical inference established. We also propose a new statistical procedure, individual selection, and corresponding methods for incorporating individual selection within PQ-learning. Extensive numerical studies are presented which compare the proposed methods…
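For background on the terms used in the abstract, the following is a minimal sketch of standard (unpenalized) two-stage Q-learning with linear working models. It is not the paper's PQ-learning procedure. The function names, the simulated data, and the coefficient values are all illustrative assumptions. The absolute value in the stage-2 pseudo-outcome is non-differentiable where the estimated treatment contrast is zero, which is commonly given as the source of the non-regularity the abstract mentions.

```python
import numpy as np

def fit_stage(H, A, Y):
    """Least-squares fit of the working model Q(h, a) = h'beta + a * h'psi."""
    design = np.hstack([H, A[:, None] * H])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    p = H.shape[1]
    return coef[:p], coef[p:]  # main effects beta, treatment contrast psi

def two_stage_q_learning(X1, A1, X2, A2, Y):
    """Backward induction over two stages; treatments are coded -1 / +1."""
    ones = np.ones(len(Y))
    # Stage 2: history includes everything observed before A2.
    H2 = np.column_stack([ones, X1, A1, X2])
    beta2, psi2 = fit_stage(H2, A2, Y)
    # Pseudo-outcome: predicted outcome under the best stage-2 action,
    # max over a in {-1, +1} of h'beta + a * h'psi = h'beta + |h'psi|.
    pseudo = H2 @ beta2 + np.abs(H2 @ psi2)
    # Stage 1: regress the pseudo-outcome on the stage-1 history.
    H1 = np.column_stack([ones, X1])
    beta1, psi1 = fit_stage(H1, A1, pseudo)
    return (beta1, psi1), (beta2, psi2)

def simulate(n=2000, seed=0):
    """Hypothetical randomized two-stage trial (assumed data-generating model)."""
    rng = np.random.default_rng(seed)
    X1 = rng.normal(size=n)
    A1 = rng.choice([-1.0, 1.0], size=n)
    X2 = 0.5 * X1 + rng.normal(size=n)
    A2 = rng.choice([-1.0, 1.0], size=n)
    Y = X1 + 0.5 * A1 + A2 * (1.0 + 0.5 * X2) + rng.normal(size=n)
    return X1, A1, X2, A2, Y

if __name__ == "__main__":
    (beta1, psi1), (beta2, psi2) = two_stage_q_learning(*simulate())
    print("stage-2 contrast estimate:", np.round(psi2, 2))
    print("stage-1 contrast estimate:", np.round(psi1, 2))
```

The estimated decision rule at each stage assigns treatment +1 when the fitted contrast h'psi is positive and -1 otherwise. In this sketch the pseudo-outcome is a plain hard maximum, with no penalty on the stage-2 contrast.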
Taxonomy
Topics: Advanced Causal Inference Techniques · Statistical Methods in Clinical Trials · Gene Regulatory Network Analysis
