Recurrent Deep Reinforcement Learning for Chemotherapy Control under Partial Observability
Firas Mohamed Elamine Kiram, Imane Youkana, Rachida Saouli, Gian Antonio Susto, and Laid Kahloul

TL;DR
This paper explores how recurrent deep reinforcement learning can improve chemotherapy control when patient state information is incomplete or noisy, demonstrating benefits over non-recurrent methods.
Contribution
The paper introduces a recurrent TD3-based approach for chemotherapy dose optimization, using separate LSTM actor and critic networks, and evaluates it under partial observability on the AhnChemoEnv benchmark from DTR-Bench.
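For orientation, the standard TD3 critic target can be written with the state replaced by a summary of the observation history, which is one common way to make TD3 recurrent. This is a sketch under assumptions, not necessarily the authors' exact formulation: the symbols h' (the LSTM summary of the history up to the next step), c (noise clip), sigma, and the action bounds are illustrative and not taken from the paper.

\[
\tilde{a} = \operatorname{clip}\!\big(\pi_{\phi'}(h') + \operatorname{clip}(\epsilon, -c, c),\; a_{\min},\; a_{\max}\big), \qquad \epsilon \sim \mathcal{N}(0, \sigma^2)
\]
\[
y = r + \gamma\,(1 - d)\,\min_{i \in \{1, 2\}} Q_{\theta'_i}(h', \tilde{a})
\]

Here r is the reward, d the terminal flag, and the minimum over the two target critics is the clipped double-Q estimate that TD3 uses to limit overestimation. In a feed-forward agent, h' would simply be the latest observation.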
Findings
Recurrent policies outperform feed-forward ones under partial observability.
Memory-augmented policies show more consistent tumor suppression.
Recurrent methods provide stability and robustness in noisy clinical scenarios.
Abstract
Chemotherapy dose optimization can be formulated as a dynamic treatment regime, requiring sequential decisions under uncertainty that must balance tumor suppression against toxicity. However, most reinforcement learning approaches assume full observability of the patient state, a condition rarely met in clinical practice. We investigate whether memory-augmented policies can improve chemotherapy control under partial observability. To this end, we employ a recurrent TD3-based approach with separate LSTM actor-critic networks and evaluate it on the AhnChemoEnv benchmark from DTR-Bench, considering both off-policy and on-policy recurrent architectures against feed-forward TD3 and Soft Actor-Critic. Pharmacokinetic and pharmacodynamic variability are held fixed to isolate hidden-state uncertainty and observation noise and to avoid confounding effects from inter-patient variability. Across…
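The partial-observability setting described in the abstract (hidden state components plus observation noise) can be illustrated with a small sketch. Everything below is hypothetical: the state keys, the toy dynamics, the noise level, and the helper names `observe` and `HistoryBuffer` are assumptions made for illustration and are not the AhnChemoEnv or DTR-Bench API. The point is only that a memoryless policy sees a single noisy frame, while a recurrent policy would consume the whole sequence.

```python
import random
from collections import deque


def observe(full_state, visible_keys, noise_std, rng):
    """Return a partial, noisy view: only visible_keys, each with Gaussian noise."""
    return [full_state[k] + rng.gauss(0.0, noise_std) for k in visible_keys]


class HistoryBuffer:
    """Fixed-length window of past observations, zero-padded at the start."""

    def __init__(self, length, obs_dim):
        self.frames = deque(
            [[0.0] * obs_dim for _ in range(length)], maxlen=length
        )

    def push(self, obs):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(list(obs))

    def as_sequence(self):
        # Oldest frame first, as a sequence model would expect.
        return [list(f) for f in self.frames]


rng = random.Random(0)
# Hypothetical patient state; only "tumor" is measured, the rest stays hidden.
state = {"tumor": 1.0, "normal": 0.8, "drug": 0.0}
history = HistoryBuffer(length=4, obs_dim=1)

for _ in range(6):
    state["tumor"] *= 0.95  # toy dynamics, not a pharmacological model
    history.push(observe(state, ["tumor"], noise_std=0.05, rng=rng))

seq = history.as_sequence()  # input a recurrent policy would receive
last = seq[-1]               # input a feed-forward policy would receive
```

With a window like `seq`, a sequence model can average out noise and infer trends in unmeasured quantities, which is the intuition behind comparing recurrent and feed-forward agents in this setting.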
