Recurrent Deep Reinforcement Learning for Chemotherapy Control under Partial Observability

Firas Mohamed Elamine Kiram, Imane Youkana, Rachida Saouli, Gian Antonio Susto, and Laid Kahloul

arXiv:2605.02552·cs.LG·May 5, 2026

TL;DR

This paper examines whether recurrent deep reinforcement learning improves chemotherapy dose control when the patient state is only partially observed or the observations are noisy, and reports benefits over feed-forward baselines.

Contribution

The paper applies a recurrent TD3-based approach, with separate LSTM actor and critic networks, to chemotherapy dose optimization and evaluates it under partial observability on the AhnChemoEnv benchmark from DTR-Bench.
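For readers unfamiliar with TD3, its critic target combines clipped double-Q learning with target policy smoothing. The following is a minimal plain-Python sketch of that standard target for a single transition. The function names, default hyperparameters, action bounds, and callable target networks are illustrative assumptions, not the paper's implementation; a recurrent variant would additionally pass LSTM hidden state into each network.

```python
import random


def clip(x, lo, hi):
    """Clamp a scalar to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_obs, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_low=0.0, act_high=1.0, rng=random):
    """Standard TD3 critic target for one transition (scalar action).

    target_actor, target_q1, target_q2 are hypothetical callables standing in
    for the target networks; all default values here are assumptions.
    """
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_obs) + noise, act_low, act_high)

    # Clipped double-Q: take the smaller of the two target critic estimates.
    q_min = min(target_q1(next_obs, next_action),
                target_q2(next_obs, next_action))

    # Bootstrap only when the episode has not terminated.
    return reward + gamma * (1.0 - float(done)) * q_min
```

Taking the minimum of two critics counteracts overestimation of action values, and the clipped noise on the target action discourages the critic from exploiting sharp peaks in its own estimate.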

Findings

01. Recurrent policies outperform feed-forward ones under partial observability.

02. Memory-augmented policies show more consistent tumor suppression.

03. Recurrent methods are more stable and robust when observations are noisy, as evaluated on a simulated chemotherapy benchmark.

Abstract

Chemotherapy dose optimization can be formulated as a dynamic treatment regime, requiring sequential decisions under uncertainty that must balance tumor suppression against toxicity. However, most reinforcement learning approaches assume full observability of the patient state, a condition rarely met in clinical practice. We investigate whether memory-augmented policies can improve chemotherapy control under partial observability. To this end, we employ a recurrent TD3-based approach with separate LSTM actor-critic networks and evaluate it on the AhnChemoEnv benchmark from DTR-Bench, considering both off-policy and on-policy recurrent architectures against feed-forward TD3 and Soft Actor-Critic. Pharmacokinetic and pharmacodynamic variability are held fixed to isolate hidden-state uncertainty and observation noise and to avoid confounding effects from inter-patient variability. Across…
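The two sources of partial observability named in the abstract, hidden state components and observation noise, can be illustrated with a small plain-Python sketch. The class name `PartialObserver`, the choice of visible indices, the noise level, and the history length are all hypothetical; this is not the AhnChemoEnv or DTR-Bench API, only an illustration of what a memory-based policy receives compared with a feed-forward one.

```python
import random
from collections import deque


class PartialObserver:
    """Hypothetical helper: hides state components, adds Gaussian noise,
    and keeps a bounded history of past observations."""

    def __init__(self, visible_idx, noise_std, history_len, seed=0):
        self.visible_idx = list(visible_idx)   # indices the agent may see
        self.noise_std = noise_std             # std of additive sensor noise
        self.history = deque(maxlen=history_len)
        self.rng = random.Random(seed)         # private RNG for repeatability

    def reset(self):
        """Clear the history at the start of an episode."""
        self.history.clear()

    def observe(self, full_state):
        """Return a noisy view of the visible components and record it."""
        obs = [full_state[i] + self.rng.gauss(0.0, self.noise_std)
               for i in self.visible_idx]
        self.history.append(obs)
        return obs

    def sequence(self):
        """Observation history, oldest first, as a recurrent policy would consume it."""
        return list(self.history)
```

A feed-forward policy would act on the latest return value of `observe` alone, whereas a recurrent policy can condition on `sequence()` (or, equivalently, carry an LSTM hidden state across steps) to infer the components it never sees directly.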
