Risk-inclusive Contextual Bandits for Early Phase Clinical Trials
Rohit Kanrar, Chunlin Li, Zara Ghodsi, Margaret Gamalo

TL;DR
This paper presents a risk-inclusive contextual bandit algorithm for early-phase clinical trials. It allocates doses using participant-specific data, balances safety against efficacy with two separate Thompson samplers, and estimates effect sizes with a generalized version of asymptotic confidence sequences (AsympCS).
Contribution
The paper introduces an algorithm that combines two Thompson samplers, one for efficacy and one for safety, with a generalized version of asymptotic confidence sequences (AsympCS) to improve dose allocation in clinical trials.
Findings
The algorithm outperforms traditional randomized dose allocation.
The generalized AsympCS provides a uniform coverage guarantee for sequential causal inference.
The results align well with real data from a Phase IIb study.
Abstract
Early-phase clinical trials face the challenge of selecting optimal drug doses that balance safety and efficacy due to uncertain dose-response relationships and varied participant characteristics. Traditional randomized dose allocation often exposes participants to sub-optimal doses by not considering individual covariates, necessitating larger sample sizes and prolonging drug development. This paper introduces a risk-inclusive contextual bandit algorithm that utilizes multi-armed bandit (MAB) strategies to optimize dosing through participant-specific data integration. By combining two separate Thompson samplers, one for efficacy and one for safety, the algorithm enhances the balance between efficacy and safety in dose allocation. The effect sizes are estimated with a generalized version of asymptotic confidence sequences (AsympCS), offering a uniform coverage guarantee for sequential…
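To make the idea of pairing an efficacy sampler with a safety sampler concrete, here is a minimal Python sketch. It is not the paper's algorithm: it ignores participant covariates (so it is not contextual), assumes binary efficacy and toxicity outcomes with independent Beta(1, 1) priors per dose, and uses a hypothetical toxicity cap `max_tox` as the rule for combining the two samplers. The function names `choose_dose` and `simulate` and all numeric values are illustrative assumptions.

```python
import numpy as np


def choose_dose(eff_succ, eff_fail, tox_succ, tox_fail, max_tox, rng):
    """Pick a dose index from two independent Beta-Bernoulli Thompson samplers.

    Assumption (not from the paper): a dose is admissible when its sampled
    toxicity rate is at most `max_tox`; among admissible doses, the one with
    the highest sampled efficacy rate is chosen.
    """
    # One posterior draw per dose from each sampler, under Beta(1, 1) priors.
    eff_draw = rng.beta(1 + eff_succ, 1 + eff_fail)
    tox_draw = rng.beta(1 + tox_succ, 1 + tox_fail)

    admissible = tox_draw <= max_tox
    if not admissible.any():
        # No dose looks safe in this draw: fall back to the least toxic draw.
        return int(np.argmin(tox_draw))
    # Mask out inadmissible doses, then maximize sampled efficacy.
    return int(np.argmax(np.where(admissible, eff_draw, -np.inf)))


def simulate(true_eff, true_tox, n_participants=500, max_tox=0.3, seed=0):
    """Allocate doses sequentially and return how often each dose was given."""
    rng = np.random.default_rng(seed)
    k = len(true_eff)
    eff_succ, eff_fail = np.zeros(k), np.zeros(k)
    tox_succ, tox_fail = np.zeros(k), np.zeros(k)
    counts = np.zeros(k, dtype=int)

    for _ in range(n_participants):
        d = choose_dose(eff_succ, eff_fail, tox_succ, tox_fail, max_tox, rng)
        # Simulated binary outcomes for the participant given dose d.
        efficacy = int(rng.random() < true_eff[d])
        toxicity = int(rng.random() < true_tox[d])
        # Update each sampler's posterior counts separately.
        eff_succ[d] += efficacy
        eff_fail[d] += 1 - efficacy
        tox_succ[d] += toxicity
        tox_fail[d] += 1 - toxicity
        counts[d] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical dose-response values: the highest dose is the most
    # effective but exceeds the toxicity cap.
    print(simulate(true_eff=[0.2, 0.5, 0.8], true_tox=[0.05, 0.10, 0.60]))
```

Keeping the two posteriors separate means a dose is chosen only when both draws support it, which is one simple way to trade off exploration on efficacy against caution on safety. A contextual version would replace the per-dose Beta posteriors with models conditioned on participant covariates.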
