On Multi-Armed Bandit Designs for Dose-Finding Clinical Trials
Maryam Aziz, Emilie Kaufmann, Marie-Karelle Riviere

TL;DR
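This paper applies the multi-armed bandit framework, specifically Thompson Sampling, to dose finding in early-stage clinical trials. It gives finite-time upper bounds on the number of sub-optimal dose selections for the uniform-prior variant, and shows in simulations that variants with more sophisticated priors outperform state-of-the-art dose identification algorithms.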
Contribution
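It advocates Thompson Sampling for dose finding, an approach flexible enough to accommodate different monotonicity assumptions on the toxicity and efficacy of the doses. It proves finite-time upper bounds on the number of sub-optimal dose selections for the uniform-prior version, and evaluates variants with more sophisticated priors in a large simulation study covering phase I and phase I/II trials.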
Findings
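In a large simulation study, Thompson Sampling variants with more sophisticated priors outperform state-of-the-art dose identification algorithms in phase I and phase I/II settings.
Finite-time upper bounds on the number of sub-optimal dose selections are established for the simplest variant, which places a uniform prior on each dose.
Thompson Sampling can accommodate different types of monotonicity assumptions on the toxicity and efficacy of the doses.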
Abstract
We study the problem of finding the optimal dosage in early stage clinical trials through the multi-armed bandit lens. We advocate the use of the Thompson Sampling principle, a flexible algorithm that can accommodate different types of monotonicity assumptions on the toxicity and efficacy of the doses. For the simplest version of Thompson Sampling, based on a uniform prior distribution for each dose, we provide finite-time upper bounds on the number of sub-optimal dose selections, which is unprecedented for dose-finding algorithms. Through a large simulation study, we then show that variants of Thompson Sampling based on more sophisticated prior distributions outperform state-of-the-art dose identification algorithms in different types of dose-finding studies that occur in phase I or phase I/II trials.
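To make the "simplest version" concrete, here is a minimal sketch of Thompson Sampling with an independent uniform, i.e. Beta(1, 1), prior on each dose's toxicity probability. It is an illustration under stated assumptions, not the authors' implementation: the function name `thompson_dose_trial`, the simulated toxicity probabilities, and in particular the rule "give the next patient the dose whose sampled toxicity is closest to a target level" are assumptions that this summary does not spell out.

```python
import random


def thompson_dose_trial(true_tox, target, n_patients, seed=0):
    """Simulate one trial; all names and the selection rule are hypothetical."""
    rng = random.Random(seed)
    k = len(true_tox)
    n = [0] * k    # patients treated at each dose
    tox = [0] * k  # toxicities observed at each dose

    for _ in range(n_patients):
        # Uniform Beta(1, 1) prior plus Bernoulli outcomes gives a
        # Beta(1 + toxicities, 1 + non-toxicities) posterior per dose.
        samples = [
            rng.betavariate(1 + tox[d], 1 + n[d] - tox[d]) for d in range(k)
        ]
        # Assumed rule: pick the dose whose sampled toxicity is closest
        # to the target toxicity level.
        dose = min(range(k), key=lambda d: abs(samples[d] - target))
        # Simulate the patient's binary toxicity outcome.
        n[dose] += 1
        if rng.random() < true_tox[dose]:
            tox[dose] += 1

    # Recommend the dose whose posterior mean toxicity is closest to target.
    post_mean = [(1 + tox[d]) / (2 + n[d]) for d in range(k)]
    recommended = min(range(k), key=lambda d: abs(post_mean[d] - target))
    return n, tox, recommended


if __name__ == "__main__":
    # Hypothetical scenario: five doses with increasing toxicity, target 0.30.
    counts, toxicities, best = thompson_dose_trial(
        [0.05, 0.15, 0.30, 0.45, 0.60], target=0.30, n_patients=60
    )
    print("patients per dose:", counts)
    print("toxicities per dose:", toxicities)
    print("recommended dose index:", best)
```

Because each dose has its own prior, this sketch ignores the fact that toxicity increases with dose. The more sophisticated priors mentioned in the abstract are the place where such monotonicity assumptions would enter.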
Taxonomy
Topics: Advanced Bandit Algorithms Research · Statistical Methods in Clinical Trials · Machine Learning and Algorithms
