Correlational Dueling Bandits with Application to Clinical Treatment in Large Decision Spaces
Yanan Sui, Yisong Yue, Joel W. Burdick

TL;DR
This paper introduces CorrDuel, an algorithm for large-scale dueling bandits with correlated arms, applied to clinical treatment optimization, demonstrating improved regret bounds and successful real-world application in spinal cord injury therapy.
Contribution
The paper presents CorrDuel, a novel algorithm for large, correlated decision spaces, with theoretical regret bounds and practical validation in clinical treatment settings.
Findings
CorrDuel outperforms existing algorithms in large decision spaces.
The approach achieves low regret in simulations and in a clinical trial.
First application of online learning to spinal cord injury treatments.
Abstract
We consider sequential decision making under uncertainty, where the goal is to optimize over a large decision space using noisy comparative feedback. This problem can be formulated as a K-armed Dueling Bandits problem where K is the total number of decisions. When K is very large, existing dueling bandits algorithms suffer huge cumulative regret before converging on the optimal arm. This paper studies the dueling bandits problem with a large number of arms that exhibit a low-dimensional correlation structure. Our problem is motivated by a clinical decision making process in large decision space. We propose an efficient algorithm CorrDuel which optimizes the exploration/exploitation tradeoff in this large decision space of clinical treatments. More broadly, our approach can be applied to other sequential decision problems with large and structured decision spaces. We derive regret…
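The problem setting in the abstract (K arms, noisy pairwise comparisons, cumulative regret) can be illustrated with a small simulation. This is only a sketch of the generic dueling bandits setup, not of CorrDuel itself: the hidden `utilities` vector, the logistic link in `preference_prob`, the regret definition in `duel_regret`, and the uniform-random baseline policy are all assumptions chosen for illustration, and the sketch does not exploit any correlation between arms.

```python
import math
import random


def preference_prob(utilities, i, j):
    """Probability that arm i beats arm j (assumed logistic link)."""
    return 1.0 / (1.0 + math.exp(-(utilities[i] - utilities[j])))


def duel(utilities, i, j, rng):
    """Run one noisy comparison; return True if arm i wins."""
    return rng.random() < preference_prob(utilities, i, j)


def duel_regret(utilities, i, j):
    """Regret of one duel: how much arms i and j together lose to the best arm.

    Assumed definition: P(best beats i) + P(best beats j) - 1,
    which is zero only when both chosen arms are optimal.
    """
    best = max(range(len(utilities)), key=lambda a: utilities[a])
    return (preference_prob(utilities, best, i)
            + preference_prob(utilities, best, j) - 1.0)


def run_uniform_baseline(utilities, rounds, seed=0):
    """Duel uniformly random pairs; return (cumulative regret, win counts)."""
    rng = random.Random(seed)
    k = len(utilities)
    wins = [0] * k
    total_regret = 0.0
    for _ in range(rounds):
        i, j = rng.sample(range(k), 2)  # two distinct arms
        if duel(utilities, i, j, rng):
            wins[i] += 1
        else:
            wins[j] += 1
        total_regret += duel_regret(utilities, i, j)
    return total_regret, wins


if __name__ == "__main__":
    # Hypothetical hidden utilities for K = 5 arms.
    hidden = [0.1, 0.4, 0.9, 0.3, 0.6]
    regret, wins = run_uniform_baseline(hidden, rounds=1000)
    print("cumulative regret:", round(regret, 2))
    print("wins per arm:", wins)
```

Because the baseline never narrows its choices, its cumulative regret grows linearly with the number of rounds. The abstract's point is that algorithms which do converge still pay a large exploration cost when K is very large, which is what motivates using structure among the arms.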
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
