HR-Bandit: Human-AI Collaborated Linear Recourse Bandit
Junyu Cao, Ruijiang Gao, Esmaeil Keyvanshokooh

TL;DR
The paper introduces HR-Bandit, a human-AI collaborative algorithm for linear bandit problems that optimizes decision-making with human input, ensuring efficiency, robustness, and improved initial performance, validated through healthcare case studies.
Contribution
It proposes the HR-Bandit algorithm, integrating human expertise into linear bandits with guarantees on warm-start, human effort, and robustness, advancing collaborative decision-making methods.
Findings
Outperforms existing benchmarks in empirical tests
Provides theoretical guarantees on performance and effort
Validated through a healthcare case study
Abstract
Human doctors frequently recommend actionable recourses that allow patients to modify their conditions to access more effective treatments. Inspired by such healthcare scenarios, we propose the Recourse Linear UCB (RLinUCB) algorithm, which optimizes both action selection and feature modifications by balancing exploration and exploitation. We further extend this to the Human-AI Linear Recourse Bandit (HR-Bandit), which integrates human expertise to enhance performance. HR-Bandit offers three key guarantees: (i) a warm-start guarantee for improved initial performance, (ii) a human-effort guarantee to minimize required human interactions, and (iii) a robustness guarantee that ensures sublinear regret even when human decisions are suboptimal. Empirical results, including a healthcare case study, validate its superior performance against existing benchmarks.
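To make the idea of jointly choosing an action and a feature modification concrete, the following is a minimal sketch of a LinUCB-style rule extended to a recourse setting. It is not the authors' algorithm. It assumes one ridge-regression model per action, a constant exploration weight `alpha`, and a finite list of candidate modified feature vectors, all of which are simplifications introduced here for illustration. The class name `RecourseLinUCBSketch` and its methods are hypothetical.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class RecourseLinUCBSketch:
    """Illustrative only: per-arm ridge models, finite set of candidate recourses."""

    def __init__(self, n_arms, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha  # exploration weight (assumed constant in this sketch)
        # A_inv[a] is the inverse of (ridge * I + sum of x x^T) for arm a
        self.A_inv = [
            [[1.0 / ridge if i == j else 0.0 for j in range(dim)] for i in range(dim)]
            for _ in range(n_arms)
        ]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    def ucb(self, arm, x):
        """Estimated reward plus an exploration bonus for (arm, features x)."""
        theta_hat = matvec(self.A_inv[arm], self.b[arm])
        width = math.sqrt(dot(x, matvec(self.A_inv[arm], x)))
        return dot(theta_hat, x) + self.alpha * width

    def select(self, candidates):
        """Return the (arm, candidate index) pair with the largest upper confidence bound."""
        pairs = [(a, i) for a in range(len(self.b)) for i in range(len(candidates))]
        return max(pairs, key=lambda p: self.ucb(p[0], candidates[p[1]]))

    def update(self, arm, x, reward):
        """Sherman-Morrison rank-one update of the inverse design matrix for the chosen arm."""
        Ax = matvec(self.A_inv[arm], x)
        denom = 1.0 + dot(x, Ax)
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[arm][i][j] -= Ax[i] * Ax[j] / denom
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]


# Usage: two actions, two candidate feature vectors the patient could move to.
agent = RecourseLinUCBSketch(n_arms=2, dim=2)
candidates = [[1.0, 0.0], [0.0, 1.0]]
arm, idx = agent.select(candidates)
agent.update(arm, candidates[idx], reward=1.0)
```

In this sketch the recourse is simply a search over a given candidate list. A constrained optimization over feasible feature changes would replace the inner enumeration in `select`, and any human-in-the-loop step would sit on top of this selection rule.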
Taxonomy
Topics: AI and HR Technologies · Advanced Bandit Algorithms Research · AI in Service Interactions
