Modeling Human Decision-making in Generalized Gaussian Multi-armed Bandits
Paul Reverdy, Vaibhav Srivastava, Naomi E. Leonard

TL;DR
This paper introduces the upper credible limit (UCL) algorithm, a Bayesian approach to generalized Gaussian multi-armed bandit problems. It shows that the algorithm achieves logarithmic cumulative expected regret, which is optimal for uninformative priors, and reports empirical evidence that it matches human decision-making behavior.
Contribution
The paper develops the UCL algorithm for the standard multi-armed bandit problem, extends it to bandits with transition costs and bandits on graphs, and gives empirical support for its use as a model of human decision-making.
Findings
The deterministic UCL algorithm achieves logarithmic cumulative expected regret in the standard multi-armed bandit problem
Human behavior is well modeled by the stochastic UCL algorithm
Extensions to transition costs and graph structures maintain optimal regret
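For readers unfamiliar with the regret notion used in these findings, it can be written as below. The symbols are notation assumed for this note and are not taken from this page: m_i is the mean reward of arm i, i_t is the arm chosen at time t, and T is the horizon.

```latex
R_T \;=\; \sum_{t=1}^{T} \left( m_{i^*} - \mathbb{E}\!\left[ m_{i_t} \right] \right),
\qquad m_{i^*} = \max_i m_i .
```

Logarithmic regret means that R_T grows at most like C log T for some constant C, which is the same order as the Lai-Robbins asymptotic lower bound for bandit problems.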
Abstract
We present a formal model of human decision-making in explore-exploit tasks using the context of multi-armed bandit problems, where the decision-maker must choose among multiple options with uncertain rewards. We address the standard multi-armed bandit problem, the multi-armed bandit problem with transition costs, and the multi-armed bandit problem on graphs. We focus on the case of Gaussian rewards in a setting where the decision-maker uses Bayesian inference to estimate the reward values. We model the decision-maker's prior knowledge with the Bayesian prior on the mean reward. We develop the upper credible limit (UCL) algorithm for the standard multi-armed bandit problem and show that this deterministic algorithm achieves logarithmic cumulative expected regret, which is optimal performance for uninformative priors. We show how good priors and good assumptions on the correlation…
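The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes independent arms, a known reward noise variance, and a credible-limit schedule alpha_t = 1/(K t) with K = sqrt(2*pi*e). The function names (`ucl_choose`, `update_posterior`, `run_ucl`) and all default parameter values are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import NormalDist


def ucl_choose(means, variances, t, K=math.sqrt(2 * math.pi * math.e)):
    """Return the index of the arm with the largest upper credible limit.

    Assumption of this sketch: the limit is the (1 - alpha_t) quantile of each
    arm's Gaussian posterior, with alpha_t = 1 / (K * t) and t >= 1.
    """
    alpha = 1.0 / (K * t)
    z = NormalDist().inv_cdf(1.0 - alpha)  # standard normal quantile
    scores = [m + math.sqrt(v) * z for m, v in zip(means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)


def update_posterior(mean, var, reward, noise_var):
    """Conjugate Gaussian update for one arm with known noise variance."""
    new_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    new_mean = new_var * (mean / var + reward / noise_var)
    return new_mean, new_var


def run_ucl(true_means, horizon, noise_var=1.0,
            prior_mean=0.0, prior_var=100.0, seed=0):
    """Simulate the sketch on Gaussian arms; return pull counts and posterior means."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    means = [prior_mean] * n_arms      # posterior means, start at the prior
    variances = [prior_var] * n_arms   # large prior variance: weakly informative
    counts = [0] * n_arms
    for t in range(1, horizon + 1):
        arm = ucl_choose(means, variances, t)
        reward = rng.gauss(true_means[arm], math.sqrt(noise_var))
        means[arm], variances[arm] = update_posterior(
            means[arm], variances[arm], reward, noise_var)
        counts[arm] += 1
    return counts, means


if __name__ == "__main__":
    counts, means = run_ucl([0.0, 1.0, 5.0], horizon=300)
    print("pull counts:", counts)
    print("posterior means:", [round(m, 2) for m in means])
```

The large default `prior_var` stands in for an uninformative prior. A smaller variance or a shifted `prior_mean` would encode the kind of prior knowledge the abstract refers to. Correlated priors across arms are not covered by this sketch.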
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference
