Modeling Human Decision-making in Generalized Gaussian Multi-armed Bandits

Paul Reverdy; Vaibhav Srivastava; Naomi E. Leonard

arXiv:1307.6134·cs.LG·December 23, 2019·5 cites

Open Access

TL;DR

This paper introduces the upper credible limit (UCL) algorithm, a Bayesian algorithm for generalized Gaussian multi-armed bandit problems. It shows that the algorithm achieves logarithmic cumulative expected regret, which is optimal for uninformative priors, and validates empirically that it aligns with human decision-making behavior.

Contribution

The paper develops the UCL algorithm for the standard multi-armed bandit problem, extends it to bandits with transition costs and bandits on graphs, and gives empirical support for its use as a model of human decision-making.

Findings

01

The deterministic UCL algorithm achieves logarithmic cumulative expected regret in the standard multi-armed bandit problem

02

Human behavior is well modeled by the stochastic UCL algorithm

03

Extensions to transition costs and graph structures maintain optimal regret

Abstract

We present a formal model of human decision-making in explore-exploit tasks using the context of multi-armed bandit problems, where the decision-maker must choose among multiple options with uncertain rewards. We address the standard multi-armed bandit problem, the multi-armed bandit problem with transition costs, and the multi-armed bandit problem on graphs. We focus on the case of Gaussian rewards in a setting where the decision-maker uses Bayesian inference to estimate the reward values. We model the decision-maker's prior knowledge with the Bayesian prior on the mean reward. We develop the upper credible limit (UCL) algorithm for the standard multi-armed bandit problem and show that this deterministic algorithm achieves logarithmic cumulative expected regret, which is optimal performance for uninformative priors. We show how good priors and good assumptions on the correlation…
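The deterministic UCL idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the text above does not state: the sampling standard deviation is known, the Gaussian prior is uncorrelated across arms, and the credible level follows the schedule alpha_t = 1/(K t) with K = sqrt(2 pi e). The function name `ucl_bandit` and all of its parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist


def ucl_bandit(true_means, sampling_sd, prior_mean, prior_sd, horizon, seed=0):
    """Sketch of a deterministic upper-credible-limit rule for Gaussian arms.

    Assumes a known sampling standard deviation and an uncorrelated
    Gaussian prior on each arm's mean reward (illustrative assumptions).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)

    # Posterior precision (1 / variance) and mean per arm, started at the prior.
    precision = [1.0 / prior_sd ** 2] * n_arms
    mean = [float(prior_mean)] * n_arms
    obs_precision = 1.0 / sampling_sd ** 2

    # Assumed credible-level schedule: alpha_t = 1 / (K * t).
    K = math.sqrt(2.0 * math.pi * math.e)
    std_normal = NormalDist()

    best_mean = max(true_means)
    choices = []
    regret = 0.0

    for t in range(1, horizon + 1):
        alpha = 1.0 / (K * t)
        z = std_normal.inv_cdf(1.0 - alpha)

        # Upper credible limit: posterior mean plus z posterior standard deviations.
        ucl = [mean[i] + z / math.sqrt(precision[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=ucl.__getitem__)

        # Draw a Gaussian reward from the chosen arm.
        reward = rng.gauss(true_means[arm], sampling_sd)

        # Conjugate Gaussian update of the chosen arm's posterior.
        new_precision = precision[arm] + obs_precision
        mean[arm] = (precision[arm] * mean[arm] + obs_precision * reward) / new_precision
        precision[arm] = new_precision

        choices.append(arm)
        regret += best_mean - true_means[arm]

    return choices, mean, regret


if __name__ == "__main__":
    choices, posterior_means, regret = ucl_bandit(
        true_means=[0.0, 1.0, 5.0],
        sampling_sd=1.0,
        prior_mean=0.0,
        prior_sd=10.0,
        horizon=300,
    )
    print("pulls per arm:", [choices.count(i) for i in range(3)])
    print("cumulative expected regret:", regret)
```

With a wide prior, each arm is sampled early because its posterior standard deviation dominates the index; as observations accumulate, the credible limit of a clearly inferior arm drops below that of the best arm, so most later pulls go to the best arm. A more informative prior would change how quickly that happens, which is the role the abstract assigns to the decision-maker's prior knowledge.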

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference