Risk-Averse Biased Human Policies in Assistive Multi-Armed Bandit Settings

Michael Koller; Timothy Patten; Markus Vincze

arXiv:2104.05334 · cs.RO · April 13, 2021

TL;DR

This paper introduces an algorithm for assistive multi-armed bandit scenarios that accounts for human risk-averse biases, enabling robots to make more rational decisions and improve team utility.

Contribution

It presents a novel algorithm that models human risk aversion within multi-armed bandit frameworks, enhancing human-robot collaboration.

Findings

01 A brief evaluation indicates that the algorithm can handle arbitrary reward functions.

02 Robots can mitigate human biases to improve decision-making.

03 Potential for broader application in assistive systems.

Abstract

Assistive multi-armed bandit problems can be used to model team situations between a human and an autonomous system such as a domestic service robot. To account for human biases such as the risk aversion described in Cumulative Prospect Theory, the setting is extended to use observable rewards. When robots leverage knowledge about the risk-averse human model, they eliminate the bias and make more rational choices. We present an algorithm that increases the utility value of such human-robot teams. A brief evaluation indicates that arbitrary reward functions can be handled.
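
To make the risk-aversion bias concrete, the following minimal Python sketch contrasts a risk-averse evaluation of observed rewards with a plain expected-reward evaluation. It is an illustration under stated assumptions, not the algorithm from the paper: the value function is the standard Cumulative Prospect Theory form with the Tversky and Kahneman (1992) median parameters, probability weighting is omitted for brevity, and the arm names, reward samples, and helper functions (`cpt_value`, `best_arm`) are hypothetical.

```python
# Illustrative sketch only; this is NOT the algorithm presented in the paper.
# Assumption: the human's bias follows the CPT value function with the
# Tversky & Kahneman (1992) median parameters (alpha = beta = 0.88, lam = 2.25).
# Simplification: CPT probability weighting is left out.

def cpt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Subjective value of outcome x relative to a reference point of 0."""
    if x >= 0:
        return x ** alpha
    # Losses are scaled by lam > 1, so they loom larger than equal gains.
    return -lam * ((-x) ** beta)


def mean(values):
    return sum(values) / len(values)


def best_arm(arms, score):
    """Return the arm whose observed rewards have the highest mean score."""
    return max(arms, key=lambda name: mean([score(r) for r in arms[name]]))


# Hypothetical observed rewards for two arms.
arms = {
    "safe": [1.0, 1.0, 1.0, 1.0],     # raw mean 1.0
    "risky": [6.0, -3.0, 6.0, -3.0],  # raw mean 1.5, but includes losses
}

# A risk-averse human scores each reward through the CPT value function.
human_choice = best_arm(arms, cpt_value)

# A robot that observes the raw rewards can rank arms by unbiased expected reward.
robot_choice = best_arm(arms, lambda r: r)

print("human picks:", human_choice)
print("robot picks:", robot_choice)
```

In this toy setting the loss-weighted evaluation favors the safe arm even though the risky arm has the higher average reward, which is the kind of gap a robot with access to observable rewards and a model of the human's bias could close.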

Taxonomy

Topics: Advanced Bandit Algorithms Research · Decision-Making and Behavioral Economics · Reinforcement Learning in Robotics