Risk-Averse Biased Human Policies in Assistive Multi-Armed Bandit Settings
Michael Koller, Timothy Patten, Markus Vincze

TL;DR
This paper introduces an algorithm for assistive multi-armed bandit scenarios that accounts for human risk-averse biases, enabling robots to make more rational decisions and improve team utility.
Contribution
It presents a novel algorithm that models human risk aversion within multi-armed bandit frameworks, enhancing human-robot collaboration.
Findings
A brief evaluation indicates that the algorithm can handle arbitrary reward functions.
Robots that know the human's risk-averse model can correct for the bias and make more rational choices.
Potential for broader application in assistive systems.
Abstract
Assistive multi-armed bandit problems can be used to model team situations between a human and an autonomous system such as a domestic service robot. To account for human biases such as the risk aversion described in Cumulative Prospect Theory, the setting is extended to use observable rewards. When robots leverage knowledge of the risk-averse human model, they eliminate the bias and make more rational choices. We present an algorithm that increases the utility value of such human-robot teams. A brief evaluation indicates that arbitrary reward functions can be handled.
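To make the notion of a risk-averse bias concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes a simplified Cumulative Prospect Theory value function with the commonly cited Tversky and Kahneman (1992) parameters, omits probability weighting entirely, and uses two hypothetical arms (`safe_arm`, `risky_arm`) whose reward distributions are invented for illustration. It shows how a human evaluating rewards through such a value function can prefer an arm with lower expected reward, while an agent that observes the raw rewards would pick the other arm.

```python
# Simplified CPT-style value function (assumption: no probability weighting).
# Parameter values are the commonly cited Tversky & Kahneman (1992) estimates,
# used here only for illustration; the paper's own model may differ.
ALPHA = 0.88   # curvature for gains and losses
LAMBDA = 2.25  # loss-aversion coefficient


def cpt_value(x, alpha=ALPHA, lam=LAMBDA):
    """Subjective value of reward x relative to a reference point of 0."""
    if x >= 0:
        return x ** alpha
    return -lam * ((-x) ** alpha)


def expected(outcomes, transform=lambda x: x):
    """Expectation of transform(reward) over (reward, probability) pairs."""
    return sum(p * transform(x) for x, p in outcomes)


# Hypothetical arms, each a list of (reward, probability) pairs.
safe_arm = [(1.0, 1.0)]                 # always pays 1.0
risky_arm = [(4.0, 0.5), (-1.5, 0.5)]   # expected reward 1.25

arms = {"safe": safe_arm, "risky": risky_arm}

# The biased human ranks arms by subjective (CPT) value ...
human_choice = max(arms, key=lambda a: expected(arms[a], cpt_value))
# ... while an agent seeing the observable rewards ranks by expected reward.
robot_choice = max(arms, key=lambda a: expected(arms[a]))

print("human prefers:", human_choice)
print("robot prefers:", robot_choice)
```

Under these assumed parameters the human's subjective value of the risky arm is pulled down by loss aversion, so the two rankings disagree; this is the kind of gap that knowledge of the human model lets the robot account for.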
Taxonomy
Topics: Advanced Bandit Algorithms Research · Decision-Making and Behavioral Economics · Reinforcement Learning in Robotics
