# The Assistive Multi-Armed Bandit

**Authors:** Lawrence Chan, Dylan Hadfield-Menell, Siddhartha Srinivasa, Anca Dragan

arXiv: 1901.08654 · 2019-01-28

## TL;DR

This paper introduces the assistive multi-armed bandit, in which a robot assists a human playing a bandit task to maximize cumulative reward. The human does not know the reward function and learns it from the rewards of arm pulls, while the robot observes only which arms the human pulls, not the rewards.

## Contribution

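The paper gives sufficient and necessary conditions for successfully assisting a human who is still learning their preferences in a bandit task. It also shows that how well a human policy communicates its observed rewards to the robot can matter more than how well that policy performs in isolation.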

## Key findings

- Better human performance in isolation does not necessarily lead to better performance when assisted by the robot.
- A human policy can do better under assistance by effectively communicating its observed rewards to the robot.
- Proof-of-concept experiments support the theoretical results.

## Abstract

Learning preferences implicit in the choices humans make is a well studied problem in both economics and computer science. However, most work makes the assumption that humans are acting (noisily) optimally with respect to their preferences. Such approaches can fail when people are themselves learning about what they want. In this work, we introduce the assistive multi-armed bandit, where a robot assists a human playing a bandit task to maximize cumulative reward. In this problem, the human does not know the reward function but can learn it through the rewards received from arm pulls; the robot only observes which arms the human pulls but not the reward associated with each pull. We offer sufficient and necessary conditions for successfully assisting the human in this framework. Surprisingly, better human performance in isolation does not necessarily lead to better performance when assisted by the robot: a human policy can do better by effectively communicating its observed rewards to the robot. We conduct proof-of-concept experiments that support these results. We see this work as contributing towards a theory behind algorithms for human-robot interaction.
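The information structure described in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' implementation. The following are assumptions made only for illustration: Bernoulli arms, an epsilon-greedy human, a robot that decides which arm is actually pulled, and a deliberately naive robot rule that pulls the arm the human has chosen most often. The function name `run_assistive_bandit` and its parameters are hypothetical.

```python
import random


def run_assistive_bandit(arm_means, horizon, epsilon=0.1, seed=0):
    """Simulate one episode; return (total_reward, robot_observations)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)

    # Human's private state: pull counts and running mean reward per arm.
    counts = [0] * n_arms
    means = [0.0] * n_arms

    # Robot's state: how often the human has chosen each arm.
    choices_seen = [0] * n_arms
    robot_observations = []
    total_reward = 0.0

    for _ in range(horizon):
        # Human (assumed epsilon-greedy): explore at random, else exploit.
        if rng.random() < epsilon:
            human_arm = rng.randrange(n_arms)
        else:
            human_arm = max(range(n_arms), key=lambda a: means[a])

        # The robot observes the human's arm choice, never the reward.
        choices_seen[human_arm] += 1
        robot_observations.append(human_arm)

        # Robot (hypothetical rule): pull the arm chosen most often so far.
        robot_arm = max(range(n_arms), key=lambda a: choices_seen[a])

        # Bernoulli reward (assumed), revealed to the human only.
        reward = 1.0 if rng.random() < arm_means[robot_arm] else 0.0
        total_reward += reward

        # Human updates the estimate for the arm that was actually pulled.
        counts[robot_arm] += 1
        means[robot_arm] += (reward - means[robot_arm]) / counts[robot_arm]

    return total_reward, robot_observations
```

In this sketch the robot's only evidence about the reward function is the sequence in `robot_observations`, so the human's choices are the sole channel through which observed rewards can reach the robot. That is the sense in which a human policy's informativeness, and not only its standalone reward, affects the assisted outcome.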

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.08654/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1901.08654/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/1901.08654/full.md

---
Source: https://tomesphere.com/paper/1901.08654