Learning to Assist Agents by Observing Them

Antti Keurulainen (1, 3), Isak Westerlund (3), Samuel Kaski (1, 2), and Alexander Ilin (1)
(1) Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University; (2) Department of Computer Science, University of Manchester; (3) Bitville Oy, Espoo, Finland

arXiv:2110.01311·cs.AI·October 5, 2021

TL;DR

This paper proposes a method for training AI agents to assist other agents: behavior representations are first pre-trained on offline data, which reduces the need for extensive online training, and the method's effectiveness is demonstrated in gridworld scenarios.

Contribution

It introduces a two-stage training approach combining offline behavior representation pre-training with minimal online interaction for assistance policy learning.

Findings

1. Assistance significantly improves agent performance in gridworld scenarios.
2. Pre-training reduces online training requirements.
3. The method effectively leverages offline data for assistance learning.

Abstract

The ability of an AI agent to assist other agents, such as humans, is an important and challenging goal, which requires the assisting agent to reason about the behavior and infer the goals of the assisted agent. Training such an ability by using reinforcement learning usually requires large amounts of online training, which is difficult and costly. On the other hand, offline data about the behavior of the assisted agent might be available, but is non-trivial to take advantage of by methods such as offline reinforcement learning. We introduce methods where the capability to create a representation of the behavior is first pre-trained with offline data, after which only a small amount of interaction data is needed to learn an assisting policy. We test the setting in a gridworld where the helper agent has the capability to manipulate the environment of the assisted artificial agents, and…
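The two-stage scheme described in the abstract can be illustrated with a deliberately tiny, hypothetical Python/NumPy sketch. It is not the authors' implementation and makes several assumptions: the gridworld is reduced to a hidden goal index per assisted agent, the behavior representation is assumed to be a softmax-regression next-action predictor fit only on offline behavior data, the helper's "environment manipulation" is reduced to choosing which goal to make reachable, and the assisting policy is assumed to be trained with plain REINFORCE. All names (`pretrain_encoder`, `train_helper`, `N_GOALS`, `P_GOAL`, and so on) are illustrative.

```python
import numpy as np

# Hypothetical toy setting (not the paper's gridworld): each assisted agent has a
# hidden goal in {0, ..., N_GOALS-1}; each observed action points at that goal with
# probability P_GOAL and at a uniformly random *other* goal otherwise.
N_GOALS = 3
T_OBS = 5        # number of past actions the helper gets to observe
P_GOAL = 0.7

rng = np.random.default_rng(0)


def sample_actions(goal, n):
    """Sample n actions from an assisted agent with the given hidden goal."""
    acts = np.full(n, goal)
    noisy = rng.random(n) >= P_GOAL
    offsets = rng.integers(1, N_GOALS, size=n)        # shift to a different goal
    acts[noisy] = (goal + offsets[noisy]) % N_GOALS
    return acts


def features(past_actions):
    """Normalized action counts: a crude summary of observed behavior."""
    return np.bincount(past_actions, minlength=N_GOALS) / len(past_actions)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)             # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def pretrain_encoder(n_agents=2000, steps=300, lr=1.0):
    """Stage 1 (offline): fit W_enc so that softmax(W_enc @ x) predicts the
    agent's next action from its past behavior. Needs behavior logs only:
    no helper actions and no rewards."""
    X = np.empty((n_agents, N_GOALS))
    y = np.empty(n_agents, dtype=int)
    for i in range(n_agents):
        acts = sample_actions(rng.integers(N_GOALS), T_OBS + 1)
        X[i], y[i] = features(acts[:-1]), acts[-1]
    Y = np.eye(N_GOALS)[y]
    W = np.zeros((N_GOALS, N_GOALS))
    for _ in range(steps):
        P = softmax(X @ W.T)                          # predicted next-action probs
        W -= lr * (P - Y).T @ X / n_agents            # cross-entropy gradient step
    return W


def encode(W_enc, past_actions):
    """Frozen behavior representation: the predicted next-action distribution."""
    return softmax(W_enc @ features(past_actions))


def train_helper(W_enc, episodes=1000, lr=0.5):
    """Stage 2 (online): REINFORCE on a small linear policy over the frozen
    representation. Reward is 1 if the helper makes the agent's goal reachable."""
    W_pol = np.zeros((N_GOALS, N_GOALS))
    baseline = 0.0
    for _ in range(episodes):
        goal = rng.integers(N_GOALS)
        r = encode(W_enc, sample_actions(goal, T_OBS))
        pi = softmax(W_pol @ r)
        a = rng.choice(N_GOALS, p=pi)
        reward = float(a == goal)
        # Gradient of log pi(a) with respect to W_pol is (onehot(a) - pi) r^T.
        W_pol += lr * (reward - baseline) * np.outer(np.eye(N_GOALS)[a] - pi, r)
        baseline += 0.05 * (reward - baseline)        # running-mean baseline
    return W_pol


def helper_accuracy(W_enc, W_pol, n=1000):
    """Fraction of fresh agents for which the greedy helper picks the right goal."""
    hits = 0
    for _ in range(n):
        goal = rng.integers(N_GOALS)
        r = encode(W_enc, sample_actions(goal, T_OBS))
        hits += int(np.argmax(W_pol @ r) == goal)
    return hits / n


if __name__ == "__main__":
    W_enc = pretrain_encoder()
    W_pol = train_helper(W_enc)
    print(f"greedy helper accuracy: {helper_accuracy(W_enc, W_pol):.2f}")
```

Freezing the encoder in stage 2 means the online phase only has to fit the small `W_pol` matrix, which is one way to realize the abstract's goal of needing only a small amount of interaction data; whether the paper freezes or fine-tunes its representation is not stated in this excerpt.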

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research