Behaviour-conditioned policies for cooperative reinforcement learning tasks

Antti Keurulainen (1, 3), Isak Westerlund (3), Ariel Kwiatkowski (3), Samuel Kaski (1, 2), Alexander Ilin (1) ((1) Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, (2) Department of Computer Science, University of Manchester, (3) Bitville Oy, Espoo, Finland)

arXiv:2110.01266·cs.LG·October 5, 2021

Open Access

TL;DR

This paper introduces a meta-learning approach for cooperative reinforcement learning, enabling agents to quickly adapt to unknown partner behaviors by training on synthetic behavioral data, improving cooperation efficiency.

Contribution

It proposes a novel meta-learner architecture trained on synthetic agent behaviors, allowing rapid adaptation in cooperative tasks with unknown partners.

Findings

01. Meta-learner enables quick adaptation to new partner behaviors.

02. Synthetic behavioral data improves training efficiency.

03. Method supports automatic task distribution formation.

Abstract

Cooperation among AI systems, and between AI systems and humans, is becoming increasingly important. In various real-world tasks, an agent needs to cooperate with unknown partner agent types. This requires the agent to assess the behaviour of the partner agent during a cooperative task and to adjust its own policy to support the cooperation. Deep reinforcement learning models can be trained to deliver the required functionality but are known to suffer from sample inefficiency and slow learning. However, adapting to a partner agent's behaviour during the ongoing task requires the ability to assess the partner agent type quickly. We suggest a method in which we synthetically produce populations of agents with different behavioural patterns, together with ground truth data of their behaviour, and use this data for training a meta-learner. We additionally suggest an agent architecture, which can…

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Machine Learning and Data Classification