Learning Existing Social Conventions via Observationally Augmented Self-Play

Adam Lerer; Alexander Peysakhovich

arXiv:1806.10071·cs.AI·March 14, 2019

TL;DR

This paper proposes a method for artificial agents to learn a group's existing social conventions by combining multi-agent reinforcement learning (MARL) with imitation learning on a small number of observed samples, so that the agents coordinate better with that group once they join it.

Contribution

The paper augments the MARL objective with an imitation-learning term fit to a small number of observed samples of the group's behavior, steering training toward the convention the group actually uses.

Findings

01. Augmenting MARL with imitation learning increases alignment with the group's convention.

02. The method works in traffic, communication, and team coordination environments.

03. The augmentation helps even when MARL alone fails to find the group's true convention.

Abstract

In order for artificial agents to coordinate effectively with people, they must act consistently with existing conventions (e.g. how to navigate in traffic, which language to speak, or how to coordinate with teammates). A group's conventions can be viewed as a choice of equilibrium in a coordination game. We consider the problem of an agent learning a policy for a coordination game in a simulated environment and then using this policy when it enters an existing group. When there are multiple possible conventions we show that learning a policy via multi-agent reinforcement learning (MARL) is likely to find policies which achieve high payoffs at training time but fail to coordinate with the real group into which the agent enters. We assume access to a small number of samples of behavior from the true convention and show that we can augment the MARL objective to help it find policies…
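The idea in the abstract can be illustrated with a toy sketch. This is not the paper's environments or its exact objective: the two-action matching game, the one-parameter sigmoid policy, the weight `lam`, and the observed samples are all assumptions made for illustration. Both players earn payoff 1 when their actions match, so "always action 0" and "always action 1" are two equally good conventions. Plain self-play settles on whichever one the initialization favors, while adding an imitation term on a few observed actions pulls training toward the group's convention.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(observed_actions, lam, theta=-0.1, lr=0.5, steps=2000):
    """Gradient ascent on self-play payoff plus lam * imitation log-likelihood.

    Returns the learned probability of choosing action 0.
    All names and hyperparameters here are hypothetical.
    """
    for _ in range(steps):
        p = sigmoid(theta)  # probability of choosing action 0
        # Expected self-play payoff is p**2 + (1 - p)**2; this is its
        # gradient with respect to theta (chain rule through the sigmoid).
        grad_selfplay = (4.0 * p - 2.0) * p * (1.0 - p)
        # Gradient of the mean log-likelihood of the observed actions.
        grad_imitation = sum(
            (1.0 - p) if a == 0 else -p for a in observed_actions
        ) / len(observed_actions)
        theta += lr * (grad_selfplay + lam * grad_imitation)
    return sigmoid(theta)

# Hypothetical samples: the existing group always picks action 0.
observed = [0, 0, 0, 0, 0]

p_selfplay = train(observed, lam=0.0)   # plain self-play, samples ignored
p_augmented = train(observed, lam=1.0)  # self-play plus imitation term

# Payoff against itself at training time vs. payoff with the real group
# (the group plays action 0, so the payoff with the group equals p).
selfplay_training_payoff = p_selfplay ** 2 + (1.0 - p_selfplay) ** 2
print("self-play: training payoff", round(selfplay_training_payoff, 3),
      "| payoff with group", round(p_selfplay, 3))
print("augmented: payoff with group", round(p_augmented, 3))
```

With the initialization slightly favoring action 1, the self-play agent scores well against itself but almost never matches the group, which mirrors the failure mode the abstract describes; the augmented agent ends up on the group's convention.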
