On the interaction between supervision and self-play in emergent communication

Ryan Lowe; Abhinav Gupta; Jakob Foerster; Douwe Kiela; Joelle Pineau

arXiv:2002.01093 · cs.CL · June 24, 2020 · 30 cites

TL;DR

This paper studies how to combine supervised learning on human language data with self-play in simulated multi-agent environments, with the goal of improving sample efficiency. It finds that supervised pretraining followed by self-play outperforms the reverse order.

Contribution

The paper introduces the term supervised self-play (S2P) for algorithms that use both learning signals, shows that supervised pretraining followed by self-play beats the reverse order, and proposes population-based S2P methods for further gains.

Findings

01 Supervised pretraining followed by self-play outperforms the reverse order (self-play first, then supervised learning).

02 Supervised self-play improves sample efficiency in emergent communication tasks.

03 Population-based S2P methods further improve performance.

Abstract

A promising approach for teaching artificial agents to use natural language involves using human-in-the-loop training. However, recent work suggests that current machine learning methods are too data inefficient to be trained in this way from scratch. In this paper, we investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency: imitating human language data via supervised learning, and maximizing reward in a simulated multi-agent environment via self-play (as done in emergent communication), and introduce the term supervised self-play (S2P) for algorithms using both of these signals. We find that first training agents via supervised learning on human data followed by self-play outperforms the converse, suggesting that it is not beneficial to emerge languages from scratch. We then empirically investigate various S2P…
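
The two learning signals named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (the official code is in the repository listed under Code & Models); it is a minimal example under several assumptions: a tabular signaling game with one message per object, a hypothetical `human_lexicon` standing in for human language data, a REINFORCE update with a fixed baseline for self-play, and made-up function names and hyperparameters throughout. It follows the supervised-then-self-play order that the abstract reports as the better one.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index from a categorical distribution."""
    threshold, cumulative = rng.random(), 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if threshold < cumulative:
            return index
    return len(probs) - 1


def nudge(logits, choice, scale):
    """In place: logits += scale * gradient of log softmax(logits)[choice]."""
    probs = softmax(logits)
    for i in range(len(logits)):
        logits[i] += scale * ((1.0 if i == choice else 0.0) - probs[i])


def supervised_step(speaker, listener, obj, msg, lr):
    """Imitate one human (object, message) pair with a log-likelihood step."""
    nudge(speaker[obj], msg, lr)   # speaker names obj the way the human did
    nudge(listener[msg], obj, lr)  # listener decodes the human message


def self_play_step(speaker, listener, obj, rng, lr, baseline=0.5):
    """One REINFORCE update; reward is 1 when the listener recovers obj."""
    msg = sample(softmax(speaker[obj]), rng)
    guess = sample(softmax(listener[msg]), rng)
    advantage = (1.0 if guess == obj else 0.0) - baseline
    nudge(speaker[obj], msg, lr * advantage)
    nudge(listener[msg], guess, lr * advantage)


def argmax(row):
    return max(range(len(row)), key=row.__getitem__)


def task_accuracy(speaker, listener):
    """Fraction of objects recovered when both agents act greedily."""
    hits = sum(argmax(listener[argmax(speaker[obj])]) == obj
               for obj in range(len(speaker)))
    return hits / len(speaker)


def human_agreement(speaker, human_lexicon):
    """Fraction of objects the speaker names with the human's message."""
    hits = sum(argmax(speaker[obj]) == msg
               for obj, msg in enumerate(human_lexicon))
    return hits / len(human_lexicon)


def train_s2p(human_lexicon, sup_epochs=30, sp_steps=500, lr=0.5, seed=0):
    """Toy schedule: supervised pretraining first, then self-play."""
    n = len(human_lexicon)
    rng = random.Random(seed)
    speaker = [[0.0] * n for _ in range(n)]   # speaker[obj][msg] logits
    listener = [[0.0] * n for _ in range(n)]  # listener[msg][obj] logits
    for _ in range(sup_epochs):               # phase 1: imitate human data
        for obj, msg in enumerate(human_lexicon):
            supervised_step(speaker, listener, obj, msg, lr)
    for _ in range(sp_steps):                 # phase 2: maximize task reward
        self_play_step(speaker, listener, rng.randrange(n), rng, lr)
    return speaker, listener
```

A possible way to use it: call `speaker, listener = train_s2p([0, 1, 2, 3])`, then compare `task_accuracy(speaker, listener)` with `human_agreement(speaker, [0, 1, 2, 3])`. The first measures reward in the simulated environment and the second measures how closely the learned protocol stays to the human one. Swapping the two loops in `train_s2p` gives the reverse schedule for comparison, though a toy game of this size should not be expected to reproduce the paper's results.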

Code & Models

Repositories

backpropper/s2p
PyTorch · Official

Taxonomy

Topics: Language and cultural evolution · Reinforcement Learning in Robotics · Topic Modeling