Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination

Rui Zhao; Jinming Song; Yufeng Yuan; Hu Haifeng; Yang Gao; Yi Wu; Zhongqian Sun; Yang Wei

arXiv:2112.11701·cs.AI·June 28, 2022·5 cites

TL;DR

This paper introduces Maximum Entropy Population-based training (MEP), a method to train RL agents that collaborate effectively with humans without human data, by promoting diversity and mitigating distributional shift.

Contribution

The paper proposes MEP, a novel training approach that enhances human-AI collaboration by maintaining diversity in agent populations and dynamically prioritizing training partners.

Findings

01 MEP outperforms existing methods like SP, PBT, TrajeDi, and FCP in Overcooked.

02 Agents trained with MEP show improved robustness with human partners.

03 Diversity promotion reduces distributional shift in human-AI collaboration.

Abstract

We study the problem of training a Reinforcement Learning (RL) agent that is collaborative with humans without using any human data. Although such agents can be obtained through self-play training, they can suffer significantly from distributional shift when paired with unencountered partners, such as humans. To mitigate this distributional shift, we propose Maximum Entropy Population-based training (MEP). In MEP, agents in the population are trained with our derived Population Entropy bonus to promote both pairwise diversity between agents and individual diversity of agents themselves, and a common best agent is trained by pairing with agents in this diversified population via prioritized sampling. The prioritization is dynamically adjusted based on the training progress. We demonstrate the effectiveness of our method MEP, with comparison to Self-Play PPO (SP), Population-Based Training…
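The two mechanisms named in the abstract can be illustrated with a small sketch. The abstract does not give formulas, so everything below is an assumption made for illustration: the population entropy is taken to be the Shannon entropy of the population's mean policy at a single state, and the partner prioritization is taken to be rank-based, giving higher sampling probability to partners with whom the current agent earns lower returns. The function names `population_entropy` and `partner_sampling_probs` and the exponent `beta` are hypothetical and are not taken from the paper's code.

```python
import numpy as np


def population_entropy(action_probs):
    """Entropy of the mean policy at one state (assumed form of the bonus).

    action_probs: array-like of shape (n_agents, n_actions); each row is one
    agent's action distribution at the same state.
    """
    mean_policy = np.asarray(action_probs, dtype=float).mean(axis=0)
    # Skip zero-probability actions so that 0 * log(0) is treated as 0.
    nonzero = mean_policy[mean_policy > 0]
    return float(-(nonzero * np.log(nonzero)).sum())


def partner_sampling_probs(mean_returns, beta=1.0):
    """Rank-based partner priorities (assumed form of prioritized sampling).

    mean_returns: average return of the agent being trained when paired with
    each partner. The partner with the lowest return gets the highest rank,
    so harder partners are sampled more often. beta = 0 gives uniform sampling.
    """
    mean_returns = np.asarray(mean_returns, dtype=float)
    n = len(mean_returns)
    # order[0] is the index of the highest-return (easiest) partner.
    order = np.argsort(-mean_returns)
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)  # easiest partner -> 1, hardest -> n
    weights = ranks ** beta
    return weights / weights.sum()
```

Under these assumptions, two agents with identical deterministic policies yield a population entropy of zero, while two agents that deterministically choose different actions yield log 2, so the bonus rewards disagreement between agents as well as randomness within each agent. Recomputing `partner_sampling_probs` from fresh return estimates during training is one simple way to make the prioritization follow training progress.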

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Entropy Regularization · Proximal Policy Optimization