A Regularized Opponent Model with Maximum Entropy Objective

Zheng Tian; Ying Wen; Zhichen Gong; Faiz Punakkath; Shihao Zou; Jun Wang

arXiv:1905.08087·cs.MA·August 20, 2019·6 cites

Open Access · 1 Repo

TL;DR

This paper reformulates multi-agent reinforcement learning as probabilistic inference and derives a maximum entropy objective, ROMMEO, whose view of opponent modeling improves agent training in cooperative games.

Contribution

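It redefines the optimality variable for the multi-agent setting, derives a variational lower bound on the likelihood of achieving optimality, and proposes two algorithms: ROMMEO-Q, a tabular Q-iteration method with a convergence proof, and ROMMEO-AC, an approximate version for complex environments.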

Findings

01 ROMMEO outperforms strong MARL baselines in iterated matrix and differential games.

02 The algorithms demonstrate convergence and improved training efficiency.

03 The approach offers a new probabilistic perspective on opponent modeling in MARL.

Abstract

In a single-agent setting, reinforcement learning (RL) tasks can be cast as an inference problem by introducing a binary random variable o, which stands for "optimality". In this paper, we redefine the binary random variable o in the multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound on the likelihood of achieving optimality and name it the Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show, theoretically and empirically, how it can improve the performance of training agents in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method, ROMMEO-Q, with a proof of convergence. We extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We…
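The single-agent construction mentioned at the start of the abstract can be illustrated in a one-step setting. The following is a minimal sketch, assuming the likelihood commonly used in the control-as-inference literature, p(o = 1 | a) ∝ exp(r(a)), together with a uniform action prior by default. Neither assumption is stated in the abstract, and the function names are hypothetical. It is not ROMMEO itself, which redefines o for the multi-agent case.

```python
import math


def optimal_action_posterior(rewards, prior=None):
    """Posterior p(a | o = 1), assuming p(o = 1 | a) is proportional to exp(r(a))."""
    n = len(rewards)
    if prior is None:
        prior = [1.0 / n] * n  # assumption: uniform prior over actions
    # Subtract the max reward for numerical stability; it cancels on normalization.
    m = max(rewards)
    weights = [p * math.exp(r - m) for p, r in zip(prior, rewards)]
    z = sum(weights)
    return [w / z for w in weights]


def soft_value(rewards, prior=None):
    """Log-partition log sum_a prior(a) * exp(r(a)), a 'soft' maximum of the rewards."""
    n = len(rewards)
    if prior is None:
        prior = [1.0 / n] * n  # assumption: uniform prior over actions
    m = max(rewards)
    return m + math.log(sum(p * math.exp(r - m) for p, r in zip(prior, rewards)))


# Hypothetical rewards for three actions; higher reward gets higher posterior mass.
posterior = optimal_action_posterior([1.0, 2.0, 0.0])
```

Under these assumptions, conditioning on optimality yields a softmax over rewards rather than a hard argmax, which is where the maximum entropy flavor of the objective comes from.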

Code & Models

Repositories

rommeoijcai2019/rommeo · tf · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning