Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations

Xingyu Wang; Diego Klabjan

arXiv:1801.02124 · stat.ML · June 7, 2018 · 6 cites

TL;DR

This paper introduces an inverse reinforcement learning framework for zero-sum stochastic games that learns rewards and strategies from expert demonstrations known to be sub-optimal, rather than assuming the experts play optimally.

Contribution

It proposes a new objective that directly compares experts to Nash Equilibrium strategies and develops algorithms for reward learning and equilibrium finding in large-scale, complex games.
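The idea of comparing an expert to a Nash Equilibrium strategy can be illustrated on a one-shot zero-sum matrix game, which is a much simpler setting than the stochastic games the paper studies. The sketch below is not the paper's algorithm or objective. It only shows, under a known reward (rock-paper-scissors, chosen here as an assumption for illustration), that a sub-optimal expert is guaranteed less than the Nash Equilibrium strategy once the opponent best-responds. The names worst_case_payoff, RPS, nash, and expert are hypothetical.

```python
def worst_case_payoff(payoff, row_strategy):
    """Expected payoff guaranteed to the row player when the column
    player best-responds (the column player minimizes in a zero-sum game)."""
    n_cols = len(payoff[0])
    column_payoffs = [
        sum(p * payoff[i][j] for i, p in enumerate(row_strategy))
        for j in range(n_cols)
    ]
    return min(column_payoffs)


# Row player's payoffs in rock-paper-scissors (illustrative reward only).
# Rows and columns are ordered rock, paper, scissors.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

nash = [1 / 3, 1 / 3, 1 / 3]   # the Nash Equilibrium strategy of this game
expert = [0.5, 0.3, 0.2]       # a hypothetical sub-optimal expert

nash_value = worst_case_payoff(RPS, nash)
expert_value = worst_case_payoff(RPS, expert)

# A positive gap means the expert can be exploited more than the
# equilibrium strategy under this reward.
gap = nash_value - expert_value
print(f"nash={nash_value:.3f} expert={expert_value:.3f} gap={gap:.3f}")
```

In the inverse problem the reward is unknown, so a gap of this kind would have to be evaluated under a candidate reward. How the paper turns such a comparison into a training objective is not specified on this page.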

Findings

1. Algorithms recover reward functions accurately from sub-optimal demonstrations.

2. Nash Equilibrium algorithms outperform previous methods on large-scale games.

3. Theoretical analysis shows no local optima in the proposed adversarial training objective.

Abstract

This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be not optimal. Compared to previous works that decouple agents in the game by assuming optimality in expert strategies, we introduce a new objective function that directly pits experts against Nash Equilibrium strategies, and we design an algorithm to solve for the reward function in the context of inverse reinforcement learning with deep neural networks as model approximations. In our setting the model and algorithm do not decouple by agent. In order to find Nash Equilibrium in large-scale games, we also propose an adversarial training algorithm for zero-sum stochastic games, and show the theoretical appeal of non-existence of local optima in its objective function. In our numerical experiments, we demonstrate that our Nash Equilibrium and inverse…

Code & Models

Repositories

mdabbah/pacman-rl-project
pytorch

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research