Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations
Xingyu Wang, Diego Klabjan

TL;DR
This paper introduces an inverse reinforcement learning framework for zero-sum stochastic games that learns reward functions and strategies from expert demonstrations known to be sub-optimal, rather than assuming the experts play optimally.
Contribution
The paper proposes a new objective that directly compares expert strategies against Nash Equilibrium strategies, and develops algorithms both for reward learning and for finding Nash Equilibrium in large-scale games.
Findings
The inverse reinforcement learning algorithm recovers reward functions accurately from sub-optimal demonstrations.
The Nash Equilibrium algorithm outperforms previous methods on large-scale games.
Theoretical analysis shows that the proposed adversarial training objective has no local optima.
Abstract
This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be not optimal. Compared to previous works that decouple agents in the game by assuming optimality in expert strategies, we introduce a new objective function that directly pits experts against Nash Equilibrium strategies, and we design an algorithm to solve for the reward function in the context of inverse reinforcement learning with deep neural networks as model approximations. In our setting the model and algorithm do not decouple by agent. In order to find Nash Equilibrium in large-scale games, we also propose an adversarial training algorithm for zero-sum stochastic games, and show the theoretical appeal of non-existence of local optima in its objective function. In our numerical experiments, we demonstrate that our Nash Equilibrium and inverse…
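The idea of pitting experts against Nash Equilibrium strategies can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a single-state zero-sum matrix game instead of a stochastic game, uses fictitious play as a stand-in equilibrium solver instead of the adversarial training algorithm, and uses a gap of the form v(f_nash, g_expert) - v(f_expert, g_nash), which is only one plausible reading of the objective described above. All function names and the payoff matrix are hypothetical.

```python
def expected_payoff(A, x, y):
    """Row player's expected payoff x^T A y in a zero-sum matrix game."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def fictitious_play(A, iters=1000):
    """Approximate a Nash Equilibrium by fictitious play (stand-in solver).

    Each round, both players best-respond to the opponent's empirical
    action counts; the empirical frequencies are returned as strategies.
    """
    n, m = len(A), len(A[0])
    row_counts, col_counts = [0] * n, [0] * m
    for _ in range(iters):
        # Scores are computed from the counts before either player updates.
        row_scores = [sum(A[i][j] * col_counts[j] for j in range(m))
                      for i in range(n)]
        col_scores = [sum(A[i][j] * row_counts[i] for i in range(n))
                      for j in range(m)]
        i_best = max(range(n), key=lambda i: row_scores[i])  # row maximizes
        j_best = min(range(m), key=lambda j: col_scores[j])  # column minimizes
        row_counts[i_best] += 1
        col_counts[j_best] += 1
    return ([c / iters for c in row_counts],
            [c / iters for c in col_counts])


def expert_vs_nash_gap(A, f_expert, g_expert, iters=1000):
    """Assumed gap: v(f_nash, g_expert) - v(f_expert, g_nash).

    With exact equilibrium strategies this is non-negative, because a
    Nash row strategy guarantees at least the game value and a Nash
    column strategy concedes at most the game value.
    """
    f_nash, g_nash = fictitious_play(A, iters)
    return (expected_payoff(A, f_nash, g_expert)
            - expected_payoff(A, f_expert, g_nash))


# Hypothetical payoff matrix (row player's reward) with a saddle point
# at the first row and first column.
A = [[1, 2], [0, 3]]
# Both experts mix uniformly, which is sub-optimal in this game.
gap = expert_vs_nash_gap(A, [0.5, 0.5], [0.5, 0.5])
print(gap)
```

In this toy game the gap is strictly positive for the uniformly mixing experts and vanishes when both experts play the equilibrium strategies, which is the kind of signal a reward-learning objective could use without assuming the demonstrations are optimal.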
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research
