Multi-Agent Imitation Learning: Value is Easy, Regret is Hard
Jingwu Tang, Gokul Swamy, Fei Fang, Zhiwei Steven Wu

TL;DR
This paper introduces the regret gap as an objective for multi-agent imitation learning (MAIL) in Markov Games, shows that it is harder to minimize than the value gap, and proposes methods that minimize it efficiently under certain assumptions.
Contribution
The paper defines the regret gap in MAIL, analyzes its relationship to the value gap, and develops efficient reductions for minimizing the regret gap when agents are strategic.
Findings
Value gap minimization is easier than regret gap minimization.
Achieving regret equivalence is more challenging than value equivalence.
Two proposed reductions minimize the regret gap: MALICE, under a coverage assumption on the expert, and BLADES, given access to a queryable expert.
Abstract
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents based on demonstrations of an expert doing so. Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations. While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents. Intuitively, this is because strategic deviations can depend on a counterfactual quantity: the coordinator's recommendations outside of the state distribution their recommendations induce. In response, we initiate the study of an alternative objective for MAIL in Markov Games we term the regret gap that explicitly accounts for potential deviations by…
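The gap between value equivalence and regret equivalence can be illustrated with a small toy example. The sketch below is not from the paper: the two-step game, its state and action names, and the payoffs are all hypothetical, and the definitions used (value gap as the largest per-agent shortfall in return relative to the expert, regret as the largest gain any single agent can obtain by unilaterally deviating from its recommendations) are assumptions based on the abstract's description. The expert and learner coordinators agree on every state reached when agents comply, and differ only in a state that is reached when agent 0 deviates.

```python
from itertools import product

# Hypothetical two-step, two-agent Markov game (an illustration, not the paper's setup).
STATES = ["s0", "s_on", "s_off"]
ACTIONS = ["a", "b"]


def step(state, joint):
    """Return (next_state, rewards) for a joint action (a0, a1)."""
    a0, a1 = joint
    if state == "s0":
        # Agent 0's action picks the branch; no reward at the first step.
        return ("s_on" if a0 == "a" else "s_off"), (0.0, 0.0)
    if state == "s_on":
        return None, (1.0, 1.0)
    # state == "s_off": agent 1's action decides agent 0's payoff.
    return None, ((2.0, 0.0) if a1 == "a" else (-1.0, 0.0))


def rollout(coordinator, agent=None, deviation=None):
    """Per-agent returns when `agent` (if any) applies `deviation` to its recommendations."""
    state, totals = "s0", [0.0, 0.0]
    while state is not None:
        joint = list(coordinator[state])
        if agent is not None:
            joint[agent] = deviation[(state, joint[agent])]
        state, rewards = step(state, tuple(joint))
        totals = [t + r for t, r in zip(totals, rewards)]
    return totals


def all_deviations():
    """Every map from (state, recommended action) to the action actually played."""
    keys = list(product(STATES, ACTIONS))
    for choice in product(ACTIONS, repeat=len(keys)):
        yield dict(zip(keys, choice))


def regret(coordinator):
    """Largest gain any one agent gets by deviating unilaterally (assumed definition)."""
    base = rollout(coordinator)
    return max(
        rollout(coordinator, i, dev)[i] - base[i]
        for i in range(2)
        for dev in all_deviations()
    )


def value_gap(expert, learner):
    """Largest per-agent shortfall of the learner's return (assumed definition)."""
    return max(e - l for e, l in zip(rollout(expert), rollout(learner)))


# Both coordinators recommend the same joint actions in s0 and s_on.
# They differ only in s_off, which is never visited when agents comply.
expert = {"s0": ("a", "a"), "s_on": ("a", "a"), "s_off": ("a", "b")}
learner = {"s0": ("a", "a"), "s_on": ("a", "a"), "s_off": ("a", "a")}

print("value gap: ", value_gap(expert, learner))           # 0.0
print("regret gap:", regret(learner) - regret(expert))     # 1.0
```

In this toy game the learner matches the expert exactly on the demonstrated states, so the value gap is zero, yet agent 0 gains by deviating under the learner because the learner's recommendation in the unvisited state `s_off` no longer discourages the deviation.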
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning
