Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning

Till Freihaut; Luca Viano; Volkan Cevher; Matthieu Geist; Giorgia Ramponi

arXiv:2505.17610·cs.LG·October 10, 2025

TL;DR

This paper introduces a theoretical framework for learning Nash equilibria in Markov Games from expert data, highlighting the importance of the single policy deviation concentrability coefficient and proposing two algorithms with provable guarantees.

Contribution

The paper provides the first expert sample complexity bounds for learning a Nash equilibrium from expert data in Markov Games and introduces two novel algorithms, MAIL-BRO and MURMAIL, with provable query complexity guarantees.

Findings

01

Behavioral cloning incurs substantial regret in games with a high single policy deviation concentrability coefficient.

02

MAIL-BRO uses a best response oracle and learns an $\varepsilon$-Nash equilibrium with $O(\varepsilon^{-4})$ expert and oracle queries.

03

MURMAIL avoids the best response oracle entirely, at the cost of a worse expert query complexity of order $O(\varepsilon^{-8})$.
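
As a rough illustration of the behavioral cloning baseline in the first finding, the sketch below fits a tabular policy by counting expert actions per state. It is a generic frequency estimator under assumed names (`behavioral_cloning`, `demos`, `num_actions` are hypothetical), not necessarily the estimator analyzed in the paper.

```python
from collections import Counter, defaultdict


def behavioral_cloning(demos, num_actions):
    """Fit a tabular policy from (state, action) pairs by action frequencies.

    Generic illustration only; the names and interface are assumptions,
    not taken from the paper.
    """
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1

    policy = {}
    for state, action_counts in counts.items():
        total = sum(action_counts.values())
        # Empirical probability of each action in this state.
        policy[state] = [action_counts[a] / total for a in range(num_actions)]
    return policy


# Toy expert data: state 0 seen three times, state 1 once, state 2 never.
demos = [(0, 1), (0, 1), (0, 0), (1, 0)]
policy = behavioral_cloning(demos, num_actions=2)
print(policy)
```

State 2 never appears in the data, so the fitted policy has no entry for it. This gives a loose intuition for why coverage of the states reached when a single agent deviates matters for purely offline imitation.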

Abstract

This paper provides the first expert sample complexity characterization for learning a Nash equilibrium from expert data in Markov Games. We show that a new quantity named the single policy deviation concentrability coefficient is unavoidable in the non-interactive imitation learning setting, and we provide an upper bound for behavioral cloning (BC) featuring this coefficient. BC exhibits substantial regret in games with a high concentrability coefficient, leading us to utilize expert queries to develop and introduce two novel solution algorithms: MAIL-BRO and MURMAIL. The former employs a best response oracle and learns an $\varepsilon$-Nash equilibrium with $O(\varepsilon^{-4})$ expert and oracle queries. The latter completely bypasses the best response oracle at the cost of a worse expert query complexity of order $O(\varepsilon^{-8})$. Finally, we provide numerical…
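
To get a feel for the gap between the two rates quoted in the abstract, the following sketch evaluates only the $\varepsilon$-dependence, $\varepsilon^{-4}$ versus $\varepsilon^{-8}$. Constants and any problem-dependent factors hidden by the $O(\cdot)$ notation are ignored, so the numbers are not actual query counts; `query_scaling` is a hypothetical helper.

```python
def query_scaling(eps: float, exponent: int) -> float:
    """Return eps**(-exponent), the epsilon-dependence of a query bound.

    Constants and problem-dependent factors hidden by O(.) are ignored
    (an assumption made purely for illustration).
    """
    if not 0 < eps <= 1:
        raise ValueError("eps must lie in (0, 1]")
    return eps ** (-exponent)


for eps in (0.5, 0.1):
    with_oracle = query_scaling(eps, 4)     # rate quoted for MAIL-BRO
    without_oracle = query_scaling(eps, 8)  # rate quoted for MURMAIL
    ratio = without_oracle / with_oracle
    print(f"eps={eps}: eps^-4={with_oracle:.3g}, "
          f"eps^-8={without_oracle:.3g}, ratio={ratio:.3g}")
```

The ratio between the two rates is itself $\varepsilon^{-4}$, so the cost of dropping the best response oracle grows quickly as the target accuracy tightens.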

