Conservative Equilibrium Discovery in Offline Game-Theoretic Multiagent Reinforcement Learning

Austin A. Nguyen; Michael P. Wellman

arXiv:2603.00374·cs.AI·March 3, 2026

Open Access

TL;DR

This paper introduces COffeE-PSRO, a conservative offline multiagent reinforcement learning method that improves equilibrium discovery by accounting for data uncertainty and guiding strategy exploration.

Contribution

It extends PSRO with uncertainty quantification and a novel meta-strategy solver, enabling more reliable equilibrium approximation in offline multiagent settings.

Findings

01. COffeE-PSRO outperforms state-of-the-art offline methods in equilibrium quality.

02. The approach effectively quantifies game dynamics uncertainty.

03. Empirical results show improved strategy stability and lower regret.

Abstract

Offline learning of strategies takes data efficiency to its extreme by restricting algorithms to a fixed dataset of state-action trajectories. We consider the problem in a mixed-motive multiagent setting, where the goal is to solve a game under the offline learning constraint. We first frame this problem in terms of selecting among candidate equilibria. Since datasets may inform only a small fraction of game dynamics, it is generally infeasible in offline game-solving to even verify a proposed solution is a true equilibrium. Therefore, we consider the relative probability of low regret (i.e., closeness to equilibrium) across candidates based on the information available. Specifically, we extend Policy Space Response Oracles (PSRO), an online game-solving approach, by quantifying game dynamics uncertainty and modifying the RL objective to skew towards solutions more likely to have low…
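The abstract uses regret as its measure of closeness to equilibrium. As a minimal illustration of that standard definition only (not the paper's COffeE-PSRO method, and with full knowledge of the payoffs, which the offline setting does not provide), the sketch below computes the regret of a mixed-strategy profile in a two-player general-sum matrix game. The function names and the Prisoner's Dilemma payoffs are hypothetical, chosen for this example.

```python
def expected_payoff(payoff, row_mix, col_mix):
    """Expected payoff of a matrix game under independent mixed strategies."""
    return sum(
        row_mix[i] * col_mix[j] * payoff[i][j]
        for i in range(len(row_mix))
        for j in range(len(col_mix))
    )


def profile_regret(row_payoff, col_payoff, row_mix, col_mix):
    """Largest gain any player could obtain by deviating unilaterally.

    A profile is a Nash equilibrium exactly when this value is zero.
    """
    n_rows, n_cols = len(row_mix), len(col_mix)
    row_value = expected_payoff(row_payoff, row_mix, col_mix)
    col_value = expected_payoff(col_payoff, row_mix, col_mix)

    # Best pure-strategy deviation for the row player against col_mix.
    best_row = max(
        sum(col_mix[j] * row_payoff[i][j] for j in range(n_cols))
        for i in range(n_rows)
    )
    # Best pure-strategy deviation for the column player against row_mix.
    best_col = max(
        sum(row_mix[i] * col_payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    )
    return max(best_row - row_value, best_col - col_value)


# Hypothetical Prisoner's Dilemma payoffs (action 0 = cooperate, 1 = defect).
ROW_PAYOFF = [[3.0, 0.0], [5.0, 1.0]]
COL_PAYOFF = [[3.0, 5.0], [0.0, 1.0]]

cooperate = [1.0, 0.0]
defect = [0.0, 1.0]

regret_cc = profile_regret(ROW_PAYOFF, COL_PAYOFF, cooperate, cooperate)
regret_dd = profile_regret(ROW_PAYOFF, COL_PAYOFF, defect, defect)
```

Here mutual defection has zero regret, so it is an equilibrium, while mutual cooperation has positive regret because either player gains by defecting. In the offline setting the abstract describes, the payoff entries themselves are only partially informed by the dataset, which is why regret cannot in general be verified exactly.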

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Artificial Intelligence in Games