Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning

Zun Li; Marc Lanctot; Kevin R. McKee; Luke Marris; Ian Gemp; Daniel Hennes; Paul Muller; Kate Larson; Yoram Bachrach; Michael P. Wellman

arXiv:2302.00797·cs.AI·April 7, 2026

TL;DR

This paper introduces a scalable deep reinforcement learning framework combining tree search, generative models, and bargaining theory to improve opponent modeling and negotiation in multiagent systems.

Contribution

It proposes Generative Best Response (GenBR), a scalable MCTS-based algorithm with deep generative models, integrated into PSRO for improved opponent modeling in large imperfect information domains.

Findings

01 GenBR scales to large imperfect information domains.

02 Agents negotiating with humans achieve social welfare and bargaining scores comparable to those of humans trading among themselves.

03 Search with generative modeling enhances policy strength and online prediction.

Abstract

Opponent modeling methods typically involve two crucial steps: building a belief distribution over opponents' strategies, and exploiting this opponent model by playing a best response. However, existing approaches typically require domain-specific heuristics to come up with such a model, and algorithms for approximating best responses are hard to scale in large, imperfect information domains. In this work, we introduce a scalable and generic multiagent training regime for opponent modeling using deep game-theoretic reinforcement learning. We first propose Generative Best Response (GenBR), a best response algorithm based on Monte-Carlo Tree Search (MCTS) with a learned deep generative model that samples world states during planning. This new method scales to large imperfect information domains and can be plug and play in a variety of multiagent algorithms. We use this new method under…
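The idea of sampling world states during planning can be illustrated with a minimal sketch. This is not the paper's GenBR implementation. It assumes a toy four-card game, replaces the learned deep generative model with a hand-written sampler (`sample_world_state`), and uses flat Monte Carlo evaluation at the root instead of a full MCTS. All names, the payoff rule, and the `belief` dictionary are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def sample_world_state(info_state, rng):
    """Hypothetical stand-in for a learned generative model.

    Samples a hidden opponent card that is consistent with what we observe.
    Assumption: cards are 0..3 and the opponent cannot hold our own card.
    """
    candidates = [c for c in range(4) if c != info_state["my_card"]]
    # Unnormalized belief weights; cards missing from the belief default to 1.0.
    weights = [info_state["belief"].get(c, 1.0) for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def payoff(my_card, opp_card, action):
    """Toy payoff: folding costs 1; betting wins or loses 2 by card rank."""
    if action == "fold":
        return -1.0
    return 2.0 if my_card > opp_card else -2.0


def search_best_response(info_state, actions, num_simulations=2000, seed=0):
    """Flat Monte Carlo search over sampled world states (not full MCTS).

    Each simulation draws one hidden state from the sampler and scores every
    root action against it; the action with the best average value is returned.
    """
    rng = random.Random(seed)
    totals = defaultdict(float)
    for _ in range(num_simulations):
        opp_card = sample_world_state(info_state, rng)
        for action in actions:
            totals[action] += payoff(info_state["my_card"], opp_card, action)
    values = {a: totals[a] / num_simulations for a in actions}
    best_action = max(values, key=values.get)
    return best_action, values


if __name__ == "__main__":
    state = {"my_card": 3, "belief": {}}
    print(search_best_response(state, ["bet", "fold"]))
```

In this sketch the sampler plays the role the abstract assigns to the generative model: it turns an information state into concrete world states that the search can evaluate. A learned model would replace the hand-written belief weights, and a tree search would replace the single-step evaluation.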
