Discovering Multiagent Learning Algorithms with Large Language Models
Zun Li, John Schultz, Daniel Hennes, Marc Lanctot

TL;DR
This paper uses large language models to automate the discovery of multi-agent learning algorithms, producing new methods that are competitive with human-designed baselines and whose distilled, simpler variants generalize well.
Contribution
It applies AlphaEvolve, an LLM-powered evolutionary coding agent, to discover new algorithms in the CFR and PSRO design spaces and distills them into minimal, generalizable solvers for multi-agent reinforcement learning.
Findings
Discovered two new algorithms: VAD-CFR and SHOR-PSRO.
Distilled minimal algorithms outperform complex counterparts in generalization.
Both discovered algorithms are consistently competitive with state-of-the-art human-designed baselines across an 18-game evaluation suite.
Abstract
Much of the advancement in Multi-Agent Reinforcement Learning (MARL) for imperfect-information games has historically depended on the manual, iterative refinement of algorithmic baselines. Recently, evolutionary coding agents powered by Large Language Models (LLMs) have emerged as powerful tools to automate this discovery process. In this work, we deploy one such agentic framework, AlphaEvolve, to navigate the design spaces of two distinct game-theoretic paradigms: counterfactual regret minimization (CFR) and policy-space response oracles (PSRO). This automated search yielded two algorithms: Volatility-Adaptive Discounted (VAD-) CFR and Smoothed Hybrid Optimistic Regret (SHOR-) PSRO, which are consistently competitive with state-of-the-art human-designed baselines across an 18-game evaluation suite spanning Poker, Goofspiel, Liar's Dice, Blotto, and Battleship variants. However,…
