Discovering Multiagent Learning Algorithms with Large Language Models

Zun Li; John Schultz; Daniel Hennes; Marc Lanctot

arXiv:2602.16928·cs.GT·May 11, 2026

TL;DR

This paper uses large language models to automate the discovery of multiagent learning algorithms, producing novel methods that are competitive with human-designed baselines, along with simplified variants that generalize well.

Contribution

It applies AlphaEvolve, an existing LLM-powered evolutionary coding agent, to discover new algorithms in the CFR and PSRO families, and distills them into minimal, generalizable solvers for multi-agent reinforcement learning.

Findings

01  Discovered two new algorithms: VAD-CFR and SHOR-PSRO.

02  Distilled minimal algorithms outperform complex counterparts in generalization.

03  Automated discovery yields competitive algorithms across 18 game environments.

Abstract

Much of the advancement in Multi-Agent Reinforcement Learning (MARL) for imperfect-information games has historically depended on the manual, iterative refinement of algorithmic baselines. Recently, evolutionary coding agents powered by Large Language Models (LLMs) have emerged as powerful tools to automate this discovery process. In this work, we deploy one such agentic framework, AlphaEvolve, to navigate the design spaces of two distinct game-theoretic paradigms: counterfactual regret minimization (CFR) and policy-space response oracles (PSRO). This automated search yielded two algorithms: Volatility-Adaptive Discounted (VAD-) CFR and Smoothed Hybrid Optimistic Regret (SHOR-) PSRO, which are consistently competitive with state-of-the-art human-designed baselines across an 18-game evaluation suite spanning Poker, Goofspiel, Liar's Dice, Blotto, and Battleship variants. However,…
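
For readers new to CFR, the family is built on regret matching: each player plays actions in proportion to their accumulated positive regret, and in two-player zero-sum games the time-averaged strategies approach an equilibrium. The sketch below is a textbook illustration on rock-paper-scissors, not code from the paper and not VAD-CFR. The names `PAYOFF`, `regret_matching`, and `self_play`, as well as the biased starting regrets, are assumptions made for the example.

```python
# Illustrative only: plain regret matching, the building block of CFR.
# This is NOT VAD-CFR or any algorithm reported in the paper.

# Row player's payoff in rock-paper-scissors; the column player gets the negative.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret: fall back to the uniform strategy.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def self_play(iterations, initial_regrets=(1.0, 0.0, 0.0)):
    """Run simultaneous regret-matching self-play; return both average strategies."""
    n = 3
    # Hypothetical bias for the row player so the dynamics are not trivially uniform.
    regrets = [list(initial_regrets), [0.0] * n]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        row = regret_matching(regrets[0])
        col = regret_matching(regrets[1])
        # Expected utility of each pure action against the opponent's current mix.
        row_utils = [sum(PAYOFF[a][b] * col[b] for b in range(n)) for a in range(n)]
        col_utils = [sum(-PAYOFF[a][b] * row[a] for a in range(n)) for b in range(n)]
        row_value = sum(row[a] * row_utils[a] for a in range(n))
        col_value = sum(col[b] * col_utils[b] for b in range(n))
        for a in range(n):
            # Regret: how much better action a would have done than the mix played.
            regrets[0][a] += row_utils[a] - row_value
            regrets[1][a] += col_utils[a] - col_value
            strategy_sums[0][a] += row[a]
            strategy_sums[1][a] += col[a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, strategy in enumerate(self_play(10_000)):
        print(player, [round(p, 3) for p in strategy])
```

Run as a script, this prints each player's average strategy, which should sit near the uniform equilibrium of one third per action. Discounted CFR variants change how past regrets and strategies are weighted inside a loop of this shape; the specific weighting used by VAD-CFR is not described in the text above.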
