Policy-Conditioned Policies for Multi-Agent Task Solving

Yue Lin; Shuhui Zhu; Wenhao Li; Ang Li; Dan Qiao; Pascal Poupart; Hongyuan Zha; Baoxiang Wang

arXiv:2512.21024·cs.GT·December 25, 2025

Open Access

TL;DR

This paper represents multi-agent policies as human-readable source code and uses LLMs to interpret opponents' programs and optimize responses to them, so that agents can condition their strategies directly on one another's policies.

Contribution

The paper proposes a paradigm in which policies are represented as programs and LLMs act as approximate interpreters of those programs, with the aim of improving strategy adaptation and learning in multi-agent tasks.

Findings

01. Successfully solves coordination matrix games

02. Effective in the cooperative Level-Based Foraging environment

03. Demonstrates the viability of programmatic policies with LLM optimization

Abstract

In multi-agent tasks, the central challenge lies in the dynamic adaptation of strategies. However, directly conditioning on opponents' strategies is intractable in the prevalent deep reinforcement learning paradigm due to a fundamental "representational bottleneck": neural policies are opaque, high-dimensional parameter vectors that are incomprehensible to other agents. In this work, we propose a paradigm shift that bridges this gap by representing policies as human-interpretable source code and utilizing Large Language Models (LLMs) as approximate interpreters. This programmatic representation allows us to operationalize the game-theoretic concept of Program Equilibrium. We reformulate the learning problem by utilizing LLMs to perform optimization directly in the space of programmatic policies. The LLM functions as a point-wise best-response operator that iteratively…
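
The idea of a policy that conditions on another policy's source code can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the Prisoner's Dilemma payoffs, the names `CLIQUE_BOT`, `DEFECT_BOT`, `run_policy`, and `play`, and the use of exact string comparison in place of an LLM acting as an approximate interpreter. It shows only the classic program-equilibrium construction in which a program cooperates when its opponent is a copy of itself.

```python
# Hypothetical illustration: each policy is a source string that defines
# act(my_source, opponent_source) and may inspect the opponent's code.

CLIQUE_BOT = '''
def act(my_source, opponent_source):
    # Cooperate only with an exact copy of this program; otherwise defect.
    return "C" if opponent_source == my_source else "D"
'''

DEFECT_BOT = '''
def act(my_source, opponent_source):
    # Ignore the opponent's code and always defect.
    return "D"
'''

# Assumed Prisoner's Dilemma payoffs: (row player, column player).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def run_policy(source, opponent_source):
    """Execute a policy program and return its action against the opponent."""
    namespace = {}
    exec(source, namespace)  # acceptable only for trusted toy programs
    return namespace["act"](source, opponent_source)


def play(source_a, source_b):
    """Let two programs read each other's source, then score the outcome."""
    action_a = run_policy(source_a, source_b)
    action_b = run_policy(source_b, source_a)
    return PAYOFFS[(action_a, action_b)]
```

In this toy game, two copies of `CLIQUE_BOT` cooperate, and neither can gain by switching to `DEFECT_BOT`, since that program is no longer an exact copy and triggers defection in return. Exact string matching is brittle, which is one reason an approximate interpreter of program behaviour, such as the LLM described in the abstract, is of interest.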

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Explainable Artificial Intelligence (XAI)