Policy-Conditioned Policies for Multi-Agent Task Solving
Yue Lin, Shuhui Zhu, Wenhao Li, Ang Li, Dan Qiao, Pascal Poupart, Hongyuan Zha, Baoxiang Wang

TL;DR
This paper proposes a multi-agent reinforcement learning approach that represents policies as human-readable code and uses LLMs to optimize those policies in response to opponents' policies, enabling better strategic adaptation.
Contribution
It proposes a new paradigm that represents policies as programs and uses LLMs as approximate interpreters to improve multi-agent strategy adaptation and learning.
Findings
Successfully solves coordination matrix games
Effective in the cooperative Level-Based Foraging environment
Demonstrates the viability of programmatic policies with LLM optimization
Abstract
In multi-agent tasks, the central challenge lies in the dynamic adaptation of strategies. However, directly conditioning on opponents' strategies is intractable in the prevalent deep reinforcement learning paradigm due to a fundamental "representational bottleneck": neural policies are opaque, high-dimensional parameter vectors that are incomprehensible to other agents. In this work, we propose a paradigm shift that bridges this gap by representing policies as human-interpretable source code and utilizing Large Language Models (LLMs) as approximate interpreters. This programmatic representation allows us to operationalize the game-theoretic concept of Program Equilibrium. We reformulate the learning problem by utilizing LLMs to perform optimization directly in the space of programmatic policies. The LLM functions as a point-wise best-response operator that iteratively…
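The idea of policies that read each other's source code and are updated by iterated best response can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 2x2 coordination payoffs, the three candidate programs, and the helper names `run_policy`, `play`, `best_response`, and `iterate` are hypothetical, and a brute-force search over a fixed candidate list stands in for the LLM best-response operator.

```python
# Hypothetical 2x2 coordination game: both agents are rewarded for matching
# actions, and matching on action 0 pays more than matching on action 1.
PAYOFF = {(0, 0): (2, 2), (1, 1): (1, 1), (0, 1): (0, 0), (1, 0): (0, 0)}

# Policies are plain source strings defining act(opponent_source) -> int.
ALWAYS_0 = "def act(opponent_source):\n    return 0\n"
ALWAYS_1 = "def act(opponent_source):\n    return 1\n"
MIRROR = (
    "def act(opponent_source):\n"
    "    # Play 1 only against a program whose last statement is 'return 1'.\n"
    "    return 1 if opponent_source.rstrip().endswith('return 1') else 0\n"
)
CANDIDATES = [ALWAYS_0, ALWAYS_1, MIRROR]


def run_policy(source, opponent_source):
    """Execute a policy program and return the action it picks."""
    namespace = {}
    exec(source, namespace)  # acceptable here: the programs are trusted toy strings
    return namespace["act"](opponent_source)


def play(source_a, source_b):
    """Each program sees the other's source; return the payoff pair."""
    action_a = run_policy(source_a, source_b)
    action_b = run_policy(source_b, source_a)
    return PAYOFF[(action_a, action_b)]


def best_response(opponent_source, player):
    """Stand-in for the LLM: pick the candidate with the highest own payoff."""
    def value(candidate):
        if player == 0:
            return play(candidate, opponent_source)[0]
        return play(opponent_source, candidate)[1]
    return max(CANDIDATES, key=value)  # ties go to the earliest candidate


def iterate(source_a, source_b, max_rounds=10):
    """Alternate best responses until neither program changes."""
    for _ in range(max_rounds):
        new_a = best_response(source_b, 0)
        new_b = best_response(new_a, 1)
        if (new_a, new_b) == (source_a, source_b):
            break
        source_a, source_b = new_a, new_b
    return source_a, source_b


if __name__ == "__main__":
    final_a, final_b = iterate(MIRROR, MIRROR)
    print(play(final_a, final_b))
```

In this toy setting the fixed point depends on the starting programs: starting from two always-1 programs, the loop stays at the lower-payoff match, which is the kind of coordination problem the summary refers to.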
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Explainable Artificial Intelligence (XAI)
