LangMARL: Natural Language Multi-Agent Reinforcement Learning
Huaiyuan Yao, Longchao Da, Xiaoou Liu, Charles Fleming, Tianlong Chen, Hua Wei

TL;DR
LangMARL is a framework that brings credit assignment and policy gradient evolution from cooperative multi-agent reinforcement learning (MARL) into the language space, with the aim of improving coordination, sample efficiency, and interpretability in LLM-based multi-agent tasks.
Contribution
LangMARL introduces agent-level language credit assignment and gradient evolution in language space, and it summarizes task-relevant causal relations from replayed trajectories to give agents dense feedback under sparse rewards.
Findings
Improved sample efficiency across tasks
Enhanced interpretability of multi-agent interactions
Strong generalization capabilities demonstrated
Abstract
Large language model (LLM) agents struggle to autonomously evolve coordination strategies in dynamic environments, largely because coarse global outcomes obscure the causal signals needed for local policy refinement. We identify this bottleneck as a multi-agent credit assignment problem, which has long been studied in classical multi-agent reinforcement learning (MARL) but remains underaddressed in LLM-based systems. Building on this observation, we propose LangMARL, a framework that brings credit assignment and policy gradient evolution from cooperative MARL into the language space. LangMARL introduces agent-level language credit assignment, pioneers gradient evolution in language space for policy improvement, and summarizes task-relevant causal relations from replayed trajectories to provide dense feedback and improve convergence under sparse rewards. Extensive experiments across…
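For readers unfamiliar with the credit assignment problem the abstract refers to, the following is a minimal sketch of one classical numeric approach from cooperative MARL, difference rewards: each agent is credited with the change in the global reward when its action is swapped for a default. It is not LangMARL's method, which operates on natural-language feedback rather than scalars; the toy reward `team_reward`, the helper `difference_rewards`, and the task names are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional, Sequence

Action = Optional[str]  # None stands for "agent does nothing" (assumed default action)


def team_reward(actions: Sequence[Action]) -> int:
    """Toy global reward (illustrative): number of distinct tasks the team covers."""
    return len({a for a in actions if a is not None})


def difference_rewards(
    actions: Sequence[Action],
    reward_fn: Callable[[Sequence[Action]], int],
    default_action: Action = None,
) -> List[int]:
    """Credit agent i with G(a) - G(a with agent i's action replaced by the default)."""
    base = reward_fn(actions)
    credits = []
    for i in range(len(actions)):
        counterfactual = list(actions)
        counterfactual[i] = default_action  # remove agent i's contribution
        credits.append(base - reward_fn(counterfactual))
    return credits


joint_action = ["scout", "scout", "build"]
credits = difference_rewards(joint_action, team_reward)
print(team_reward(joint_action), credits)  # prints: 2 [0, 0, 1]
```

The global reward of 2 alone does not reveal which agent mattered. The per-agent credits show that either scout is redundant given the other, while the builder is the only agent whose removal lowers the team outcome. This is the kind of local signal the abstract describes as being obscured by coarse global outcomes.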
