MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning

Chanwoo Park; Seungju Han; Xingzhi Guo; Asuman Ozdaglar; Kaiqing Zhang; Joo-Kyung Kim

arXiv:2502.18439 · cs.AI · July 15, 2025

TL;DR

This paper introduces MAPoRL, a multi-agent post-training framework using reinforcement learning to explicitly foster collaboration among large language models, improving their collective performance and generalization across domains.

Contribution

The paper proposes a novel multi-agent post-co-training paradigm with reinforcement learning to enhance collaborative behaviors in large language models, surpassing existing prompting methods.

Findings

1. Multi-agent co-training improves collaboration performance.
2. MAPoRL generalizes well to unseen domains.
3. Training individual LLMs alone is insufficient for effective collaboration.

Abstract

Leveraging multiple large language models (LLMs) to build collaborative multi-agentic workflows has demonstrated significant potential. However, most previous studies focus on prompting out-of-the-box LLMs and rely on their innate capability for collaboration, which, as recent work has shown, may not improve LLMs' performance. In this paper, we introduce a new post-training paradigm, MAPoRL (Multi-Agent Post-co-training for collaborative LLMs with Reinforcement Learning), to explicitly elicit collaborative behaviors and further unleash the power of multi-agentic LLM frameworks. In MAPoRL, multiple LLMs first generate their own responses independently and then engage in a multi-turn discussion to collaboratively improve the final answer. In the end, a MAPoRL verifier evaluates both the answer and the discussion, by assigning a score that verifies the correctness of the answer, while adding…
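The collaboration loop described in the abstract (independent first responses, then a multi-turn discussion, then a verifier score) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Agent`, `Rollout`, `collaborate`, and `make_exact_match_verifier` are hypothetical, the agents are hard-coded stubs rather than LLMs, the toy verifier scores only the final answers (the abstract says the MAPoRL verifier also evaluates the discussion), and the reinforcement learning update that would consume the score is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: an agent maps (question, transcript of earlier turns) to a reply.
Agent = Callable[[str, List[List[str]]], str]
# Hypothetical type: a verifier maps (question, full transcript) to a scalar score.
Verifier = Callable[[str, List[List[str]]], float]


@dataclass
class Rollout:
    question: str
    # turns[t][i] is the reply of agent i at discussion turn t.
    turns: List[List[str]] = field(default_factory=list)
    score: float = 0.0


def collaborate(question: str, agents: List[Agent], verifier: Verifier,
                num_turns: int = 3) -> Rollout:
    """Run one multi-turn discussion and score it with the verifier."""
    rollout = Rollout(question)
    for _ in range(num_turns):
        # Snapshot earlier turns so agents within a turn cannot see each other.
        # On the first turn the history is empty, so replies are independent.
        history = [list(turn) for turn in rollout.turns]
        rollout.turns.append([agent(question, history) for agent in agents])
    rollout.score = verifier(question, rollout.turns)
    return rollout


def stubborn_agent(question: str, history: List[List[str]]) -> str:
    # Stub agent: always answers "4".
    return "4"


def follower_agent(question: str, history: List[List[str]]) -> str:
    # Stub agent: answers "5" alone, then copies agent 0's latest reply.
    return history[-1][0] if history else "5"


def make_exact_match_verifier(reference: str) -> Verifier:
    # Toy verifier (an assumption): fraction of final-turn replies equal to the reference.
    def verifier(question: str, turns: List[List[str]]) -> float:
        final = turns[-1]
        return sum(reply == reference for reply in final) / len(final)
    return verifier


rollout = collaborate(
    "What is 2 + 2?",
    [stubborn_agent, follower_agent],
    make_exact_match_verifier("4"),
    num_turns=2,
)
print(rollout.turns, rollout.score)
```

In a training setup, the score from a rollout like this would serve as the reward signal for updating the participating models; how that reward is shaped and distributed across agents is left to the paper.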

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: Focus